SLIDE 1

Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations

Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou University of California, San Diego

SLIDE 2

Introduction

Deep Convolutional Neural Networks (CNNs) are revolutionizing many image analytics tasks, e.g., surveillance and autonomous vehicles.

SLIDE 3

Background: What is a CNN?

*Simplified representation of a CNN: an input image (3-D array) goes through a series of convolution layer transformations to predict a class probability, e.g., P(pneumonia). Each convolution layer maps an input 3-D array to an output 3-D array.

SLIDE 4

Explainability of CNN predictions is important in many critical applications, such as healthcare!

SLIDE 5

How to Explain CNN Predictions?

An active research area. Occlusion-based explanation (OBE) is widely used by practitioners.

SLIDE 6

Occlusion-based Explanations (OBE)

[Figure: original image and occluded image, each scored with P(pneumonia)]

Source: http://blog.qure.ai/notes/visualizing_deep_learning

The occlusion heatmap localizes the region of interest.
SLIDE 7

Problem: OBE is Highly Time Consuming

CNN Inference is time consuming.

E.g., Inception3: 35 MFLOPS; ResNet152: 65 MFLOPS

*MFLOPS: Mega Floating Point Operations

One OBE run involves possibly thousands of occluded images, so it can take from several seconds to several minutes!

Our Idea: Cast OBE as a query optimization task and apply database-inspired optimization techniques.
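The naive OBE loop that these optimizations target can be sketched in a few lines of pure Python. This is an illustrative sketch only: the `predict` callable, single-channel image, and patch/stride defaults are assumptions, not Krypton's API.

```python
# Illustrative naive OBE: occlude each patch position and re-run full inference.
def occlusion_heatmap(predict, image, patch=16, stride=4, fill=0.0):
    """predict: callable mapping a 2-D image to a class probability."""
    h, w = len(image), len(image[0])
    heatmap = []
    for y in range(0, h - patch + 1, stride):
        row = []
        for x in range(0, w - patch + 1, stride):
            occluded = [r[:] for r in image]          # copy the image
            for dy in range(patch):
                for dx in range(patch):
                    occluded[y + dy][x + dx] = fill   # black out the patch
            row.append(predict(occluded))             # one full CNN inference
        heatmap.append(row)
    return heatmap
```

Every heatmap cell costs one full CNN inference, which is exactly why the number of occluded positions dominates OBE runtime.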

SLIDE 8

Outline

  • 1. Background on CNN Internals
  • 2. Incremental CNN Inference
  • 3. Approximate CNN Inference
  • 4. Experimental Results

Inspired by MQO + IVM

*IVM: Incremental View Maintenance *MQO: Multi-Query Optimization

Inspired by AQP + Vision Science

*AQP: Approximate Query Processing

SLIDE 9

Background: What is a CNN?

*Simplified representation of a CNN: an input image (3-D array) goes through a series of convolution layer transformations to predict a class probability, e.g., P(pneumonia). Each convolution layer maps an input 3-D array to an output 3-D array.

SLIDE 10

Background: Convolution Layer

[Figure] Each learned 3-D filter kernel K1 … Kn slides over the input 3-D array X. At each position, the output value is SUM(Ki ∘ X′), where ∘ is the Hadamard (element-wise) product and X′ is the input patch under the filter; each filter kernel produces one 2-D slice of the output 3-D array.
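As a concrete, heavily simplified illustration of the sum-of-Hadamard-products view above, here is a single-channel, single-filter 2-D convolution in pure Python. A real layer also sums over the depth dimension and applies many filters; this sketch is not the talk's implementation.

```python
# One filter, one channel, 'valid' convolution: out[i][j] = SUM(K o X'),
# where X' is the patch of x under the filter at position (i, j).
def conv2d_valid(x, k):
    fh, fw = len(k), len(k[0])
    oh, ow = len(x) - fh + 1, len(x[0]) - fw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the filter and the patch, then sum.
            out[i][j] = sum(k[a][b] * x[i + a][j + b]
                            for a in range(fh) for b in range(fw))
    return out
```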

SLIDE 11

Reimagining Convolution as a Query

Input: A(X, Y, Z, V)    Filter Kernels: K(X, Y, Z, N, V)

SELECT A.X AS X, A.Y AS Y, K.N AS Z, SUM(A.V * K.V) AS V
FROM K,
     (SELECT A.X, A.Y, A.Z, A.V,
             A.X - T.X + FW/2 AS A_K.X,
             A.Y - T.Y + FH/2 AS A_K.Y
      FROM A, A AS T
      WHERE ABS(A.X - T.X) <= FW/2 AND ABS(A.Y - T.Y) <= FH/2)
WHERE A_K.X = K.X AND A_K.Y = K.Y
GROUP BY A.X, A.Y, K.N

*FW: Filter Width, FH: Filter Height

Takeaway: A CNN performs a series of joins and aggregates. The linear algebra data model improves hardware utilization.
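To make the takeaway concrete, here is a toy 1-D check (illustrative, not from the talk) that computes a convolution both directly and as a self-join on position offsets followed by a SUM aggregate per output position, mirroring the query above:

```python
# Direct 1-D convolution ('valid' region only; odd filter width).
def conv_direct(a, k):
    fw = len(k)
    half = fw // 2
    return [sum(a[x - half + i] * k[i] for i in range(fw))
            for x in range(half, len(a) - half)]

# Same result, written as a "self-join + group-by" over (position, offset) pairs.
def conv_as_query(a, k):
    half = len(k) // 2
    groups = {}
    for x in range(len(a)):          # "A"
        for t in range(len(a)):      # "A AS T"
            if abs(x - t) <= half:   # join predicate on offset
                off = t - x + half   # align the offset with the kernel index
                groups[x] = groups.get(x, 0.0) + a[t] * k[off]
    return [groups[x] for x in range(half, len(a) - half)]
```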

SLIDE 12

Outline

  • 1. Background on CNN Internals
  • 2. Incremental CNN Inference
  • 3. Approximate CNN Inference
  • 4. Experimental Results

Inspired by MQO + IVM

*IVM: Incremental View Maintenance *MQO: Multi-Query Optimization

Inspired by AQP + Vision Science

*AQP: Approximate Query Processing

SLIDE 13

Observation: Redundant Computations

Each occluded image is the same as the original image everywhere except the occluded patch, and there are multiple such occluded images per input. The geometric properties of the CNN determine how to propagate the changes. [Figure: the changed region growing from Layer 1 to Layer N; only a cross section is shown, as the changed region spans the depth dimension.]

This is a new instance of the Incremental View Maintenance task in databases
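The change propagation can be sketched in 1-D. This is a minimal sketch under assumed 'same'-padding semantics with odd filter widths (not the paper's exact formulas): any output whose receptive field overlaps the changed input range must be recomputed, so each layer widens the dirty range by half the filter width on each side.

```python
# Propagate a changed (dirty) index range through a stack of 1-D conv layers.
def propagate_change(lo, hi, layers):
    """lo, hi: inclusive changed index range; layers: list of (filter_width, stride)."""
    for fw, stride in layers:
        half = fw // 2
        lo = max(0, -(-(lo - half) // stride))  # ceil((lo - half) / stride)
        hi = (hi + half) // stride              # floor((hi + half) / stride)
    return lo, hi
```

With stride-1 layers the dirty range grows by fw - 1 per layer, which is the geometric root of the Avalanche Effect.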

SLIDE 14

Our Solution: Incremental Inference

Cast OBE as a sequence of “queries”.

[Figure: the original image is run once and its layer outputs are saved as materialized views; each occluded image's P(pneumonia) is then computed by patching those views.]

An IVM-style algebraic framework for incremental propagation: no redundant computations.

But with multiple occluded images, sequential execution throttles performance, especially on GPUs!
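In 1-D, the IVM idea reduces to patching a materialized layer output. The sketch below (stride 1, 'same' zero padding, odd filter width) is an assumed toy model, not Krypton's code: keep the original layer output as a view, then recompute only the outputs whose receptive fields touch the changed inputs.

```python
# Full 1-D convolution with 'same' zero padding (the "materialized view" builder).
def conv1d(a, k):
    half = len(k) // 2
    p = [0.0] * half + list(a) + [0.0] * half
    return [sum(p[x + i] * k[i] for i in range(len(k))) for x in range(len(a))]

# Incremental update: patch the view instead of recomputing the whole layer.
def incremental_conv1d(a_new, out_view, k, lo, hi):
    """out_view: materialized output for the original input; [lo, hi]: changed range."""
    half = len(k) // 2
    o_lo, o_hi = max(0, lo - half), min(len(a_new) - 1, hi + half)
    p = [0.0] * half + list(a_new) + [0.0] * half
    out = list(out_view)
    for x in range(o_lo, o_hi + 1):           # recompute only the dirty slice
        out[x] = sum(p[x + i] * k[i] for i in range(len(k)))
    return out, (o_lo, o_hi)                  # dirty range to propagate onward
```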

SLIDE 15

Our Solution: Batched Incremental Inference

Share and reuse the materialized views across all occluded images; each incremental update reads some additional context around its changed patch.

Multiple IVM queries run in one go (a form of MQO). We create a custom GPU kernel for parallel memory copies. Improves hardware utilization.

SLIDE 16

What speedups can we expect? [Chart: theoretical speedups for popular deep CNNs with our IVM.] Issue: the “Avalanche Effect” causes low speedups in some CNNs.

SLIDE 17

Outline

  • 1. Background on CNN Internals
  • 2. Incremental CNN Inference
  • 3. Approximate CNN Inference
  • 4. Experimental Results

Inspired by AQP + Vision Science

SLIDE 18

Approximate CNN Inference

Basic Idea: Trade off the visual quality of the heatmap to reduce runtime.

How do we quantify the new heatmap's quality? The structural similarity index (SSIM) between the exact and approximate heatmaps: 1.0 means identical, and SSIM values close to 0.9 are widely used as a target.
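For intuition, a single-window ("global") SSIM can be computed in a few lines. Note this is a simplification: heatmap SSIM is normally the standard windowed variant averaged over local windows, so the exact numbers differ; the constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 for data range L are the usual defaults.

```python
# Global SSIM between two flattened heatmaps of equal length.
def ssim_global(x, y, data_range=1.0):
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                     # means
    vx = sum((v - mx) ** 2 for v in x) / n              # variances
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))
```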

SLIDE 19

Overview of our Approximations

  • a. Projective Field Thresholding: combats the Avalanche Effect by pruning computations; makes every query in OBE faster.
  • b. Adaptive Drill-down: issues lower-granularity queries for less sensitive regions; reduces the total number of queries in OBE.
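A rough sketch of the adaptive drill-down idea follows. The two-stage policy, refinement threshold, and strides here are illustrative assumptions, not the paper's exact procedure: score coarsely everywhere, then re-query at the fine stride only where the coarse pass found high sensitivity.

```python
# score_at(x, y): sensitivity score of occluding at (x, y); higher = more sensitive.
def drill_down(score_at, size, coarse_stride, fine_stride, top_frac=0.25):
    # Stage 1: coarse pass over the whole image (few queries).
    coarse = {(x, y): score_at(x, y)
              for y in range(0, size, coarse_stride)
              for x in range(0, size, coarse_stride)}
    # Keep only the most sensitive coarse cells for refinement.
    cutoff = sorted(coarse.values())[int(len(coarse) * (1 - top_frac))]
    heat = dict(coarse)
    # Stage 2: drill down at the fine stride inside sensitive cells only.
    for (x, y), s in coarse.items():
        if s >= cutoff:
            for yy in range(y, min(size, y + coarse_stride), fine_stride):
                for xx in range(x, min(size, x + coarse_stride), fine_stride):
                    heat[(xx, yy)] = score_at(xx, yy)
    return heat
```

Most of the image gets only coarse queries, so the total number of CNN inferences drops sharply.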

SLIDE 20

Avalanche Effect

Example: a 1-D convolution filter kernel. The projective field of a changed input grows layer by layer, so gains from incremental inference diminish at later layers!

SLIDE 21

Our Solution: Projective Field Thresholding

Projective Field Threshold τ = 5/9: truncate the dirty region's growth to the central fraction τ of the projective field.

Number of different paths from the changed input to each output: 1 3 6 7 6 3 1. Most paths pass through the center, so truncating the fringe changes the output little.
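The path counts above can be reproduced by convolving a unit impulse with all-ones kernels (width-3 filters assumed, matching the 1-D example):

```python
# "Convolve" a count vector with an all-ones kernel of the given width.
def convolve_ones(counts, width):
    out = [0] * (len(counts) + width - 1)
    for i, v in enumerate(counts):
        for j in range(width):
            out[i + j] += v
    return out

# Paths from one changed input cell to each output after num_layers conv layers.
def path_counts(num_layers, width=3):
    counts = [1]                     # a single changed input cell
    for _ in range(num_layers):
        counts = convolve_ones(counts, width)
    return counts
```

After three layers the counts are 1 3 6 7 6 3 1: the center of the projective field concentrates most of the influence, which is why a central-fraction threshold τ loses little heatmap quality.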

SLIDE 22

How do we pick τ?

The runtime vs. visual heatmap quality trade-off depends on image and CNN properties. We auto-tune τ for the SSIM target using a sample image set; this is done once upfront during system configuration.

[Figure: heatmaps at τ = 1.0, 0.8, 0.6, 0.4]
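The auto-tuning step can be sketched as a simple grid search. This is a hypothetical procedure (the paper's tuner may differ), assuming mean SSIM on the sample set grows monotonically with τ and that smaller τ means faster inference:

```python
# mean_ssim_at(tau): mean SSIM over a sample image set at threshold tau (caller-supplied).
def tune_tau(mean_ssim_at, target=0.90, taus=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    for tau in taus:                 # ascending: the first hit is the cheapest
        if mean_ssim_at(tau) >= target:
            return tau
    return 1.0                       # fall back to exact inference
```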

SLIDE 23

Outline

  • 1. Background on CNN Internals
  • 2. Incremental CNN Inference
  • 3. Approximate CNN Inference
    • a. Projective Field Thresholding
    • b. Adaptive Drill-down
  • 4. Experimental Results
SLIDE 24

Workload:

  • Images: chest X-ray images
  • Task: predicting pneumonia
  • CNNs: VGG16, ResNet18, Inception3
  • Occluding patch color: black
  • Occluding patch size: 16 x 16
  • Occluding patch stride: 4
  • SSIM target: 0.90

More datasets in the paper.

SLIDE 25

Experimental Setup:

  • CPU: Intel i7 @ 3.4 GHz
  • GPU: one Nvidia Titan Xp
  • Memory: 32 GB
  • Deep learning toolkit: PyTorch version 0.4.0

SLIDE 26

[Bar charts: OBE runtimes for VGG16, ResNet18, and Inception3 under Naive, Incremental (Exact), and Incremental + Approximate execution; GPU runtimes in seconds (axis 3-12 s) and CPU runtimes in minutes (axis 4.5-18 min), with speedup annotations of 5.4x, 2.1x, 1.5x, 13.8x, 4.9x, 3.7x, 3.9x, 1.6x, 0.7x, 8.6x, 3.1x, and 2.3x.]

SLIDE 27

Summary

Explaining CNN predictions is important, and OBE is widely used. We apply DB-inspired incremental and approximate inference optimizations to accelerate OBE. These optimizations make OBE more amenable to interactive diagnosis of CNN predictions.

Project Web Page: https://adalabucsd.github.io/krypton.html snakanda@eng.ucsd.edu Video: https://tinyurl.com/y2oy9hqq

SLIDE 28

System Architecture

[Architecture diagram: Krypton (Python) invokes PyTorch; a Custom Kernel Interface written in C is called via Python's FFI and forwards to a Custom Kernel Implementation written in CUDA, which operates alongside the cuDNN library on GPU memory. Arrows mark invocations and the flow of data through steps 1-6.]

0: Invoke incremental inference.
1: Initialize the input tensors, kernel weights, and output buffer in GPU memory.
2: Invoke the Custom Kernel Interface (written in C) using Python's foreign function interface (FFI) support, passing memory references of the input tensors, kernel weights, and output buffer.
3: Forward the call to the Custom Kernel Implementation (written in CUDA).
4: Copy the needed memory regions from the input tensor to an intermediate memory buffer in parallel.
5: Invoke the CNN transformation using cuDNN.
6: cuDNN reads the input from the intermediate buffer and writes the transformed output to the output buffer.
7: Read the output to main memory, or pass a reference as the input to the next transformation.

SLIDE 29

Read Context

[Figure: input and output tensors showing the updated patch in the input, the corresponding updated patch in the output, the input patch that must be read into the transformation operator (the updated patch plus surrounding context), the padding, and the filter kernel.]
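In 1-D, the read context is just the updated range dilated by half the filter width (stride 1, odd filter width, and 'same' zero padding assumed; this is a sketch, not Krypton's indexing code):

```python
# Input index range that must be read to recompute output patch [lo, hi];
# the range is clipped to the tensor, since zero padding covers the rest.
def read_context(lo, hi, fw, length):
    half = fw // 2
    return max(0, lo - half), min(length - 1, hi + half)
```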

SLIDE 30

Incremental and Approximate Inference for Faster Occlusion-Based Deep CNN Explanations

Observation: Explainability of CNN predictions is important; occlusion-based explanation (OBE) is widely used.

Problem: OBE is highly compute intensive.

This Work: Cast OBE as an instance of the view-materialization problem and perform incremental and approximate inference. ~5x and ~35x speedups for exact and approximate heatmaps.

Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou University of California, San Diego