Towards Accurate Post-training Network Quantization via Bit-split - - PowerPoint PPT Presentation



SLIDE 1

Towards Accurate Post-training Network Quantization via Bit-split and Stitching

Peisong Wang, Qiang Chen, Xiangyu He, Jian Cheng

Institute of Automation, Chinese Academy of Sciences


SLIDE 2

Outline
  • Background
  • Motivation
  • Approach
  • Experiments

SLIDE 3

Background
  • Low-bit quantization has emerged as a promising compression technique
    • Robust across network architectures
    • Hardware friendly
  • Problem: low-bit quantization typically relies on
    • Training data
    • Large computational resources (CPUs, GPUs)
    • Quantization skills and expertise
SLIDE 4

Background

Training-aware quantization:
Pre-trained Model → Network Quantization → Finetune using data/labels

Post-training quantization (this work):
Pre-trained Model → Network Quantization
  • Data-free
  • BP-free
  • Easy to use

Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018).

SLIDE 5

Motivation

Post-training quantization

Pretrained model

SLIDE 6

Motivation

Post-training quantization

Pretrained model → Low-bit model

SLIDE 7

Motivation

Post-training quantization

Pretrained model → Low-bit model: minimize the distance

SLIDE 8

Motivation

Post-training quantization

Minimize the distance:

I. Define the distance
II. Minimize the distance

Pretrained model → Low-bit model

SLIDE 9

Related works

I. Define the distance
II. Minimize the distance

TF-Lite: map the maximum weight (activation) magnitude |Max| to the maximum low-bit value, e.g. 127 for int8.

Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018).
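The max-mapping scheme above can be sketched in a few lines of NumPy (the function names and the int8 setting are illustrative, not TF-Lite's actual API):

```python
import numpy as np

def quantize_max(w, bits=8):
    """Symmetric linear quantization: map the maximum magnitude of w
    onto the largest low-bit integer (127 for int8)."""
    qmax = 2 ** (bits - 1) - 1          # 127 when bits == 8
    scale = np.abs(w).max() / qmax      # real value covered by one integer step
    q = np.round(w / scale).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    return q * scale

w = np.array([-1.0, -0.25, 0.1, 0.5])
q, scale = quantize_max(w)
# The entry with the largest magnitude lands exactly on -127,
# and every value is recovered to within half a quantization step.
```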

SLIDE 10

Related works

I. Define the distance
II. Minimize the distance

TensorRT: clip outliers at a threshold and map the clip value to the maximum low-bit value.

Szymon Migacz. 8-bit Inference with TensorRT. GTC 2017
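A rough NumPy sketch of why clipping helps (TensorRT actually chooses the clip threshold by minimizing KL divergence on calibration data; here the threshold is simply fixed by hand):

```python
import numpy as np

def quantize_clip(w, clip, bits=8):
    """Saturating quantization: clip outliers at +/-clip, then map the
    clip value to the largest low-bit integer."""
    qmax = 2 ** (bits - 1) - 1
    scale = clip / qmax
    q = np.round(np.clip(w, -clip, clip) / scale).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=10_000)
w[0] = 20.0                      # one extreme outlier stretches the range

errors = {}
for clip in (float(np.abs(w).max()), 3.0):
    q, scale = quantize_clip(w, clip)
    rec = q * scale
    # measure error on the bulk of the values (everything but the outlier)
    errors[clip] = float(np.mean((w[1:] - rec[1:]) ** 2))
# Clipping at 3.0 gives a much finer step for the bulk of the values,
# so its error is far below that of mapping the full +/-20 range.
```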

SLIDE 11

Method

I. Define the distance
II. Minimize the distance

Pretrained model → Low-bit model: minimize the distance.

This work learns a low-bit mapping from the input to the output of every convolution, whereas previous work defines the objective on the weights themselves.
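The contrast between the two objectives can be illustrated with a toy linear layer (the 4-bit setting and all names here are my own, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 64))     # calibration inputs to the layer
W = rng.normal(size=(64, 10))      # full-precision weights

def round_to_grid(w, bits=4):
    """Naive symmetric rounding of w onto a low-bit grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

Wq = round_to_grid(W)
weight_dist = float(np.sum((W - Wq) ** 2))          # distance used by earlier methods
output_dist = float(np.sum((X @ W - X @ Wq) ** 2))  # distance this work minimizes
```

Two quantized weight tensors with the same weight-space distance can yield very different output distances, which is why optimizing the output reconstruction directly tends to preserve accuracy better.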

SLIDE 12

Method

I. Define the distance
II. Minimize the distance (Bit-split)

r = Σ_{m=1}^{M-1} 2^{m-1} · r_m, with ternary bits r_1, r_2, …, r_{M-1} and place values 2^0, 2^1, …, 2^{M-2}
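The decomposition can be written directly: a signed M-bit integer r is split into M−1 ternary tensors that share r's sign (a minimal NumPy sketch, not the paper's implementation):

```python
import numpy as np

def bit_split(r, M):
    """Split integers r in [-(2^(M-1)-1), 2^(M-1)-1] into M-1 tensors
    b_1..b_{M-1} with entries in {-1, 0, 1}, such that
    r = sum_m 2^(m-1) * b_m (each b_m carries r's sign)."""
    sign, mag = np.sign(r), np.abs(r)
    return [((mag >> m) & 1) * sign for m in range(M - 1)]

def stitch(bits):
    """Stitch the per-bit tensors back into the integer values."""
    return sum((2 ** m) * b for m, b in enumerate(bits))

r = np.array([-7, -3, 0, 1, 5, 7])
bits = bit_split(r, M=4)          # three ternary tensors for 4-bit values
```

In the paper each ternary tensor is then optimized (and stitched back) separately, which is what makes the per-bit optimization tractable.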

SLIDE 13

Method

Alternately: optimize the scale 𝛽, then optimize the m-th bit.

Wang, P., Hu, Q., Zhang, Y., Zhang, C., Liu, Y. and Cheng, J., 2018. Two-step quantization for low-bit neural networks. In Proceedings of the IEEE Conference on computer vision and pattern recognition (pp. 4376-4384).

SLIDE 14

Bit-Split for Post-training Network Quantization

Problem:

Optimization:

SLIDE 15

Bit-Split Results

Weight Quantization:

Both Weight and Activation Quantization:

SLIDE 16

Comparison with State-of-the-arts

SLIDE 17

Results on Detection and Instance segmentation

SLIDE 18

Thanks for your attention.


Code is available at https://github.com/wps712/BitSplit
Contact: peisong.wang@nlpr.ia.ac.cn