Towards Accurate Post-training Network Quantization via Bit-split and Stitching
Peisong Wang, Qiang Chen, Xiangyu He, Jian Cheng
Institute of Automation, Chinese Academy of Sciences
Outline: Background, Motivation, Approach
Pre-trained model → Network quantization → Finetune using data/labels
Pre-trained model → Network quantization (this work): data-free, BP-free, easy to use
Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018).
Goal: minimize the distance between the pretrained model and the low-bit model.
I. Define the distance
II. Minimize the distance
I. Define the distance
TF-Lite: map the maximum absolute weight (activation) value, |Max|, to the maximum low-bit number (e.g., 127 for 8-bit).
Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018).
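The max-mapping rule above can be sketched in a few lines. This is a minimal illustration, assuming symmetric signed quantization with a single scale per tensor; the function name `quantize_max` and its parameters are hypothetical and not taken from TF-Lite.

```python
def quantize_max(values, num_bits=8):
    """Map the largest absolute value to the largest signed integer (assumed symmetric scheme)."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8-bit
    max_abs = max(abs(v) for v in values)    # the |Max| of the tensor
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each value to the nearest integer step of size `scale`.
    quantized = [int(round(v / scale)) for v in values]
    return quantized, scale


q, scale = quantize_max([-1.27, 0.5, 0.1])
dequantized = [x * scale for x in q]         # approximate reconstruction
```

Because the scale is set by the single largest magnitude, one outlier stretches the step size for every other value.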
TensorRT: map a clip value to the maximum low-bit number (e.g., 127 for 8-bit).
Szymon Migacz. 8-bit Inference with TensorRT. GTC 2017.
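The clip-based mapping differs from max mapping only in where the scale comes from. The sketch below assumes the clip value is already given (how it is selected is outside this example) and uses a hypothetical helper name, `quantize_clipped`, that does not come from TensorRT.

```python
def quantize_clipped(values, clip, num_bits=8):
    """Saturate values to [-clip, clip], then map clip to the largest signed integer."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8-bit
    scale = clip / qmax
    quantized = []
    for v in values:
        clipped = max(-clip, min(clip, v))   # values beyond the clip saturate
        quantized.append(int(round(clipped / scale)))
    return quantized, scale


q, scale = quantize_clipped([-5.0, 1.27, 0.5, 10.0], clip=1.27)
```

Outliers beyond the clip value all land on the extreme codes, which leaves finer resolution for the values inside the clip range.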
Minimize the distance between the pretrained model and the low-bit model.
Objective: previous work vs. this work
This work: learns a low-bit mapping from input to the ...
II. Minimize the distance (Bit-split)
Bits: $r_1,\ r_2,\ \dots,\ r_{M-1},\ r_M$
Bit weights: $2^0,\ 2^1,\ \dots,\ 2^{M-2}$
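As a rough illustration of splitting an integer into weighted bits and stitching it back, the sketch below assumes a signed M-bit integer is written as M-1 ternary bits in {-1, 0, 1} with weights 2^0 up to 2^(M-2). That decomposition and the names `bit_split` and `stitch` are assumptions made for this example, not details confirmed by the slides.

```python
def bit_split(q, num_bits):
    """Split a signed integer into (num_bits - 1) ternary bits, least significant first."""
    limit = 2 ** (num_bits - 1) - 1
    if abs(q) > limit:
        raise ValueError(f"{q} does not fit in {num_bits} signed bits")
    sign = -1 if q < 0 else 1
    magnitude = abs(q)
    # Each bit carries the sign of q, so every entry is -1, 0, or 1.
    return [sign * ((magnitude >> m) & 1) for m in range(num_bits - 1)]


def stitch(bits):
    """Recombine the bits using weights 2^0, 2^1, ..."""
    return sum(r * (2 ** m) for m, r in enumerate(bits))


bits = bit_split(-6, num_bits=4)   # [0, -1, -1]
value = stitch(bits)               # -6
```

Splitting and stitching are exact inverses here, so any update made to a single bit translates directly into a change of the stitched integer.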
Optimize β; optimize the m-th bit.
Wang, P., Hu, Q., Zhang, Y., Zhang, C., Liu, Y. and Cheng, J., 2018. Two-step quantization for low-bit neural networks. In Proceedings of the IEEE Conference on computer vision and pattern recognition (pp. 4376-4384).
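One way to picture the two steps is an alternating scheme: hold the bits fixed and solve for β, then hold β fixed and update one bit at a time. The toy below fits a weight vector directly with a squared-error objective, which is a simplification chosen for this example and may differ from the objective used in the talk; the ternary bit values, the max-based initialisation, and the name `fit_bits` are likewise assumptions.

```python
def fit_bits(w, num_bits, iters=10):
    """Toy alternating minimisation of sum_i (w_i - beta * sum_m 2^m * r_{i,m})^2."""
    n_bits = num_bits - 1
    weights = [2 ** m for m in range(n_bits)]
    qmax = 2 ** (num_bits - 1) - 1
    beta = (max(abs(v) for v in w) / qmax) or 1.0    # max-mapping initialisation
    bits = [[0] * n_bits for _ in w]
    for _ in range(iters):
        # Bit step: with beta and the other bits fixed, choose the best m-th bit.
        for m in range(n_bits):
            for i, v in enumerate(w):
                rest = sum(weights[k] * bits[i][k] for k in range(n_bits) if k != m)
                residual = v - beta * rest
                bits[i][m] = min(
                    (-1, 0, 1),
                    key=lambda r: (residual - beta * weights[m] * r) ** 2,
                )
        # Beta step: closed-form least squares with the bits fixed.
        q = [sum(weights[k] * b[k] for k in range(n_bits)) for b in bits]
        num = sum(v * x for v, x in zip(w, q))
        den = sum(x * x for x in q)
        if den > 0 and num > 0:
            beta = num / den
    return beta, bits


beta, bits = fit_bits([0.9, -0.4, 0.1, 0.0], num_bits=3)
```

Each sub-step can only lower or keep the squared error, so the loop never does worse than its starting point.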
Problem: Optimization:
Weight Quantization: Both Weight and Activation Quantization:
Code is available at https://github.com/wps712/BitSplit
Contact: peisong.wang@nlpr.ia.ac.cn