SLIDE 1

Outlier Channel Splitting: Improving DNN Quantization without Retraining

Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

School of Electrical and Computer Engineering, Cornell University

SLIDE 2


Specialized DNN Processors are Ubiquitous

▸ Mobile: Apple (A12), Samsung (Exynos 9820), Huawei (Kirin 970), Qualcomm (Hexagon)

▸ Cloud: Google (TPU), Microsoft (Brainwave), Xilinx (EC2 F1), Intel (FPGAs, Nervana), AWS offerings

▸ Embedded: Google (Edge TPU), Intel (Movidius), Deephi/Xilinx (Zynq), ARM (announced), many startups

SLIDE 3


Quantization is Key to Hardware Acceleration

▸ Lower precision → less energy and area per op
▸ Lower precision → fewer bits of storage per datum
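As a rough worked example (not from the slides): ResNet-50, the network in the plots below, has about 25.6M weights, which occupy roughly 102 MB at 32-bit floating point but only about 12.8 MB at 4 bits, an 8x storage reduction.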

[Figure: GPU performance and FPGA performance on ResNet-50; the FPGA results compare float formats with 3-bit and 2-bit mantissas]

  • E. Chung, J. Fowers, et al. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave. IEEE Micro, April 2018.
  • https://developer.nvidia.com/tensorrt

SLIDE 4

Data-Free Quantization

▸ DNN quantization techniques that require training are discouraged by the current ML service model
▸ Reasons to prefer data-free quantization:

1. ML providers typically cannot access customer training data
2. The customer is using a pre-trained off-the-shelf model
3. The customer is unwilling to retrain a legacy model
4. The customer lacks the expertise for quantization training

[Figure: workflow split between the ML customer and the ML service provider, spanning training data, model training, the floating-point model, model optimization, and serving]

SLIDE 5

Paper Summary

▸ OCS improves quantization without retraining
▸ OCS can outperform existing methods with negligible size overhead (<2%) on both CNNs and RNNs
▸ We also perform a comprehensive evaluation of the clipping methods in the literature

Baseline:    Linear quantizer           − Poor quantizer resolution due to outliers
Prior art:   Clipping                   + Reduces quantization noise  + Used in NVIDIA TensorRT  − Distorts outliers
Our method:  Outlier Channel Splitting  + Reduces quantization noise  + Removes outliers  − Model size overhead

[Figure: log-frequency weight histograms; outliers stretch the linear quantizer's range, clipping distorts the outliers, and OCS removes them]
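To make the baseline and the clipped variant concrete, here is a minimal NumPy sketch of symmetric linear quantization; the function name, the symmetric per-tensor scheme, and the toy numbers are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def linear_quantize(w, bits, clip_max=None):
    """Symmetric uniform quantization of a weight tensor (illustrative sketch).

    Without clip_max, the scale is set by the largest-magnitude weight,
    so a single outlier stretches the grid and wastes resolution on all
    other weights. With clip_max, outliers saturate (are distorted) but
    the grid becomes finer.
    """
    max_val = np.abs(w).max() if clip_max is None else clip_max
    n_levels = 2 ** (bits - 1) - 1        # e.g. 7 positive levels at 4 bits
    delta = max_val / n_levels            # quantization step
    w_clipped = np.clip(w, -max_val, max_val)
    return np.round(w_clipped / delta) * delta

# Toy example: a single outlier dominates the range.
w = np.array([0.05, -0.12, 0.08, 1.50])
print(linear_quantize(w, bits=4))                # coarse grid set by the outlier
print(linear_quantize(w, bits=4, clip_max=0.2))  # finer grid, outlier saturated
```

In this toy case the outlier 1.5 forces a step of ~0.21, so the three small weights collapse to at most one quantization level; clipping to 0.2 gives them a ~0.029 step at the cost of distorting the outlier. That tension is exactly what OCS targets.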

SLIDE 6

Outlier Channel Splitting

▸ OCS splits a weight or activation channel, halving its values so the layer's output is unchanged
– (a) Duplicate node y2 to halve the weight v2
– (b) Duplicate weight v2 to halve the activation y2
– Inspired by Net2Net, a paper on layer transformations (a code sketch follows the equations below)

[Figure: a neuron z with inputs y1, y2 and weights v1, v2, shown before splitting, (a) after duplicating node y2 with halved weights v2/2, and (b) after duplicating weight v2 with halved activations y2/2]

Original: $A = w_1 z_1 + w_2 z_2$
Splitting the activation $z_2$: $A = w_1 z_1 + w_2 \tfrac{z_2}{2} + w_2 \tfrac{z_2}{2}$
Splitting the weight $w_2$: $A = w_1 z_1 + \tfrac{w_2}{2} z_2 + \tfrac{w_2}{2} z_2$

  • T. Chen, I. Goodfellow, J. Shlens. Net2Net: Accelerating Learning via Knowledge Transfer. ICLR, May 2016.
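Below is a minimal NumPy sketch of weight splitting for a fully connected layer; the function name and the pick-the-largest-magnitude-channel heuristic are illustrative assumptions, not the exact API of the paper's released code.

```python
import numpy as np

def split_outlier_channel(W, x):
    """Split the input channel holding the largest-magnitude weight.

    W: (out_features, in_features) weight matrix, x: (in_features,) input.
    Duplicating input channel c and halving its weight column leaves
    W @ x unchanged but halves the outlier weight, shrinking the range
    the quantizer must cover.
    """
    c = np.argmax(np.abs(W).max(axis=0))        # channel with the biggest outlier
    W_split = np.concatenate([W, W[:, c:c+1] / 2], axis=1)
    W_split[:, c] /= 2                          # original column also halved
    x_split = np.concatenate([x, x[c:c+1]])     # duplicate the matching input
    return W_split, x_split

W = np.array([[0.1, 1.6], [0.2, -0.3]])
x = np.array([1.0, 2.0])
W2, x2 = split_outlier_channel(W, x)
assert np.allclose(W @ x, W2 @ x2)              # the function is preserved
print(np.abs(W).max(), "->", np.abs(W2).max())  # 1.6 -> 0.8
```

Repeating this split on the next-largest channel until a size budget is exhausted (the ~2% overhead used in the results) progressively shrinks the weight range the quantizer must represent.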
SLIDE 7

Quantization-Aware Splitting

▸ In the paper, we show that quantization-aware (QA) splitting preserves the expected quantization noise on a single value

[Figure: number lines with grid points Δ, 2Δ, 3Δ showing where the two halves of x land after each splitting scheme is quantized]

Naïve splitting (Net2Net): $x \to (\tfrac{x}{2}, \tfrac{x}{2})$
– Both halves round in the same direction

Quantization-aware splitting: $x \to (\tfrac{x}{2} - \tfrac{\Delta}{4}, \tfrac{x}{2} + \tfrac{\Delta}{4})$
– The halves can round in opposite directions, helping the quantization noise cancel out (see the demo below)
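A small NumPy demo of the cancellation effect, assuming round-to-nearest on a uniform grid with step Δ; the setup is illustrative, not the paper's experiment.

```python
import numpy as np

def quantize(v, delta):
    # Round-to-nearest on a uniform grid with step delta
    # (floor(v/delta + 0.5) avoids NumPy's banker's rounding).
    return np.floor(v / delta + 0.5) * delta

delta = 1.0
x = np.linspace(0.0, 4.0, 1001)

# Naive: both halves quantize identically, so their errors add up.
naive_err = np.abs(2 * quantize(x / 2, delta) - x)
# QA: the +/- delta/4 offsets let the halves round in opposite directions.
qa_err = np.abs(quantize(x / 2 - delta / 4, delta)
                + quantize(x / 2 + delta / 4, delta) - x)

print(f"mean |error|  naive: {naive_err.mean():.3f}   QA: {qa_err.mean():.3f}")
# Expected: naive ~ delta/2 = 0.500, QA ~ delta/4 = 0.250
```

Naïve splitting doubles the expected error of quantizing x directly, since both halves carry the same rounding error; QA splitting keeps it at the single-value level of Δ/4 on average.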

SLIDE 8

Results on CNNs

▸ OCS constrained to 2% overhead outperforms clipping at 6 and 5 bits
▸ OCS + clipping outperforms clipping alone at 4 bits

Quantized accuracy, reported as the change (±) vs. the best clipping result. In these results OCS is constrained to ~2% size overhead. (On the slide, blue marks +1% or better and red marks −1% or worse.)

Network (Float Acc.)    Wt. Bits    OCS      OCS + Clip
VGG-16 BN (73.4)        6           +1.0     +0.5
                        5           +3.3     +2.6
                        4           −33.1    +4.4
ResNet-50 (76.1)        6           +0.4     +0.5
                        5           +2.0     +2.0
                        4           −26.8    +4.2
DenseNet-121 (74.4)     6           +1.6     +1.7
                        5           +4.3     +5.3
                        4           −5.1     +13.9
Inception-V3 (75.9)     6           +5.6     +5.5
                        5           +13.5    +19.5
                        4           −1.4     +0.7

SLIDE 9


Thank you!

Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang. Improving Neural Network Quantization without Retraining using Outlier Channel Splitting. ICML, June 2019.
Code available at: https://github.com/cornell-zhang/dnn-quant-ocs