Quantization for TVM
Ziheng Jiang, TVM Conference, Dec 12th 2018


SLIDE 1

Quantization for TVM

Ziheng Jiang TVM Conference, Dec 12th 2018

SLIDE 2

What is Quantization?

[Figure omitted; source: Han et al.]

Converting weight values from floating point to low-bit integers (e.g., 8-bit precision) without a significant accuracy drop.
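Concretely, a minimal numpy sketch of symmetric 8-bit weight quantization (an illustrative assumption, not TVM's implementation; the max-abs scale choice is just one common option):

import numpy as np

def quantize_int8(w):
    # choose a scale so the largest-magnitude weight maps to the int8 extreme
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # recover an approximation of the original float weights
    return q.astype(np.float32) * scale

w = np.random.randn(64).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize_int8(q, scale)).max())  # small rounding error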

SLIDE 3

[Diagram: train in a DL framework, convert via the frontend to Relay (high-level graph IR), apply quantization, deploy.]

Gain compression & acceleration:

  • Less storage space
  • Faster arithmetic operations
  • Friendly to accelerators and ultra-low-power embedded devices

SLIDE 4

Choice Spaces for Quantization

  • number of bits: 4-bit, 8-bit, 16-bit
  • quantization scheme: symmetric, asymmetric, etc.
  • hardware constraints: e.g., prefer integer shifts over floating-point multiplication (see the sketch below)
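For the last point: if the requantization scale is constrained to a power of two, the float multiply can be replaced by an integer add-and-shift. A minimal sketch under that assumption (names are illustrative, not TVM's API):

import numpy as np

acc = np.array([12345, -6789], dtype=np.int32)   # int32 conv accumulators

# float requantization: scale the accumulator by 1/256 and round
out_float = np.clip(np.round(acc * (1.0 / 256.0)), -128, 127).astype(np.int8)

# hardware-friendly version: with scale = 2**-8, round-to-nearest becomes
# "add half the divisor, then arithmetic right shift by 8"
out_shift = np.clip((acc + (1 << 7)) >> 8, -128, 127).astype(np.int8)

assert (out_float == out_shift).all()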

Goal

Instead of proposing "the only right way to achieve quantization in TVM," we would like to build a quantization workflow that can be customized flexibly.

SLIDE 5

[Diagram: the graph at three stages. Original: Conv2D → BatchNorm → ReLU → Conv2D, with weights W1, W2, all edges f32. After Annotate: BatchNorm is lowered to Mul/Add, and SimulatedQuantize nodes are inserted on the inputs of Conv2D, Mul/Add, and ReLU (edges still f32). After Realize: each SimulatedQuantize becomes concrete Mul/Shift/Clip/Cast ops, with Conv2D taking i8 inputs and producing i32 accumulators.]

SimQ simulates the rounding and saturation errors introduced by quantization. Its arguments get tuned during calibration.

SimQ(x; nbit, r, sign) = Clip(Round(x / r * 2^(nbit − sign))) * r / 2^(nbit − sign)
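A runnable numpy sketch of this formula (illustrative; the clip bounds to the representable integer range are an assumption, and the actual Relay pass differs in detail):

import numpy as np

def simulated_quantize(x, nbit=8, r=8.0, sign=True):
    # number of positive quantization levels implied by the bit budget and sign bit
    levels = 2.0 ** (nbit - sign)
    # scale onto the integer grid, round, and saturate like the target dtype would
    q = np.clip(np.round(x / r * levels), -levels if sign else 0.0, levels - 1)
    # map back to float: the value stays f32 but now carries quantization error
    return q * r / levels

x = np.array([-10.0, -5.0, 0.1, 5.0, 10.0], dtype=np.float32)
print(simulated_quantize(x))  # snapped to the 8-bit grid, saturated near ±8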

SLIDE 6

Code Sample

# user can override the annotate function for an op
@register_annotate_function("nn.conv2d", override=True)
def annotate_conv2d(ref_call, new_args, ctx):
    lhs, rhs = new_args
    lhs = attach_simulated_quantize(lhs, sign=False, rounding='round')
    rhs = attach_simulated_quantize(rhs, sign=False, rounding='stochastic_round')
    return expr.Call(ref_call.op, [lhs, rhs], ref_call.attrs)

# assuming we have an existing mxnet model, convert it to a relay graph
graph, params = relay.frontend.from_mxnet(mxnet_model)

# quantize the relay graph under a chosen configuration
with qconfig(nbit_dict={QFieldKind.ACTIVATION: 24},
             global_scale=8.0,
             skip_k_conv=1):
    qgraph, qparams = quantize(graph, params)

# ...build and deploy it locally or remotely with tvm

SLIDE 7

End-to-End Performance

Global Scale    Accuracy
2.0             64.1%
4.0             68.1%
8.0             69.5%
16.0            69.6%

Accuracy with ResNet18 under different global scales (original: 70.8%)
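If global_scale plays the role of the range r in the SimQ formula above (an assumption based on the code sample), then 8-bit signed quantization with a global scale of 8.0 has step size r / 2^(nbit − sign) = 8.0 / 128 = 0.0625 and covers roughly [−8.0, 8.0): a scale of 2.0 saturates more activations, while 16.0 spends resolution on values that rarely occur, which matches the accuracy trend in the table.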

Demonstration with 8-bit Symmetric Quantization

Time (ms)    Cortex-A53    VTA
ResNet18     307.09        64.87
MobileNet    131.14        51.96