DRAM Access Reduction by Node Fusion with TVM

Chia-Wei Chang, Jing-Jia Liou, Chih-Tsun Huang, Wei-Chung Hsu & Juin-Ming Lu
National Tsing Hua University & Industrial Technology Research Institute
Dec 5th, 2019

DRAM Access Consumes More Energy
• Energy efficiency is the key to DNN computation
    • Hardware accelerators
• DRAM consumes 50-100x more energy per byte than SRAM
• Node fusion is used to save DRAM accesses
Relative energy (register = 1x): DRAM 250x, SRAM 4x, Register 1x

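As a rough worked example of why fusion pays off, the sketch below plugs the slide's relative energy figures (DRAM 250x, SRAM 4x, register 1x) into a comparison of passing one intermediate tensor between two layers through DRAM versus keeping it in SRAM. The tensor shape, the one-write-plus-one-read access pattern, and the names `RELATIVE_ENERGY` and `intermediate_cost` are illustrative assumptions, not figures from the talk.

```python
# Relative energy, normalized to a register access (figures from the slide).
RELATIVE_ENERGY = {"DRAM": 250, "SRAM": 4, "Register": 1}


def intermediate_cost(num_elements: int, level: str) -> int:
    """Relative energy to write an intermediate tensor once and read it back once.

    The one-write-plus-one-read access pattern is an illustrative assumption.
    """
    return 2 * num_elements * RELATIVE_ENERGY[level]


# Hypothetical intermediate tensor between two layers: 64 channels of 56x56.
elements = 64 * 56 * 56
unfused = intermediate_cost(elements, "DRAM")  # intermediate spilled to DRAM
fused = intermediate_cost(elements, "SRAM")    # intermediate kept in on-chip SRAM
print(f"ratio: {unfused / fused:.1f}x")        # ratio: 62.5x
```

The 250x/4x ratio of 62.5x is independent of the tensor size and falls inside the 50-100x per-byte range quoted on the slide.
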
TVM Only Fuses Elementwise OPs
• Currently, TVM only supports fusion of elementwise OPs into a Conv
• Each OP has an attribute indicating whether it can be fused
• Fusion generates a TVMOP, which groups the fused nodes so they share data in SRAM
[Figure: in the TopLevel graph, a Conv node (OutElementwiseFusable) and BatchNorm and ReLU nodes (Elementwise) are grouped into a single TVMOP.]

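The attribute-driven grouping on this slide can be modelled in a few lines. This is a hypothetical, simplified sketch: `Pattern`, `Op`, and `fuse_chain` are illustrative names, and the grouping rule is an assumption modelled on the slide's Elementwise/OutElementwiseFusable labels, not TVM's actual fusion pass or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    """Simplified fusion attributes, named after the labels on the slide."""
    ELEMWISE = "Elementwise"
    OUT_ELEMWISE_FUSABLE = "OutElementwiseFusable"  # e.g. Conv
    OPAQUE = "Opaque"                               # never absorbs other ops


@dataclass
class Op:
    name: str
    pattern: Pattern


def fuse_chain(ops: list[Op]) -> list[list[str]]:
    """Greedily group a linear chain of ops into fused groups (one per TVMOP).

    Assumed rule: an Elementwise op joins the preceding group unless that group
    is anchored by an Opaque op; every non-elementwise op opens a new group,
    so two Convs never end up in the same group.
    """
    groups: list[list[Op]] = []
    for op in ops:
        can_join = (
            op.pattern is Pattern.ELEMWISE
            and bool(groups)
            and groups[-1][0].pattern is not Pattern.OPAQUE
        )
        if can_join:
            groups[-1].append(op)
        else:
            groups.append([op])
    return [[op.name for op in group] for group in groups]


chain = [
    Op("conv1", Pattern.OUT_ELEMWISE_FUSABLE),
    Op("batchnorm1", Pattern.ELEMWISE),
    Op("relu1", Pattern.ELEMWISE),
    Op("conv2", Pattern.OUT_ELEMWISE_FUSABLE),
    Op("relu2", Pattern.ELEMWISE),
]
print(fuse_chain(chain))
# [['conv1', 'batchnorm1', 'relu1'], ['conv2', 'relu2']]
```

Under this rule conv1 and conv2 land in separate groups, so the tensor between them still round-trips through DRAM; that is the case the merging of multiple Convs targets.
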
Our Node Fusion Merges Multiple Convs
[Figure: without fusion, the 1st and 2nd Conv operators exchange tensor data through DRAM (DRAM → 1st → DRAM → 2nd → DRAM); with fusion, the intermediate tensor stays in SRAM (DRAM → 1st → SRAM → 2nd → DRAM).]
Without fusion:
    for (n = 0; n < N; n++)                # 1st Conv
      for (k = 0; k < C1; k++)
        for (y = 0; y < H1; y++)
          for (x = 0; x < W1; x++)
            for (c = 0; c < C0; c++)
              for (r = 0; r < R1; r++)
                for (s = 0; s < S1; s++)
                  O1[n][k][y][x] += W1[k][c][r][s] * I[n][c][y+r][x+s]
    for (n = 0; n < N; n++)                # 2nd Conv
      for (k = 0; k < C2; k++)
        for (y = 0; y < H2; y++)
          for (x = 0; x < W2; x++)
            for (c = 0; c < C1; c++)
              for (r = 0; r < R2; r++)
                for (s = 0; s < S2; s++)
                  O2[n][k][y][x] += W2[k][c][r][s] * O1[n][c][y+r][x+s]
With fusion:
    for (n = 0; n < N; n++)
      for (k = 0; k < C2; k++)
        for (y = 0; y < H2; y++)
          for (x = 0; x < W2; x++)
            # Internal SRAM buffer, zero-initialized for each output element
            int sram[C1][R2][S2] = {0}
            for (c = 0; c < C1; c++)
              for (r = 0; r < R2; r++)
                for (s = 0; s < S2; s++)
                  for (c2 = 0; c2 < C0; c2++)
                    for (r2 = 0; r2 < R1; r2++)
                      for (s2 = 0; s2 < S1; s2++)
                        sram[c][r][s] += W1[c][c2][r2][s2] * I[n][c2][y+r+r2][x+s+s2]
            for (c = 0; c < C1; c++)
              for (r = 0; r < R2; r++)
                for (s = 0; s < S2; s++)
                  O[n][k][y][x] += W2[k][c][r][s] * sram[c][r][s]

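To check the slide's two loop nests against each other, here is a small runnable sketch in pure Python (nested lists, unit stride, no padding, small random integer tensors so equality is exact). The helper names `zeros`, `rand_tensor`, `conv`, and `fused_conv` and the tensor sizes are hypothetical; the point is only that the fused nest, which keeps the 1st Conv's output window in a local `sram` buffer, produces the same result as running the 1st and 2nd Conv separately.

```python
import itertools
import random


def zeros(*shape):
    """Nested-list tensor of zeros with the given shape."""
    if len(shape) == 1:
        return [0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


def rand_tensor(rng, *shape):
    """Nested-list tensor of small random integers (keeps arithmetic exact)."""
    if len(shape) == 1:
        return [rng.randint(-3, 3) for _ in range(shape[0])]
    return [rand_tensor(rng, *shape[1:]) for _ in range(shape[0])]


def conv(I, W):
    """Unit-stride, unpadded convolution: one unfused loop nest from the slide."""
    N, C_in, H, Wd = len(I), len(I[0]), len(I[0][0]), len(I[0][0][0])
    K, R, S = len(W), len(W[0][0]), len(W[0][0][0])
    O = zeros(N, K, H - R + 1, Wd - S + 1)
    for n, k, y, x in itertools.product(range(N), range(K), range(H - R + 1), range(Wd - S + 1)):
        for c, r, s in itertools.product(range(C_in), range(R), range(S)):
            O[n][k][y][x] += W[k][c][r][s] * I[n][c][y + r][x + s]
    return O


def fused_conv(I, W1, W2):
    """Fused loop nest: the 1st Conv's output window lives only in a local buffer."""
    N, C0, H0, W0 = len(I), len(I[0]), len(I[0][0]), len(I[0][0][0])
    C1, R1, S1 = len(W1), len(W1[0][0]), len(W1[0][0][0])
    C2, R2, S2 = len(W2), len(W2[0][0]), len(W2[0][0][0])
    H2, Wo2 = H0 - R1 - R2 + 2, W0 - S1 - S2 + 2  # output size after both convs
    O = zeros(N, C2, H2, Wo2)
    for n, k, y, x in itertools.product(range(N), range(C2), range(H2), range(Wo2)):
        sram = zeros(C1, R2, S2)  # stands in for the on-chip SRAM buffer
        for c, r, s in itertools.product(range(C1), range(R2), range(S2)):
            for c2, r2, s2 in itertools.product(range(C0), range(R1), range(S1)):
                sram[c][r][s] += W1[c][c2][r2][s2] * I[n][c2][y + r + r2][x + s + s2]
        for c, r, s in itertools.product(range(C1), range(R2), range(S2)):
            O[n][k][y][x] += W2[k][c][r][s] * sram[c][r][s]
    return O


rng = random.Random(0)
I = rand_tensor(rng, 1, 2, 6, 6)    # N=1, C0=2, 6x6 input
W1 = rand_tensor(rng, 3, 2, 3, 3)   # C1=3 filters of size 2x3x3
W2 = rand_tensor(rng, 4, 3, 2, 2)   # C2=4 filters of size 3x2x2
assert fused_conv(I, W1, W2) == conv(conv(I, W1), W2)
print("fused result matches two separate convolutions")
```

Because `sram` is rebuilt for every output position (k, y, x), this straightforward form never stores the full 1st-Conv output, at the cost of recomputing overlapping 1st-Conv values.
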
Experiment Settings: Hardware
• Eyeriss-like architecture
• 256MB DRAM
• 108KB SRAM
• 12x14 PE array
• Runs AlexNet
• Due to hardware limitations, only Conv layers are evaluated
[Figure: accelerator block diagram with a controller, DRAM, an SRAM buffer, and the PE array, with ifmap, weights, ipsum, and opsum paths between the buffer and the PEs.]

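A quick arithmetic check on these settings: the sketch below computes the PE count and how much of the 108KB SRAM the fused buffer `sram[C1][R2][S2]` would occupy. Assumptions not stated on the slides: 16-bit values, 1 KB = 1024 bytes, and the standard AlexNet shapes for the conv3→conv4 and conv4→conv5 pairs, which have only a ReLU between them. The names `fused_buffer_bytes` and `pairs` are hypothetical.

```python
# Experiment settings from the slide.
PE_ROWS, PE_COLS = 12, 14
SRAM_BYTES = 108 * 1024   # assumption: 1 KB = 1024 bytes
BYTES_PER_VALUE = 2       # assumption: 16-bit fixed-point data


def fused_buffer_bytes(c1: int, r2: int, s2: int) -> int:
    """Size in bytes of the sram[C1][R2][S2] buffer for one output position."""
    return c1 * r2 * s2 * BYTES_PER_VALUE


# (C1, R2, S2) for AlexNet conv pairs with only a ReLU between them.
pairs = {"conv3->conv4": (384, 3, 3), "conv4->conv5": (384, 3, 3)}

print("PEs:", PE_ROWS * PE_COLS)  # PEs: 168
for name, dims in pairs.items():
    need = fused_buffer_bytes(*dims)
    print(f"{name}: {need} B = {need / SRAM_BYTES:.2%} of the SRAM buffer")
```

This counts only the intermediate window; weights, input activations, and partial sums compete for the same 108KB, so the figure is a lower bound on SRAM demand rather than a feasibility proof.
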
Experimental Results
[Figure: bar charts comparing w/o Fusion and Fusion for Energy (mJ), MCycle, and Energy-Delay (KCycle·J, i.e. Energy × Cycle); the charts are annotated with reductions of 16%, 23%, and 40%.]
