CS 744: TVM - Shivaram Venkataraman, Fall 2020

SLIDE 1

CS 744: TVM

Shivaram Venkataraman, Fall 2020

TVM = Tensor Virtual Machine (board note: uses LLVM for code generation)
SLIDE 2

ADMINISTRIVIA

  • Course project titles
  • Project proposal aka Introduction (10/16): Introduction, Related Work, Timeline (with eval plan); 2 page writeup
  • Midterm: Oct 22
  • Assignment
SLIDE 3

MACHINE LEARNING: STACK

Notes: distributed training; inference runs just the forward pass; there is interplay between inference and training; frameworks make distributed training easy; hardware sits at the bottom of the stack.
SLIDE 4

MOTIVATION: PERFORMANCE PORTABILITY

  • Compute primitives (matrix multiply, convolution): you want high performance across hardware backends
  • Dependence on vendor-specific libraries
  • ML models evolve fast: new operators and new combinations of operators are not available in existing vendor libraries

Notes: the input is a model file, e.g. from PyTorch.
SLIDE 5

TVM

Python code describes the ML model → TVM → binary file that runs on hardware
SLIDE 6

OPTIMIZATION: COMPUTATION GRAPHS

  • Operator fusion: 1-1 ("map") operators, sum reductions, and scaling can be fused into a single kernel
  • Data layout: row major, column major, blocked; how each tensor is represented in memory

Notes: the example is a 2-layer NN represented as a computation graph.
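As a sketch of why operator fusion helps (a toy pure-Python illustration, not TVM code; the function names here are made up): running two 1-1 ("map") operators separately traverses the data twice and materializes an intermediate buffer, while the fused kernel makes a single pass.

```python
# Toy illustration of operator fusion (not TVM code).
def add(a, b):
    return [x + y for x, y in zip(a, b)]

def relu(a):
    return [max(x, 0.0) for x in a]

def add_relu_unfused(a, b):
    # two passes over the data, one intermediate list materialized
    return relu(add(a, b))

def add_relu_fused(a, b):
    # one pass, no intermediate buffer: this is what fusing the two
    # 1-1 operators into a single kernel buys you
    return [max(x + y, 0.0) for x, y in zip(a, b)]
```

On real hardware the win comes from avoiding the round trip to memory for the intermediate tensor, not from Python-level overhead.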
SLIDE 7

TENSOR EXPRESSION LANGUAGE

  • Common arithmetic and math operations
  • Know the shape of the output and the data accessed
  • Each operator is expressed in the tensor expression language
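The idea can be mimicked in a few lines of plain Python (a toy stand-in, not TVM's actual API): the declaration names the output shape and an index-to-value rule, so a compiler can see exactly which input elements each output element reads.

```python
# Toy stand-in for a tensor-expression-style declaration (not TVM's API).
def compute(shape, fcompute):
    """Materialize a 1-D tensor from an index expression."""
    (n,) = shape
    return [fcompute(i) for i in range(n)]

A = [1.0, 2.0, 3.0]
B = [10.0, 20.0, 30.0]
# C[i] = A[i] + B[i]: output shape and accessed data are both explicit
C = compute((3,), lambda i: A[i] + B[i])
```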
SLIDE 8

CODE GENERATION

  • Nested parallelism: Halide-style expressions; OpenMP-like loops on CPUs; on GPUs, cooperating threads can share fetched data (e.g. shared-memory tiles As, Bs); nested "for i in ...: for j in ...:" loop iterations
  • Tensorization: what is the hardware instruction set (load, store, add, ...)? Allows you to register an operator intrinsic. Extensible!
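The nested-loop structure the scheduler emits can be sketched as a tiled matrix multiply (pure Python; the tile size of 2 is an arbitrary assumption): the outer tile loops are the natural places to parallelize, and the innermost body is what tensorization would replace with a hardware intrinsic.

```python
def matmul_tiled(A, B, n, tile=2):
    """C = A @ B for n x n lists-of-lists, with blocked (tiled) loops."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):              # tile loops: parallelize here
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        for k in range(k0, min(k0 + tile, n)):
                            # innermost body: candidate for a tensor intrinsic
                            C[i][j] += A[i][k] * B[k][j]
    return C
```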
SLIDE 9

LATENCY HIDING

  • What is the goal? Same as PyTorch etc.?
  • Overlap computation and communication
  • Schedule that utilizes both memory bandwidth and compute units
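A minimal software double-buffering sketch (pure Python with a thread pool; `load` and `compute` are made-up stand-ins for a memory fetch and for compute-unit work): while chunk i is being computed, the fetch of chunk i+1 is already in flight, which is the overlap the slide describes.

```python
import concurrent.futures
import time

def load(chunk):            # stand-in for a memory/DMA fetch
    time.sleep(0.01)
    return chunk

def compute(data):          # stand-in for compute-unit work
    return sum(data)

def pipelined(chunks):
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, chunks[0])
        for i in range(len(chunks)):
            current = pending.result()
            if i + 1 < len(chunks):
                # prefetch the next chunk while we compute on this one
                pending = pool.submit(load, chunks[i + 1])
            results.append(compute(current))
    return results
```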
SLIDE 10

AUTOMATING OPTIMIZATION

  • Goal: create a specialized operator for the input shape and layout
  • Challenge: choose appropriate schedule optimizations (tiling size, loop unrolling, ...)
  • Automate the optimizer!

Notes: there are lots of different choices and lots of parameters/knobs to choose; use ML to decide which configurations to try.
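To see why the choice must be automated, count the configurations for even a tiny knob set (the specific knob values below are made-up examples):

```python
import itertools

# Hypothetical schedule knobs for a single operator.
tile_i    = [1, 2, 4, 8, 16, 32]
tile_j    = [1, 2, 4, 8, 16, 32]
unroll    = [0, 1, 2, 4]
vectorize = [False, True]

space = list(itertools.product(tile_i, tile_j, unroll, vectorize))
print(len(space))  # 6 * 6 * 4 * 2 = 288 configs, and that's one operator
```

Measuring every config on real hardware is infeasible, which motivates the learned cost model on the next slide.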
SLIDE 11

ML-BASED COST MODEL

Machine learning model design choices:
  • Speed: faster than the time it takes to actually evaluate a config
  • Quality: use a rank objective to predict the relative order of config runtimes

Gradient tree boosting model over features extracted from the generated code:
  • memory access count
  • reuse ratio of each memory buffer at each loop level
  • one-hot encoding of loop annotations

Note: measuring a config on real hardware can take seconds, so the model predicts from code features instead.
SLIDE 12

ML-BASED COST MODEL

Iteration:
  • Select a batch of candidates
  • Collect data (run candidates, measure runtimes)
  • Use the measurements as training data to update the model

How to select candidates? Parallel simulated annealing:
  • Start from a random config
  • Walk to a nearby config → successful if cost decreases, else reject

Notes: the model predicts performance for each candidate config, e.g. <C2, 20ms>, <C3, 8ms>; if the model says Cj is better than Ci, go try Cj on the cluster, otherwise generate another config; measured results become training data.
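The candidate walk can be sketched as follows (a toy version with a made-up convex cost function standing in for the learned cost model; TVM runs many such walks in parallel, and this sketch uses only the slide's accept-if-cost-decreases rule rather than full temperature-based annealing):

```python
import random

def neighbor(cfg):
    # perturb one knob by a small step (toy move)
    i = random.randrange(len(cfg))
    new = list(cfg)
    new[i] = max(1, new[i] + random.choice([-1, 1]))
    return tuple(new)

def walk(cost, start, steps=500):
    """Move to a nearby config if cost decreases, else reject."""
    cur, cur_cost = start, cost(start)
    for _ in range(steps):
        cand = neighbor(cur)
        cand_cost = cost(cand)
        if cand_cost < cur_cost:
            cur, cur_cost = cand, cand_cost
    return cur, cur_cost

# Stand-in for the learned cost model's predicted runtime:
predicted = lambda cfg: (cfg[0] - 4) ** 2 + (cfg[1] - 3) ** 2
random.seed(0)
best, best_cost = walk(predicted, (1, 1))
```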
SLIDE 13

DISTRIBUTED DEVICE POOL

  • Pool of devices to speed up profiling
  • RPC interface to run a trial on a device
  • Share device pools across multiple graphs
SLIDE 14

SUMMARY

  • TVM: compiler for ML inference models
  • Supports high performance for a range of models and hardware devices
  • Key ideas:
    • Graph-level optimizations (e.g. operator fusion)
    • Tensor expression language: code-gen, latency hiding, etc.
    • ML-based cost model for automation
SLIDE 15

DISCUSSION

https://forms.gle/WiVgJ3abGXXgfBN99
SLIDE 16

Consider that you are building an optimizer for Spark programs instead of ML inference. What would be some configuration knobs that you could similarly tune? What might be different from the TVM optimizer?

Notes:
  • Similar logic: latency hiding (overlap computation and communication); operator fusion → fusing map operations; data access patterns
  • Different/challenging: Spark operators are user-defined functions; partitioning (can you automate the number of partitions / co-partitioning?); cache configuration space; persistence is inserted manually
SLIDE 17

What is your takeaway from the following figure?

[Figure: handwritten annotations not recoverable]
SLIDE 18

NEXT STEPS

  • Next class: Ray
  • Course project: Oct 16 (introductions)
  • Midterm: Oct 22

Notes: latency hiding in Spark? rdd1's map tasks feed reduce tasks; rdd2's map tasks read Hadoop files directly (no communication); tasks wait on the shuffle.