CS 744: TVM
Shivaram Venkataraman Fall 2020
TVM = Tensor Virtual Machine (code generation for CPUs via LLVM)
ADMINISTRIVIA

Assignment
Course project: proposal aka Introduction due 10/16
  Introduction
  Related Work
  Timeline (with evaluation plan)
→ 2 page writeup

MACHINE LEARNING: STACK
Training vs. inference: inference is just the forward pass
Distributed training: frameworks make distributing training easy
Inference: the interplay with the hardware it runs on is what matters

MOTIVATION: PERFORMANCE PORTABILITY
PyTorch → model file
Compute primitives (e.g., matrix multiply): you want high performance
across hardware backends, beyond what is available in existing vendor libraries

Python code describes the ML model → TVM → binary file that runs

OPTIMIZING COMPUTATION GRAPHS
Operator Fusion: combine adjacent operators into one kernel to avoid materializing intermediates
Data layout: pick how each tensor is represented in memory
  Example: a 2-layer NN whose weight matrices are stored in a blocked layout
TENSOR EXPRESSION LANGUAGE
Common arithmetic and math operations, expressed in a tensor expression language
↳ Know the shape of the output and the data accessed
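In TVM the real API for this is `te.compute` (with `te.reduce_axis` for reductions); the following is only a stdlib Python emulation of the idea, with a hypothetical `compute` helper: the user supplies an output shape and a lambda from output indices to a value, so the system knows the output shape and exactly which data is accessed.

```python
# Emulation of a tensor-expression "compute" op: output shape plus an index
# expression fully determine the tensor, without spelling out the loops.
def compute(shape, f):
    rows, cols = shape
    return [[f(i, j) for j in range(cols)] for i in range(rows)]

# C[i][j] = sum over k of A[i][k] * B[k][j]: matrix multiply as an index expression.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
K = 2
C = compute((2, 2), lambda i, j: sum(A[i][k] * B[k][j] for k in range(K)))
```

Because the loops are implicit, the compiler is free to reorder, tile, or parallelize them later; the expression only states *what* is computed.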
CODE GENERATION
Based on Halide: separate the compute expression from the schedule
Nested parallelism: e.g., on a GPU, threads in a block cooperatively load tiles As, Bs into shared memory (in the spirit of OpenMP-style nested loops)
Tensorization: map inner loops to hardware instructions
↳ load, store, add → what is the hardware instruction set?
↳ Extensible! New intrinsics can be declared
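The schedule transformation underlying the shared-memory tiles As, Bs is loop tiling (blocking). A stdlib-only sketch on a square matmul (the function name and the fixed tile size are illustrative choices, not TVM output):

```python
def matmul_tiled(A, B, n, tile=2):
    # Blocked matrix multiply: the three outer loops walk tile x tile blocks,
    # so the inner loops touch only one block of each operand at a time and
    # that block stays hot in cache (or in GPU shared memory).
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        for k in range(k0, min(k0 + tile, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

The result is identical to the untiled triple loop; only the iteration order changes, which is exactly the kind of schedule decision TVM makes per backend.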
LATENCY HIDING
What is the goal? (same idea as in PyTorch etc.?)
↳ Overlap computation with memory access / communication
Find a schedule that utilizes both memory bandwidth and compute units
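A stdlib-only sketch of the latency-hiding idea using double buffering: issue the load for the next chunk before processing the current one, so memory access overlaps with compute. `load` and `process` here are hypothetical stand-ins for a slow memory fetch and a compute kernel.

```python
from concurrent.futures import ThreadPoolExecutor

def load(chunk_id):
    # Stand-in for a slow memory/DMA fetch of 4 elements.
    return list(range(chunk_id * 4, chunk_id * 4 + 4))

def process(chunk):
    # Stand-in for the compute kernel.
    return sum(chunk)

def pipelined(num_chunks):
    # Double buffering: prefetch chunk i+1 on a worker thread while the main
    # thread processes chunk i, hiding the load latency behind compute.
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, 0)
        for i in range(num_chunks):
            chunk = pending.result()                # waits only if load lags
            if i + 1 < num_chunks:
                pending = pool.submit(load, i + 1)  # issue next load early
            total += process(chunk)                 # overlaps the next load
    return total
```

TVM's contribution here is doing this at the schedule level (e.g., for accelerators with explicit DMA), but the overlap pattern is the same.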
AUTOMATING OPTIMIZATION
Goal: create a specialized operator for a given input shape and layout
Challenge: choose appropriate schedule optimizations (tiling size, loop unrolling, ...)
Automate the optimizer!
↳ lots of different choices and lots of parameters to choose; which configurations work? Use ML
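To see why "lots of choices and lots of parameters" is a real problem, here is a sketch of a config space; the knob names and values are hypothetical, but the combinatorial blow-up is the point.

```python
from itertools import product

# Hypothetical schedule knobs for a single operator; the cross product of the
# knobs is the search space the auto-optimizer must navigate.
tile_x = [4, 8, 16, 32]
tile_y = [4, 8, 16, 32]
unroll = [0, 2, 4]
vectorize = [False, True]

configs = list(product(tile_x, tile_y, unroll, vectorize))
# 4 * 4 * 3 * 2 = 96 configs from just four knobs; real operators have many
# more knobs, so exhaustively profiling every config is infeasible.
```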
ML-BASED COST MODEL

Machine learning model design choices
  Speed: prediction must be faster than the time it takes to actually run a config
  Quality: use a rank objective to predict the relative order of config runtimes
Model: gradient tree boosting over features generated from the loop program,
e.g., memory access count and the reuse ratio of each memory buffer at each loop level
ML-BASED COST MODEL
Iteration:
  Select a batch of candidate configs
  Collect data: profile each candidate on real hardware, giving ⟨config, runtime⟩ pairs (e.g., ⟨c2, 8ms⟩)
  Use these pairs as training data to update the model

How to select candidates? Parallel simulated annealing
  Start from a random config
  Walk to a nearby config → successful if predicted cost decreases, else reject
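A stdlib-only sketch of that candidate walk, with a toy quadratic `model_cost` standing in for the learned gradient-boosting model and a toy `neighbor` move (all names and the "tile size 16 is optimal" assumption are hypothetical). This implements the greedy variant on the slide; classic simulated annealing also accepts worse moves with a temperature-dependent probability to escape local minima.

```python
import random

def model_cost(cfg):
    # Stand-in for the learned cost model: pretend tile sizes of 16 are best.
    return sum((t - 16) ** 2 for t in cfg)

def neighbor(cfg, rng):
    # Step to a nearby config: double or halve one randomly chosen knob.
    i = rng.randrange(len(cfg))
    new = list(cfg)
    new[i] = max(1, new[i] * 2 if rng.random() < 0.5 else new[i] // 2)
    return tuple(new)

def anneal(start, steps, rng):
    # Greedy walk: take the step only if the predicted cost decreases,
    # else reject and stay at the current config.
    cfg, cost = start, model_cost(start)
    for _ in range(steps):
        cand = neighbor(cfg, rng)
        cand_cost = model_cost(cand)
        if cand_cost < cost:
            cfg, cost = cand, cand_cost
    return cfg, cost

best, cost = anneal((1, 64), steps=200, rng=random.Random(0))
```

Because the walk only consults `model_cost`, it is cheap; only the final batch of promising configs needs to be profiled on real hardware.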
Model asks: is config cj better than ci? If yes, go and try it.

DISTRIBUTED DEVICE POOL
Pool of devices to speed up profiling
RPC interface to run a trial on a device
Share device pools across multiple graphs
SUMMARY
TVM: compiler for ML inference models
Supports high performance for a range of models and hardware devices
Key ideas:
  Graph-level optimizations (→ operator fusion)
  Tensor expression language: code generation, latency hiding, etc.
  ML-based cost model for automation

DISCUSSION
https://forms.gle/WiVgJ3abGXXgfBN99

Consider that you are building an optimizer for Spark programs instead of ML models.
What would you tune? What might be different from the TVM optimizer? What would be similar?
Similar: much of the logic carries over
  operator fusion → fusing map operations
  latency hiding
Different / challenging:
  data access patterns are harder to analyze
  partitions / co-partitioning affect performance
  config space includes storage (disk vs. cache) and persistence, which today is inserted manually

NEXT STEPS
Next class: Ray
Course project: Oct 16 (introductions)
Midterm: Oct 22
Latency hiding in Spark?
  rdd1.map tasks → reduce tasks
  rdd2.map tasks ← read files
  ↳ today a stage waits for its inputs; no overlap of communication with computation