CS 744: RAY
Shivaram Venkataraman Fall 2020
ADMINISTRIVIA
- Assignment grades: by late next week → early next week
- Project proposal aka Introduction (10/16): Introduction, Related Work, Timeline (with eval plan)
MACHINE LEARNING: STACK
- PyTorch: data parallelism (training)
- TVM: performance portability
- PipeDream: pipeline parallelism
REINFORCEMENT LEARNING
[Figure: reinforcement learning loop, with the environment returning a reward to the agent]
RL SETUP
[Figure: RL setup diagram]
RL REQUIREMENTS
Simulation, Training, Serving
- Fine-grained computation: each simulation could take ~ms up to hours
- Flexibility: stateless processing as well as stateful computation (e.g., simulator state)
- Very low latency execution; future computation depends on past compute
- Very high throughput (~1M tasks/sec)
- Hard to handle with a static execution plan
RAY API
Tasks:
  futures = f.remote(args)
Actors:
  actor = Class.remote(args)
  futures = actor.method.remote(args)
Both:
  ready = ray.wait(futures, k, timeout)
- Tasks are stateless; actors hold state, and their methods are invoked with method.remote(args)
- Futures can be arguments to tasks
- You can spawn (or wait for) tasks within a task
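To make the semantics above concrete, here is a toy, single-process stand-in for the Ray API shapes (my own sketch, not Ray itself): a remote call returns a future immediately, a get-style call retrieves the value, and an actor keeps state across method calls.

```python
# Toy stand-in for Ray semantics (illustration only, runs in one process):
# f.remote() returns a Future handle; get() retrieves the value;
# an actor object keeps state across calls.

class Future:
    def __init__(self, value):
        self._value = value      # in real Ray the value may not be ready yet

def remote(fn):
    """Wrap a function so fn.remote(...) returns a Future, not a value."""
    def call(*args):
        return Future(fn(*args))  # real Ray would run this on a worker
    fn.remote = call
    return fn

def get(future):
    return future._value          # real ray.get() blocks until ready

@remote
def square(x):
    return x * x

class Counter:                    # an "actor": state survives across calls
    def __init__(self):
        self.n = 0
    def incr(self):
        self.n += 1
        return self.n

fut = square.remote(4)
c = Counter()
print(get(fut), c.incr(), c.incr())  # 16 1 2
```

The key contrast: calling `square.remote(4)` twice gives two independent results, while calling `c.incr()` twice gives 1 then 2, because the actor's state persists between calls.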
RAY API (continued)
Tasks:
  futures = f.remote(args)
Actors:
  actor = Class.remote(args)
  futures = actor.method.remote(args)
Both:
  ready = ray.wait(futures, k, timeout)
Tasks can spawn further tasks, e.g.:
  @ray.remote
  def f(args):
      for i in range(10):
          futures = g.remote(i)
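The nested-spawning pattern above can be sketched with Python's standard `concurrent.futures` standing in for Ray workers (the names `f` and `g` follow the slide; the thread pool is my assumption, not part of Ray):

```python
# Sketch of "spawning tasks within a task" using a thread pool in place of
# Ray workers. pool.submit plays the role of .remote(), wait() of ray.wait().
from concurrent.futures import ThreadPoolExecutor, wait

pool = ThreadPoolExecutor(max_workers=4)

def g(i):
    return i * i

def f(n):
    # spawn n child tasks from inside a task, like g.remote(i) in the slide
    futures = [pool.submit(g, i) for i in range(n)]
    done, _ = wait(futures)                       # like ray.wait(futures)
    return sum(fut.result() for fut in done)      # like ray.get(...)

total = pool.submit(f, 10).result()
print(total)  # 285 = 0^2 + 1^2 + ... + 9^2
```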
COMPUTATION MODEL
Lineage graph (e.g., for create_policy.remote())
- Dotted lines → control edges: spawning a task, get()
- Solid lines → data edges
- Stateful edges: actions on an actor happen sequentially
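A minimal sketch of why the lineage graph matters: if each produced object records the task and inputs that created it, a lost object can be recomputed by replaying that chain. This is my illustration of the idea, not Ray's implementation.

```python
# Minimal lineage sketch: record, for each produced object, the function and
# input object ids that created it, then replay the chain to recover a loss.

lineage = {}   # object id -> (function, input object ids)  [data edges]
store = {}     # object id -> value (stands in for the object store)

def run_task(out_id, fn, *in_ids):
    lineage[out_id] = (fn, in_ids)
    store[out_id] = fn(*(store[i] for i in in_ids))

def recover(obj_id):
    """Recompute a lost object by replaying its lineage."""
    fn, in_ids = lineage[obj_id]
    for i in in_ids:                 # recursively restore missing inputs
        if i not in store:
            recover(i)
    store[obj_id] = fn(*(store[i] for i in in_ids))

store["x"] = 3
run_task("y", lambda v: v + 1, "x")   # y = 4
run_task("z", lambda v: v * 2, "y")   # z = 8
del store["y"], store["z"]            # simulate losing a node's objects
recover("z")
print(store["z"])  # 8
```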
ARCHITECTURE
Global control store: Object table, Task table, Function table
- Keys are generated with a deterministic hash
- Sort of a database: externalizes all state (metadata and locations, lineage of tasks)
- Sharded → scale easily; replicated → fault tolerance
- Simplifies the design of the other components
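As a sketch of the sharding idea (an assumption-laden illustration, not GCS code): hash each key deterministically to pick its shard, so every node agrees where a table entry lives without coordination.

```python
# Sketch of a sharded key-value control store. The deterministic hash means
# any node can compute which shard holds a given object/task entry.
import hashlib

NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]  # e.g. object/task/function tables

def shard_for(key: str) -> int:
    # sha256 rather than hash() so the mapping is stable across processes
    return hashlib.sha256(key.encode()).digest()[0] % NUM_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)][key]

put("object:42", {"location": "node-3"})
print(get("object:42"))  # {'location': 'node-3'}
```

Replicating each shard (not shown) then gives fault tolerance on top of the same scheme.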
RAY SCHEDULER
Global Scheduler + Global Control Store
- f.remote(args) goes to the local scheduler first; if the node is busy, wait for a timeout, then forward to the global scheduler
- The global scheduler uses queue length and data locality to determine if a node is busy
- Question: can the local scheduler take locality into account?
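The bottom-up flow can be sketched as follows (my simplification: queue length alone stands in for "busy", and the locality signal is omitted):

```python
# Sketch of bottom-up scheduling: try the local scheduler first; if its queue
# is too long, escalate to a global scheduler that picks the shortest queue.

QUEUE_LIMIT = 2
queues = {"node-A": [], "node-B": [], "node-C": []}

def submit(task, local_node):
    q = queues[local_node]
    if len(q) < QUEUE_LIMIT:      # local scheduler: node not busy, run here
        q.append(task)
        return local_node
    # global scheduler: queue length as the "is this node busy" signal
    target = min(queues, key=lambda n: len(queues[n]))
    queues[target].append(task)
    return target

placements = [submit(f"t{i}", "node-A") for i in range(5)]
print(placements)  # ['node-A', 'node-A', 'node-B', 'node-C', 'node-B']
```

The first two tasks stay local; once node-A's queue hits the limit, later tasks spill to the least-loaded nodes.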
FAULT TOLERANCE
- Tasks → lineage: replay tasks
- Actors → periodically checkpoint; on failure, restore the checkpoint and replay messages
- GCS → sharded, with replication
- Scheduler → stateless! Nothing to recover; re-spawn a new scheduler
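The checkpoint-plus-replay scheme for actors can be sketched like this (my illustration of the idea; the `Counter` actor and message log are assumptions, not Ray's code):

```python
# Sketch of actor recovery: periodically checkpoint actor state; on failure,
# restore the last checkpoint and replay only the messages logged since then.

class Counter:
    def __init__(self):
        self.n = 0
    def incr(self, by):
        self.n += by
        return self.n

checkpoint = None
message_log = []            # messages received since the last checkpoint

def send(actor, by):
    message_log.append(by)  # log before applying, so replay can reproduce it
    return actor.incr(by)

def take_checkpoint(actor):
    global checkpoint
    checkpoint = actor.n
    message_log.clear()     # older messages are folded into the checkpoint

def recover():
    actor = Counter()
    actor.n = checkpoint    # restore checkpointed state
    for by in message_log:  # replay messages since the checkpoint
        actor.incr(by)
    return actor

a = Counter()
send(a, 1); send(a, 2)
take_checkpoint(a)          # checkpoint holds n == 3
send(a, 4)
restored = recover()        # simulate a crash after the checkpoint
print(restored.n)  # 7
```

Checkpointing bounds how many messages must be replayed, trading checkpoint cost against recovery time.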
SUMMARY
Ray: unified system for ML training, serving, simulation
- Flexible API with support for stateless Tasks and stateful Actors
- Distributed scheduling, global control store
DISCUSSION
https://forms.gle/PN5FSJB6vVkDjoih8
Consider you are implementing two apps: deep learning model training and a sorting application. When will you use tasks vs actors, and why?
- Sorting: stateless → tasks; note it still has data dependencies when split into smaller parts
- Training: model weights are state → actors; dependencies between iterations; multiple actors for data-parallel training
- Recovery: with fine-grained tasks/actors, a failed replica can be restored on a new node
NEXT STEPS
Next class: Clipper. Last lecture on ML!
- Note: linear scalability vs. super-linear scalability (hardware effects)