CS 744: RAY (Shivaram Venkataraman, Fall 2020)



SLIDE 1

CS 744: RAY

Shivaram Venkataraman Fall 2020


SLIDE 2

ADMINISTRIVIA

  • Assignment grades? by late next week → early next week
  • Project proposal aka Introduction (10/16): Introduction, Related Work, Timeline (with eval plan)
  • Midterm: Oct 22
SLIDE 3

MACHINE LEARNING: STACK

  • PyTorch: data parallelism (training)
  • TVM: performance portability
  • PipeDream: pipeline parallelism
SLIDE 4

REINFORCEMENT LEARNING

SLIDE 5

(Handwritten diagram residue: a reward signal feeding the RL algorithm.)
SLIDE 6

SLIDE 7

RL SETUP

(Handwritten diagram residue; only a note on imitation learning is recoverable.)
SLIDE 8

RL REQUIREMENTS

Simulation, Training, Serving

  • Fine-grained computation → need flexibility, not a static execution plan; each simulation could take from ~ms to hours
  • Both stateless processing (e.g., data pre-processing) and stateful computation (simulator state)
  • Serving → very low latency
  • Dynamic execution: future computation depends on the output of past tasks
  • Training → very high throughput
SLIDE 9

RAY API

Tasks:
  futures = f.remote(args)
  objects = ray.get(futures)
  ready = ray.wait(futures, k, timeout)

Actors:
  actor = Class.remote(args)
  futures = actor.method.remote(args)

  • handle = actor.method.remote(args)
  • Future arguments are resolved before a task runs
  • Futures can be arguments to tasks
  • You can spawn (or wait for) tasks within a task
SLIDE 10

RAY API

Tasks:
  futures = f.remote(args)
  objects = ray.get(futures)
  ready = ray.wait(futures, k, timeout)

Actors:
  actor = Class.remote(args)
  futures = actor.method.remote(args)

  • Nested tasks: a task can spawn further tasks and wait for them

    def f(args):
        futures = []
        for i in range(10):
            futures.append(g.remote(i))
        ray.wait(futures)
SLIDE 11

COMPUTATION MODEL

  • Lineage!
  • Dotted lines: control edges (spawning a task or an actor)
  • Solid lines: data edges (e.g., get())
  • Stateful edges: method calls on an actor happen sequentially
SLIDE 12

ARCHITECTURE

(Handwritten notes: keys for tasks and state are generated with a deterministic hash; the rest is unrecoverable.)
SLIDE 13

Global control store

Object table Task table Function table

  • Sort of a database: externalizes all system state
  • Object table: list of all objects, their metadata, and locations
  • Task table: lineage of tasks
  • Function table: code blocks corresponding to tasks
  • Sharded to scale more easily; replicated for fault tolerance → simplifies scheduler design
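The sharding idea in the notes above can be sketched in a few lines; a toy key-value control store, not Ray's actual GCS (table names and key formats here are illustrative):

```python
import hashlib

class ControlStore:
    """Toy sketch of a sharded key-value control store."""
    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, key):
        # A deterministic hash picks the shard, so any node can
        # locate a key without coordination.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.shards[h % len(self.shards)]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key):
        return self._shard(key).get(key)

gcs = ControlStore()
gcs.put("object:obj1", {"location": "node2"})   # object-table-style entry
gcs.put("task:task7", {"parent": "task3"})      # task-table-style lineage entry
print(gcs.get("object:obj1")["location"])       # prints node2
```

Because state lives here rather than in the scheduler, any scheduler replica can be restarted and simply read the store back, which is the "simplify scheduler design" point.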
SLIDE 14

RAY SCHEDULER

Global Scheduler Global Control Store

  • Question: can the local scheduler take locality into account?
  • f.remote(args) goes to the local scheduler first; if the node is busy, wait for a timeout, then forward to the global scheduler
  • The global scheduler uses queue length (and locality) to determine whether a node is busy and where to place the task
SLIDE 15

FAULT TOLERANCE

Tasks Actors GCS Scheduler

  • Tasks: lineage → replay / re-execution of tasks
  • Actors: periodically checkpoint; restore from a checkpoint and replay messages
  • GCS: sharded, with replication
  • Scheduler: stateless, nothing to recover → re-spawn / launch a new scheduler
SLIDE 16

SUMMARY

Ray: unified system for ML training, serving, and simulation

  • Flexible API with support for stateless Tasks and stateful Actors
  • Distributed scheduling; Global control store

SLIDE 17

DISCUSSION

https://forms.gle/PN5FSJB6vVkDjoih8

SLIDE 18

Consider you are implementing two apps: deep learning model training and a sorting application. When would you use tasks vs. actors, and why?

  • Tasks: stateless → good load balancing. Sorting: deterministic operations that can be divided into smaller parts, though it still has dependencies
  • Actors: state and location matter. Training: model weights are state, with dependencies between iterations; multiple actors for data-parallel training; fine-grained recovery
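The contrast in the discussion notes can be made concrete with plain Python (actor behavior emulated with an ordinary class; function and class names are illustrative, not from the lecture):

```python
# Task-style: output depends only on inputs, so any worker could run
# these, and a failed run can simply be re-executed.
def sort_chunk(chunk):
    return sorted(chunk)

def merge(a, b):
    out = []
    while a and b:
        out.append(a.pop(0) if a[0] <= b[0] else b.pop(0))
    return out + a + b

# Actor-style: the model weights live inside the object, and method
# calls must be applied in order because each depends on prior state.
class Trainer:
    def __init__(self):
        self.weight = 0.0

    def step(self, grad, lr=0.1):
        self.weight -= lr * grad
        return self.weight

chunks = [[3, 1], [4, 2]]
print(merge(*[sort_chunk(c) for c in chunks]))   # prints [1, 2, 3, 4]

t = Trainer()
t.step(1.0)
print(t.step(1.0))   # state carries over between calls: prints -0.2
```

In Ray terms, `sort_chunk`/`merge` would be `@ray.remote` functions and `Trainer` an `@ray.remote` class; the sorting side tolerates re-execution anywhere, while the trainer's sequence of updates is what makes it an actor.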
SLIDE 19

(Handwritten notes, mostly unrecoverable: questions about what happens when a node holding a replica goes down, and about recovery afterwards.)

SLIDE 20

NEXT STEPS

Next class: Clipper Last lecture on ML!

  • Note: linear scalability; super-linear scaling is possible due to hardware effects (e.g., caches)