PLACETO: LEARNING GENERALIZABLE DEVICE PLACEMENT ALGORITHMS FOR - - PowerPoint PPT Presentation

SLIDE 1

PLACETO: LEARNING GENERALIZABLE DEVICE PLACEMENT ALGORITHMS FOR DISTRIBUTED MACHINE LEARNING

Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh

Presented by: Obodoekwe Nnaemeka

SLIDE 2

Problem

  • Distributed training across GPUs and CPUs: which device should run each operation?
  • Human experts?
  • Reinforcement learning?

SLIDE 3

Problem

  • Sometimes tolerable.
  • Solutions do not generalize: the optimization is done for a single graph.
  • Single computational graph vs. a class of computational graphs

SLIDE 4

Placeto

Efficiency

A sequence of iterative placement improvements

Generalizability

A neural network architecture that uses graph embeddings to encode the structure of the computation graph in the placement policy.
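The idea of a sequence of iterative placement improvements can be illustrated with a toy episode: start from a complete placement, visit every node once, and possibly move each node to another device. This is a minimal sketch under stated assumptions: the graph, the `runtime` cost model (busiest device's compute load plus a fixed cost per cross-device edge) and the greedy device choice are all hypothetical. In Placeto the choice at each step comes from a learned policy, not from trying every device.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: op name -> compute cost, plus directed data edges.
COSTS: Dict[str, float] = {"a": 3.0, "b": 2.0, "c": 2.0, "d": 1.0}
EDGES: List[Tuple[str, str]] = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
DEVICES = [0, 1]
COMM_COST = 0.5  # assumed cost per edge whose endpoints sit on different devices


def runtime(placement: Dict[str, int]) -> float:
    """Toy cost model (an assumption): busiest device's load plus communication."""
    load = {dev: 0.0 for dev in DEVICES}
    for op, dev in placement.items():
        load[dev] += COSTS[op]
    comm = sum(COMM_COST for u, v in EDGES if placement[u] != placement[v])
    return max(load.values()) + comm


def improve_episode(placement: Dict[str, int], order: List[str]) -> Dict[str, int]:
    """Visit each node once and keep the device with the lowest toy runtime.

    The greedy choice here is a hypothetical stand-in for a learned policy.
    """
    placement = dict(placement)
    for op in order:
        best_dev, best_time = placement[op], runtime(placement)
        for dev in DEVICES:
            placement[op] = dev
            t = runtime(placement)
            if t < best_time:
                best_dev, best_time = dev, t
        placement[op] = best_dev  # commit the best device found for this node
    return placement


start = {op: 0 for op in COSTS}  # everything on device 0
final = improve_episode(start, ["a", "b", "c", "d"])
print(runtime(start), "->", runtime(final))
```

Starting with every op on device 0, the episode moves `a` to device 1 and lowers the toy runtime from 8.0 to 6.0; each later visit finds no strict improvement and leaves its node in place.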

SLIDE 5

Learning method

■ Markov Decision Process
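One way to make the MDP concrete, sketched here as an assumption rather than read off the slide: the state is the computation graph together with its current placement $p_t$, each action assigns a device to the node visited at step $t$, and the reward is the runtime decrease that assignment causes. Writing $\rho(p)$ for the (measured or simulated) runtime of placement $p$, the per-step rewards of an $N$-step episode telescope, so maximizing the undiscounted return is the same as maximizing the total runtime improvement over the episode:

```latex
\begin{aligned}
r_t &= \rho(p_t) - \rho(p_{t+1}), \qquad t = 0, \dots, N-1,\\
\sum_{t=0}^{N-1} r_t &= \rho(p_0) - \rho(p_N).
\end{aligned}
```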

SLIDE 6

POLICY NETWORK ARCHITECTURE
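A minimal sketch of the last stage of such a policy, assuming the graph-embedding step has already produced a fixed-size vector for the node being placed: a hypothetical single linear layer maps that vector to one logit per device, a softmax turns the logits into probabilities, and a device is sampled. The embedding, weights and bias below are illustrative values, not learned parameters, and the layer shape is an assumption.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over device logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_head(embedding: List[float], weights: List[List[float]],
                bias: List[float]) -> List[float]:
    """Hypothetical linear layer: one row of weights (one logit) per device."""
    logits = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def sample_device(probs: List[float], rng: random.Random) -> int:
    """Sample a device index from the policy's distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Illustrative values only: a 3-dimensional embedding and 2 devices.
embedding = [0.2, -1.0, 0.5]
weights = [[1.0, 0.0, 0.0], [0.0, -1.0, 1.0]]
bias = [0.0, 0.0]
probs = policy_head(embedding, weights, bias)
device = sample_device(probs, random.Random(0))
print(probs, device)
```

Sampling (rather than always taking the most likely device) is what lets a policy-gradient learner explore alternative placements during training.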

SLIDE 7

GRAPH EMBEDDING
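A graph embedding summarizes each node's neighborhood so the policy sees graph structure, not just a single op. The sketch below shows message passing in both directions over a DAG: one summary aggregated from a node's descendants (via its children) and one from its ancestors (via its parents). It is a toy under stated assumptions: the graph, the scalar feature per node and the fixed `DECAY` factor are hypothetical stand-ins for the learned message and aggregation functions a real embedding network would use.

```python
from typing import Dict, List, Tuple

# Hypothetical DAG (node -> children) and one scalar feature per node.
CHILDREN: Dict[str, List[str]] = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
FEATURE: Dict[str, float] = {"a": 3.0, "b": 2.0, "c": 2.0, "d": 1.0}
DECAY = 0.5  # hypothetical stand-in for a learned message function


def invert(children: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn child lists into parent lists."""
    parents: Dict[str, List[str]] = {v: [] for v in children}
    for u, kids in children.items():
        for k in kids:
            parents[k].append(u)
    return parents


PARENTS = invert(CHILDREN)


def summarize(node: str, neighbors: Dict[str, List[str]],
              memo: Dict[str, float]) -> float:
    """Aggregate the node's feature with decayed summaries of its neighbors."""
    if node not in memo:
        memo[node] = FEATURE[node] + DECAY * sum(
            summarize(n, neighbors, memo) for n in neighbors[node]
        )
    return memo[node]


def embed(nodes: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Per-node vector: (own feature, summary from below, summary from above)."""
    below: Dict[str, float] = {}
    above: Dict[str, float] = {}
    return {
        v: (FEATURE[v], summarize(v, CHILDREN, below), summarize(v, PARENTS, above))
        for v in nodes
    }


embeddings = embed(list(CHILDREN))
for node, vec in embeddings.items():
    print(node, vec)
```

The memo dictionaries make each direction a single pass over the DAG; in the toy output, the source `a` gets the largest "from below" summary and the sink `d` the largest "from above" summary.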

SLIDE 8

Training Details

  • Colocation
  • Simulator
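A simulator lets training score many candidate placements without running each one on real hardware. The slide does not describe Placeto's simulator, so this is only a minimal sketch of a runtime estimator under simple assumptions: each device runs one op at a time in a fixed topological order, and an edge between ops on different devices delays the consumer by a constant transfer time `comm`. The function name `simulate` and the toy graph are hypothetical.

```python
from typing import Dict, List, Tuple


def simulate(order: List[str], cost: Dict[str, float],
             edges: List[Tuple[str, str]], placement: Dict[str, int],
             comm: float) -> float:
    """Estimate one step's runtime (makespan) under a toy list-scheduling model."""
    preds: Dict[str, List[str]] = {op: [] for op in order}
    for u, v in edges:
        preds[v].append(u)
    device_free: Dict[int, float] = {}  # time each device becomes idle
    finish: Dict[str, float] = {}       # finish time of each op
    for op in order:  # `order` must be topological
        dev = placement[op]
        ready = 0.0
        for p in preds[op]:
            # Inputs from another device arrive `comm` time units later.
            arrival = finish[p] + (comm if placement[p] != dev else 0.0)
            ready = max(ready, arrival)
        start = max(ready, device_free.get(dev, 0.0))
        finish[op] = start + cost[op]
        device_free[dev] = finish[op]
    return max(finish.values())


COST = {"a": 3.0, "b": 2.0, "c": 2.0, "d": 1.0}
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
ORDER = ["a", "b", "c", "d"]
all_on_one = {"a": 0, "b": 0, "c": 0, "d": 0}
split = {"a": 0, "b": 1, "c": 0, "d": 0}
print(simulate(ORDER, COST, EDGES, all_on_one, 0.5))
print(simulate(ORDER, COST, EDGES, split, 0.5))
```

Moving `b` to a second device lets it run in parallel with `c`, so the estimate drops from 8.0 to 7.0 even after paying two transfer delays.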

SLIDE 9

Experimentation

  • Deep learning models (Inception-V3, NASNet, NMT)
  • Synthetic data (cifar10, ptb, nmt)
  • Baselines: Single GPU, Scotch, Human Expert, RNN-based approach

SLIDE 10

Result

  • Performance
  • Generalizability

SLIDE 11

PLACETO VS RNN

SLIDE 12

GENERALIZABILITY

SLIDE 13

GENERALIZABILITY

SLIDE 14

Placeto deep dive

  • Node traversal order
  • Alternative architectures: simple aggregator, simple partitioner
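Because an episode visits nodes one at a time, the visiting order is a design choice. The slide does not list which orders were compared, so the three below (topological, reverse topological, random) are illustrative assumptions. The topological order is computed with Kahn's algorithm; the helper names are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List


def topological_order(children: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: emit a node only after all of its parents."""
    indegree = {v: 0 for v in children}
    for kids in children.values():
        for k in kids:
            indegree[k] += 1
    queue = deque(v for v in children if indegree[v] == 0)
    order: List[str] = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for k in children[v]:
            indegree[k] -= 1
            if indegree[k] == 0:
                queue.append(k)
    if len(order) != len(children):
        raise ValueError("graph has a cycle")
    return order


def traversal_order(children: Dict[str, List[str]], kind: str,
                    seed: int = 0) -> List[str]:
    """Return the order in which an episode would visit the nodes."""
    topo = topological_order(children)
    if kind == "topological":
        return topo
    if kind == "reverse":
        return topo[::-1]
    if kind == "random":
        shuffled = topo[:]
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown traversal order: {kind}")


CHILDREN = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
for kind in ("topological", "reverse", "random"):
    print(kind, traversal_order(CHILDREN, kind))
```

A topological order guarantees that when a node is visited, all of its producers have already been revisited in the same episode; a random order removes that structure, which is what makes order a meaningful ablation.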

SLIDE 15

Critique

+ First attempt to generalize device placement using a graph embedding network
+ Really impressive performance

  • Only optimizes placement decisions
  • It shows generalization to unseen graphs, but those graphs are generated artificially by architecture search for a single learning task and dataset.
  • How does the framework handle failure?
  • The evaluation protocol needs to be more explicit.