

SLIDE 1

Device Placement Optimization with Reinforcement Learning

Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

SLIDE 2

What is device placement

  • Consider a TensorFlow computational graph G, which consists of M operations {o1, o2, …, oM}, and a list of D available devices.
  • A placement P = {p1, p2, …, pM} is an assignment of an operation oi to a device pi (see the sketch below).
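
To make the notation concrete, here is a minimal Python sketch; the op names and device strings are illustrative placeholders, not taken from the paper:

```python
# A placement is a mapping from each operation in the graph to one
# of the available devices (all names here are illustrative).
ops = ["embed", "lstm", "attention", "softmax"]   # M operations {o1, ..., oM}
devices = ["/cpu:0", "/gpu:0", "/gpu:1"]          # D available devices

# One possible placement P = {p1, ..., pM}: operation oi -> device pi
placement = {
    "embed": "/gpu:0",
    "lstm": "/gpu:0",
    "attention": "/gpu:1",
    "softmax": "/cpu:0",
}
```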

SLIDE 3

Why device placement

  • Trend toward many-device training, bigger models, and larger batch sizes
  • Growth in the size and computational requirements of training and inference

SLIDE 4

Typical approaches

  • Use a heterogeneous distributed environment with a mixture of many CPUs and GPUs
  • Often based on greedy heuristics
  • Require deep understanding of devices: bandwidth and latency behavior
  • Are not flexible enough and do not generalize well

SLIDE 5

ML for device placement

  • ML is increasingly replacing rule-based heuristics
  • RL can be applied to device placement:

– Effective search across large state and action spaces to find optimal solutions

– Automatic learning from the underlying environment, based only on a reward function

SLIDE 6

RL-based device placement

[Figure: the RL model takes as input the neural model and the available devices (CPUs and GPUs); the policy outputs an assignment of the model's ops to devices, and the runtime of that placement is evaluated as the reward.]
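
The loop in the diagram can be sketched in a few lines of Python; the helpers below are stand-ins for the learned policy and for actually executing the placed graph, not the paper's code:

```python
import random

ops = ["embed", "lstm", "attention", "softmax"]  # ops of the neural model
devices = ["/cpu:0", "/gpu:0", "/gpu:1"]         # available devices

def sample_placement():
    # Stand-in for the learned policy: uniform random assignment.
    return {op: random.choice(devices) for op in ops}

def measure_runtime(placement):
    # Stand-in for executing the placed graph and timing it.
    return 1.0 + random.random()

best_placement, best_time = None, float("inf")
for step in range(100):
    p = sample_placement()
    t = measure_runtime(p)   # measured runtime is the reward signal
    if t < best_time:
        best_placement, best_time = p, t
```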

SLIDE 7

Problem formulation

The policy is trained to minimize the expected runtime of the placements it samples:

J(θ) = E_{P∼π(P|G;θ)}[r(P)]

where J(θ) is the expected runtime, θ are the trainable parameters of the policy, r(P) is the runtime of a placement, π is the policy, and P are the output placements.
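
A Monte Carlo estimate of this objective is just the average runtime over sampled placements; a minimal sketch, reusing the stand-in helpers from the previous slide:

```python
# Estimate J(theta) ~= (1/K) * sum_i r(P_i) with K sampled placements.
def estimate_objective(sample_placement, measure_runtime, k=10):
    runtimes = [measure_runtime(sample_placement()) for _ in range(k)]
    return sum(runtimes) / k
```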

SLIDE 8

Training with REINFORCE

  • Learn the network parameters using the Adam optimizer, based on policy gradients computed via the REINFORCE equation:

∇θ J(θ) = E_{P∼π(P|G;θ)}[r(P) · ∇θ log p(P|G;θ)]

  • Use K placement samples to estimate the policy gradient, and a baseline term B to reduce variance (see the sketch after this list):

∇θ J(θ) ≈ (1/K) Σ_{i=1..K} (r(Pi) − B) · ∇θ log p(Pi|G;θ)
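
A minimal NumPy sketch of this estimator, under simplifying assumptions: a factored softmax policy (one categorical distribution per op), a simulated runtime in place of real measurements, and a plain gradient step instead of Adam:

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 4, 3                  # M ops, D devices
theta = np.zeros((M, D))     # per-op device logits (policy parameters)
lr, K = 0.1, 8               # learning rate, samples per update
baseline = None              # moving-average baseline B

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def fake_runtime(placement):
    # Simulated reward: pretend device 1 is fastest for every op.
    return 1.0 + 0.2 * np.sum(placement != 1)

for step in range(200):
    probs = softmax(theta)   # pi(pi | oi; theta) for each op
    grads, runtimes = [], []
    for _ in range(K):
        placement = np.array([rng.choice(D, p=probs[i]) for i in range(M)])
        runtimes.append(fake_runtime(placement))
        # grad_theta log p(P | theta) = one-hot(chosen device) - probs
        g = -probs.copy()
        g[np.arange(M), placement] += 1.0
        grads.append(g)
    r = np.array(runtimes)
    baseline = r.mean() if baseline is None else 0.9 * baseline + 0.1 * r.mean()
    # Descend on (1/K) * sum_i (r_i - B) * grad log p(P_i) to reduce runtime.
    update = sum((ri - baseline) * gi for ri, gi in zip(r, grads)) / K
    theta -= lr * update
```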

SLIDE 9

Model architecture

SLIDE 10

Challenges

  • Vanishing and exploding gradient issues
  • Large memory footprints

SLIDE 11

Distributed training

SLIDE 12

Experiments

  • Recurrent Neural Language Model (RNNLM)
  • Neural Machine Translation with attention mechanism (NMT)
  • Inception-V3

SLIDE 13

Learned placement on NMT

SLIDE 14

NMT end-to-end runtime

SLIDE 15

Learned placement on Inception-V3

SLIDE 16

Inception-V3 end-to-end runtime

SLIDE 17

Profiling on NMT

SLIDE 18

Profiling on Inception-V3

SLIDE 19

Profiling on Inception-V3

SLIDE 20

Running times (in seconds)

SLIDE 21

Summary

  • Propose an RL model to optimize device placement for neural networks
  • Use policy gradients to learn the parameters
  • The policy finds non-trivial assignments of operations to devices that outperform heuristic approaches
  • Profiling of the results shows that the policy learns implicit trade-offs between computation and communication in hardware