Modern Systems for Neural Networks
Valentin Dalibard


SLIDE 1

Modern Systems for Neural Networks

Valentin Dalibard

SLIDE 2

This talk

1. Practicalities of training neural networks
2. Leveraging heterogeneous hardware

Source: wikipedia

SLIDE 3

Modern Neural Network Applications: Image Classification

SLIDE 4

Modern Neural Network Applications: NLP

Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks

Paul Graham generator: “The surprised in investors weren’t going to raise money. I’m not the company with the time there are all interesting quickly, don’t have to get off the same programmers. There’s a super-angel round fundraising, why do you can do. If you have a different physical investment are become in people who reduced in a startup with the way to argument the acquirer could see them just that you’re also the founders will part of users’ affords that and an alternation to the idea. [2] Don’t work at first member to see the way kids will seem in advance of a bad successful startup. And if you have to act the big company too.”

SLIDE 5

Modern Neural Network Applications: Reinforcement Learning

SLIDE 6

Training Procedure: Stochastic Gradient Descent

Optimize the weights of the neurons to yield good predictions.
Use “minibatches” of inputs to estimate the gradient.

Source: wikipedia
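The update above can be sketched as minibatch SGD on a toy linear model (the data, learning rate, and batch size below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))          # toy inputs
true_w = np.arange(1.0, 6.0)            # weights we hope to recover
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.1, 32
for step in range(500):
    idx = rng.integers(0, len(X), size=batch_size)  # sample a minibatch
    xb, yb = X[idx], y[idx]
    # Gradient of mean squared error, estimated from the minibatch only
    grad = 2 * xb.T @ (xb @ w - yb) / batch_size
    w -= lr * grad                                   # SGD update
```

Each step touches only 32 of the 1000 examples, yet the noisy gradient estimates still drive `w` toward `true_w`.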

SLIDE 7

Software platforms

Torch (Lua)
Theano (Python)
TensorFlow (Python/C++)
Caffe (C++)
Keras
Lasagne

SLIDE 8

Single Machine Setup:

One or a couple of beefy GPUs

SLIDE 9

Distribution: Parameter Server Architecture

Source: Dean et al.: Large Scale Distributed Deep Networks
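A toy, single-process sketch of one synchronous parameter-server step (the function names and the simple averaging rule are illustrative, not the Dean et al. API): each worker computes a gradient on its own data shard, and the parameter server aggregates the gradients and applies the update.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 4))
y = X @ np.ones(4)                       # toy noiseless targets

def worker_gradient(params, shard_x, shard_y):
    """Runs on a worker: gradient of squared error on the local shard."""
    return 2 * shard_x.T @ (shard_x @ params - shard_y) / len(shard_x)

params = np.zeros(4)                     # state held by the parameter server
shards = np.array_split(np.arange(256), 4)   # the data of 4 workers
for step in range(200):
    grads = [worker_gradient(params, X[s], y[s]) for s in shards]
    params -= 0.1 * np.mean(grads, axis=0)   # server aggregates and updates
```

In the real architecture the workers and servers are separate machines and the gradients travel over the network; the dataflow is the same.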

SLIDE 10

Trends in software architecture

Fewer bits per floating-point number
Integers rather than floating-point numbers
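The “fewer bits” trend can be illustrated with simple linear quantization of a weight vector to int8 (a generic sketch, not the scheme of any particular framework):

```python
import numpy as np

w = np.float32(np.linspace(-1.0, 1.0, 11))   # toy float32 weights
scale = np.abs(w).max() / 127.0              # map the largest weight to the int8 range
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

w_hat = q.astype(np.float32) * scale         # dequantize to compare
max_err = np.abs(w - w_hat).max()            # rounding error is bounded by scale / 2
```

Storing `q` instead of `w` cuts memory and bandwidth by 4x, at the cost of a bounded rounding error per weight.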

SLIDE 11

Optimizing the scheduling on a heterogeneous cluster

Which machines to use as workers? As parameter servers?

↗workers => ↗computational power & ↗communication

How much work to schedule on each worker?

Must load balance

SLIDE 12

Ways to do an Optimization

Method                                    Overhead   #Evaluations needed
Random search                             None       High
Genetic algorithm / simulated annealing   Slight     Medium-high
Bayesian optimization                     High       Low
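The zero-overhead end of this spectrum can be sketched as plain random search over a toy objective (the objective and bounds are made up for illustration):

```python
import random

random.seed(0)

def objective(x):
    # Unknown to the optimizer; its maximum is at x = 3
    return -(x - 3.0) ** 2

best_x, best_val = None, float("-inf")
for _ in range(1000):            # many cheap evaluations, no modelling overhead
    x = random.uniform(-10.0, 10.0)
    val = objective(x)
    if val > best_val:
        best_x, best_val = x, val
```

Random search spends nothing on choosing where to look, so it needs many evaluations; Bayesian optimization inverts that trade-off.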

SLIDE 13

Bayesian Optimization

1. Find parameter values with high predicted performance in the model
2. Evaluate the objective function at that point
3. Update the model with this measurement
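The three steps above can be sketched with a small Gaussian-process surrogate and an upper-confidence-bound utility (a generic illustration: the kernel, utility, and toy objective are all assumptions, not the specific system in this talk):

```python
import numpy as np

def objective(x):
    # Expensive black box; its maximum (x = 2) is unknown to the optimizer
    return np.exp(-(x - 2.0) ** 2)

def rbf(a, b, ls=1.0):
    # Squared-exponential kernel between two 1-D point sets
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(xs, ys, grid, noise=1e-6):
    # Gaussian-process posterior mean and std dev on a grid of candidates
    K = rbf(xs, xs) + noise * np.eye(len(xs))
    Ks = rbf(grid, xs)
    mean = Ks @ np.linalg.solve(K, ys)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.sqrt(np.maximum(var, 1e-12))

grid = np.linspace(-5.0, 5.0, 201)
xs = np.array([-4.0, 0.0, 4.0])           # a few initial measurements
ys = objective(xs)
for _ in range(15):
    mean, std = gp_posterior(xs, ys, grid)
    utility = mean + 2.0 * std            # step 1: pick a promising point
    x_next = grid[np.argmax(utility)]
    y_next = objective(x_next)            # step 2: evaluate the objective
    xs = np.append(xs, x_next)            # step 3: update the model
    ys = np.append(ys, y_next)
best = xs[np.argmax(ys)]
```

The utility trades off exploitation (high posterior mean) against exploration (high posterior uncertainty), which is why so few evaluations of the expensive objective are needed.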

SLIDE 14

Bayesian Optimization

[Diagram: a probabilistic model over the parameter space yields predicted performance; a utility function selects the next point to try; the measured performance is fed back into the model.]

SLIDE 15

Structured Bayesian Optimization

[Diagram: as before, but the model is a probabilistic program with probabilistic parameters, and it is updated with both the measured performance and runtime properties.]

SLIDE 16

Optimizing the scheduling of Neural Networks

Two separate models:

Individual machine model: how fast can a machine process k inputs?
Network model: how long does it take to transfer the parameters from the parameter servers to the workers?

Iteratively learn the behavior
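A sketch of how two such models could combine into a predicted iteration time (all functional forms and constants below are invented for illustration):

```python
def machine_time(k, startup=0.05, per_input=0.002):
    """Time (s) for one machine to process k inputs: a simple affine model."""
    return startup + per_input * k

def network_time(param_bytes, bandwidth=1e9):
    """Time (s) to ship the parameters from the parameter servers to a worker."""
    return param_bytes / bandwidth

def iteration_time(assignment, param_bytes):
    """A synchronous step waits for the slowest worker, plus the transfer."""
    return network_time(param_bytes) + max(machine_time(k) for k in assignment)

# Two candidate schedules for a 256-input minibatch over two identical workers
balanced = iteration_time([128, 128], param_bytes=4e8)
skewed = iteration_time([240, 16], param_bytes=4e8)
```

Because a synchronous iteration is gated by the slowest worker, the balanced assignment is predicted to be faster; the optimizer iteratively refits the two models from measurements and searches over such assignments.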

SLIDE 17

Optimizing the scheduling of Neural Networks

SLIDE 18

More CPU cores aren’t always better

SLIDE 19

Exposing Tradeoff

SLIDE 20

Conclusion

Growing demand for neural network platforms.
They can leverage heterogeneous hardware, but this requires tuning.
Bayesian optimization can find a good scheduling in a relatively short time.