

SLIDE 1

Parallelized Training of Deep NN

Comparison of Current Concepts and Frameworks

Sebastian Jäger, Hans-Peter Zorn, Stefan Igel, Christian Zirpins
Rennes, Dec 10, 2018

SLIDE 2

Motivation

› Need to scale the training of neural networks horizontally
› Kubernetes-based technology stack
› Scalability of concepts and frameworks

SLIDE 3


Distributed Training Methods

Data Parallelism
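
The slide illustrates this with a diagram; conceptually, in data parallelism every worker holds a full replica of the model, computes gradients on its own shard of the mini-batch, and the averaged gradient is applied to all replicas. A minimal NumPy sketch of one synchronous step (the grad_fn callback and all other names are illustrative, not from the deck):

```python
import numpy as np

def data_parallel_step(params, batch, grad_fn, num_workers, lr=0.01):
    """One synchronous data-parallel SGD step (conceptual sketch)."""
    # Each "worker" receives an equal shard of the mini-batch.
    shards = np.array_split(batch, num_workers)
    # Every worker computes a gradient on its own shard ...
    grads = [grad_fn(params, shard) for shard in shards]
    # ... and the gradients are averaged before a single, identical
    # update is applied to the shared parameters.
    avg_grad = sum(grads) / num_workers
    return params - lr * avg_grad
```

The centralized and decentralized parameter-server designs on the following slides differ mainly in where this aggregation step physically happens.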

SLIDE 4


Data Parallelism

Centralized Parameter Server

TensorFlow: https://www.tensorflow.org
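
The slide itself is an architecture diagram; as an illustration, the following is a minimal sketch of how a centralized parameter server could be wired up with the TensorFlow 1.x API matching the 1.8.0 setup from slide 6. The host names, model, and hyperparameters are placeholder assumptions, not taken from the deck:

```python
import tensorflow as tf  # TensorFlow 1.x API

# Hypothetical cluster layout: one parameter server, two workers.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# replica_device_setter places variables on the "ps" job and ops on this
# worker, so all workers read and update one central parameter copy.
with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:0", cluster=cluster)):
    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.int64, [None])
    logits = tf.layers.dense(x, 10)  # weights live on the parameter server
    loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.train.MonitoredTrainingSession(master=server.target) as sess:
    pass  # each worker runs its training loop here; a "ps" process
          # would call server.join() instead
```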

SLIDE 5


Data Parallelism

Decentralized Parameter Server

Apache MXNet: http://mxnet.apache.org
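
Again the slide is a diagram; in MXNet, the parameter store is a distributed key-value store whose keys (parameter blocks) are sharded across several server processes instead of being held on one central node. A minimal Gluon sketch under that setup, with a placeholder model and hyperparameters (the processes would typically be started in scheduler/server/worker roles, e.g. via MXNet's tools/launch.py):

```python
import mxnet as mx
from mxnet import gluon

# "dist_sync" creates a distributed, synchronous key-value store:
# parameter blocks are sharded across the server processes.
kv = mx.kv.create('dist_sync')

net = gluon.nn.Dense(10)  # placeholder model
net.initialize()

# The trainer pushes gradients to, and pulls updated weights from,
# the KVStore shards on every step.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.01}, kvstore=kv)
```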

SLIDE 6

Experimental Setup

Environment

› Google Kubernetes Engine
› CPU: 2.6 GHz
› Ubuntu 16.04
› TensorFlow 1.8.0
› MXNet 1.3.0

SLIDE 7


Experimental Setup

Networks

Convolutional NN
› LeNet-5
  › 5 layers
  › 10 classes
› Fashion MNIST
  › 28x28 gray-scale

Recurrent NN
› LSTM
  › 2 layers
  › 200 units
› Penn Treebank
  › 1,000,000 words
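
For concreteness, a LeNet-5-style network matching this description (five trainable layers, ten classes, 28x28 grayscale input) could be defined as follows in tf.keras; the filter counts and activations are assumptions based on the classic LeNet-5, not taken from the deck:

```python
import tensorflow as tf

# LeNet-5-style CNN: 2 convolutional + 3 dense = 5 trainable layers.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation='tanh', padding='same',
                           input_shape=(28, 28, 1)),
    tf.keras.layers.AveragePooling2D(),
    tf.keras.layers.Conv2D(16, 5, activation='tanh'),
    tf.keras.layers.AveragePooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='tanh'),
    tf.keras.layers.Dense(84, activation='tanh'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```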

SLIDE 8


Experimental Setup

Metrics

SLIDE 9


Results

Convolutional Neural Network

SLIDE 10


Results

Convolutional Neural Network

SLIDE 11


Results

Recurrent Neural Network

SLIDE 12

Summarizing the Experiments

Decentralized Parameter Server ...
› more robust regarding increasing communication effort
› scales better for small NNs

For bigger/more complex NNs ...
› no significant difference between the concepts

SLIDE 13

Conclusion

MXNet ...
› better scalability and throughput for small NNs
› higher throughput for bigger NNs
› less code, and less complicated code
› easier to scale up training

SLIDE 14

Thank you

Sebastian Jäger
@se_jaeger

inovex GmbH
Ludwig-Erhard-Allee 6
76131 Karlsruhe

sebastian.jaeger@inovex.de