DATA ANALYTICS USING DEEP LEARNING
GT 8803 // FALL 2018 // CHRISTINE HERLIHY
LECTURE #08: TENSORFLOW: A SYSTEM FOR LARGE-SCALE MACHINE LEARNING
TODAY'S PAPER: TensorFlow: A System for Large-Scale Machine Learning
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng
• TensorFlow: a system for expressing and executing machine-learning model training algorithms across a range of distributed computing environments
• High-level scripting front-end for model specification (e.g., Python)
• Open-source development
Sources: https://ai.google/research/pubs/pub40565
Sources: http://www.wolframalpha.com/input/?i=tensor; https://www.tensorflow.org/guide/tensors; https://www.slideshare.net/BertonEarnshaw/a-brief-survey-of-tensors
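A tensor in TensorFlow is a typed, n-dimensional array characterized by its rank, shape, and data type (see the tensorflow.org tensor guide above). A minimal sketch, assuming the TF 1.x API used in the paper; the constants are illustrative:

```python
import tensorflow as tf

# A tensor is a typed, n-dimensional array: rank 0 is a scalar,
# rank 1 a vector, rank 2 a matrix, and so on.
scalar = tf.constant(3.0)                       # rank 0, shape ()
vector = tf.constant([1.0, 2.0, 3.0])           # rank 1, shape (3,)
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank 2, shape (2, 2)

# Static attributes known at graph-construction time:
print(scalar.dtype)   # tf.float32
print(vector.shape)   # (3,)
print(matrix.shape)   # (2, 2)
```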
Source: https://www.safaribooksonline.com/library/view/learning-tensorflow/9781491978504/ch01.html
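The paper's core abstraction is a dataflow graph whose vertices are operations and whose edges carry tensors: the graph is built first, then executed by a session that maps it onto devices. A minimal sketch of this construct-then-execute pattern, assuming the TF 1.x graph/session API:

```python
import tensorflow as tf

# Construction phase: build the dataflow graph; nothing is computed yet.
a = tf.placeholder(tf.float32, shape=(), name="a")
b = tf.placeholder(tf.float32, shape=(), name="b")
c = tf.add(a, b, name="c")        # vertex: Add op; its output edge carries a tensor
d = tf.multiply(c, c, name="d")   # downstream op consumes c's output

# Execution phase: the session places ops on devices and runs the graph.
with tf.Session() as sess:
    print(sess.run(d, feed_dict={a: 2.0, b: 3.0}))  # (2 + 3)^2 = 25.0
```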
Source: http://www.pittnuts.com/2016/08/glossary-in-distributed-tensorflow/
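In distributed TensorFlow (per the glossary linked above), a cluster is a set of named jobs, each a list of tasks; variables typically live on "ps" tasks while computation runs on "worker" tasks. A minimal sketch, assuming the TF 1.x tf.train.ClusterSpec/Server API; the hostnames, ports, and task counts are illustrative placeholders:

```python
import tensorflow as tf

# Hypothetical 2-PS / 2-worker cluster; hosts and ports are placeholders.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222", "ps1.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# Each process starts one server for its (job_name, task_index) pair.
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Parameters are placed on PS tasks; compute ops on worker tasks.
with tf.device("/job:ps/task:0"):
    w = tf.Variable(tf.zeros([10]), name="w")
with tf.device("/job:worker/task:0"):
    update = w.assign_add(tf.ones([10]))
```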
Source: https://ai.google/research/pubs/pub40565
Source: https://ai.google/research/pubs/pub40565
• Exploits distribution and/or parallelization opportunities when available
Source: https://ai.google/research/pubs/pub40565
Source: https://ai.google/research/pubs/pub45381
• Sparse embedding layers: a batch of b indices gathers rows from an n×d embedding matrix to produce a dense b×d representation, where b << n
• The embedding matrix may be too large to replicate on every worker or to store in RAM on a single host
Source: https://ai.google/research/pubs/pub45381
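The read side of a sparse embedding layer is a gather: b indices select rows of the n×d matrix to form a dense b×d activation. A minimal single-process sketch using tf.nn.embedding_lookup (the paper additionally shards the matrix across PS tasks, which is omitted here); all sizes are illustrative:

```python
import tensorflow as tf

n, d, b = 1000000, 64, 32  # vocab size, embedding dim, batch size (illustrative)

# The n x d embedding matrix; in the paper it is sharded across PS tasks.
embeddings = tf.get_variable(
    "embeddings", shape=[n, d],
    initializer=tf.random_uniform_initializer(-1.0, 1.0))

# A batch of b sparse indices (e.g., word ids), with b << n.
ids = tf.placeholder(tf.int32, shape=[b])

# Gather b rows to produce the dense b x d input to the rest of the model.
dense = tf.nn.embedding_lookup(embeddings, ids)  # shape: [b, d]
```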
Source: https://ai.google/research/pubs/pub45381
Source: https://ai.google/research/pubs/pub45381
microarchitectures
management
Source: https://ai.google/research/pubs/pub45381
• Compared against Caffe, Neon, and Torch; self-referential benchmarks were also established
• Benchmarks: single-machine training, a synchronous replica microbenchmark, image classification, and language modeling
• Evaluation focuses on system performance; the authors could have evaluated on the basis of model learning objectives instead. Why choose system performance?
• Design decisions that allow TensorFlow to be highly scalable can impede performance for small-scale tasks that are essentially kernel-bound
• Neon's advantage is attributed to the performance gains associated with its convolutional kernels, which are implemented in assembly
• Each library was used to train 4 different CNN models using a single GPU
Training step time (ms):

Library      AlexNet   Overfeat   OxfordNet   GoogleNet
Caffe        324       823        1068        1935
Neon         87        211        320         270
Torch        81        268        529         470
TensorFlow   81        279        540         445
Source: https://ai.google/research/pubs/pub45381
• Question: how well does TF's coordination implementation for synchronous training scale as workers are added to the device pool?
• Measures the number of steps per second that TF can perform for models of different sizes as the number of synchronous workers is increased
• Each worker fetches parameters from 16 PS tasks, performs trivial computation, and sends updates back to the PS tasks
Source: https://ai.google/research/pubs/pub45381
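As a sketch of how synchronous data-parallel training is expressed in TF 1.x, an optimizer can be wrapped in tf.train.SyncReplicasOptimizer, which aggregates gradients from all replicas before applying one shared update; the toy loss and worker count below are illustrative:

```python
import tensorflow as tf

num_workers = 8  # illustrative; the paper sweeps the number of workers

# Toy model so the example is self-contained.
w = tf.Variable(5.0)
loss = tf.square(w)

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Gradients from all replicas are aggregated before one update is applied.
sync_opt = tf.train.SyncReplicasOptimizer(
    opt,
    replicas_to_aggregate=num_workers,
    total_num_replicas=num_workers)

train_op = sync_opt.minimize(
    loss, global_step=tf.train.get_or_create_global_step())

# Actually running this requires a MonitoredTrainingSession with
# sync_opt.make_session_run_hook(is_chief=True) to manage the token queue.
```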
• Question: how quickly can TF train Inception-v3 using multiple replicas?
• Compares the throughput achieved while training the Inception model using asynchronous SGD on TF and on Apache MXNet (a modern DL framework that uses a parameter server architecture)
• cuDNN kernels dominate performance; both TF and MXNet use cuDNN version 5.1, so results are similar
Source: https://ai.google/research/pubs/pub45381
• Question: does adding more workers reduce overall step time?
• Step time improves as workers are added, but with diminishing returns due to the resulting competition for PS network resources
• Adding backup workers reduces step time; adding more than 4 degrades performance
Source: https://ai.google/research/pubs/pub45381
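Backup workers can be sketched in the same TF 1.x API by making total_num_replicas larger than replicas_to_aggregate, so each step applies the first gradients to arrive and discards those of stragglers; the counts below are illustrative:

```python
import tensorflow as tf

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# 50 primary workers plus 4 backups: each step aggregates the first
# 50 gradient updates to arrive and drops the stragglers' updates.
sync_opt = tf.train.SyncReplicasOptimizer(
    opt,
    replicas_to_aggregate=50,  # updates aggregated per step
    total_num_replicas=54)     # 50 primary + 4 backup workers
```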
• Question: how fast can TF train a recurrent neural network to build a language model for the text in the One Billion Word Benchmark?
• The full vocabulary would limit training performance, so they use the 40K most common words
• Compares full and sampled softmax implementations
• Sampling the softmax reduces the computation required for the PS tasks
Sources: https://ai.google/research/pubs/pub45381; http://www.statmt.org/lm-benchmark/
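Sampled softmax approximates the full softmax by scoring each target against a random subset of negative classes, shrinking both the computation and the data pulled from PS tasks. A minimal sketch using tf.nn.sampled_softmax_loss, assuming TF 1.x; all sizes are illustrative:

```python
import tensorflow as tf

vocab, hidden, batch, num_sampled = 40000, 512, 32, 512  # illustrative sizes

# Output projection; in the paper the softmax weights live on PS tasks.
softmax_w = tf.get_variable("softmax_w", [vocab, hidden])
softmax_b = tf.get_variable("softmax_b", [vocab])

inputs = tf.placeholder(tf.float32, [batch, hidden])  # e.g., LSTM outputs
labels = tf.placeholder(tf.int64, [batch, 1])         # next-word ids

# Each example is scored against its true class plus num_sampled random
# negatives instead of all `vocab` classes.
loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=softmax_w, biases=softmax_b,
    labels=labels, inputs=inputs,
    num_sampled=num_sampled, num_classes=vocab))
```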
REFERENCES
Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng. 2012. Large scale distributed deep networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (NIPS'12), F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), Vol. 1. Curran Associates Inc., USA, 1223-1231.
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: a system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 265-283.