SLIDE 1

TENSORFLOW: A SYSTEM FOR LARGE-SCALE MACHINE LEARNING

AUTHORS: MARTÍN ABADI, PAUL BARHAM, JIANMIN CHEN, ZHIFENG CHEN, ANDY DAVIS, JEFFREY DEAN, MATTHIEU DEVIN, SANJAY GHEMAWAT, GEOFFREY IRVING, MICHAEL ISARD, MANJUNATH KUDLUR, JOSH LEVENBERG, RAJAT MONGA, SHERRY MOORE, DEREK G. MURRAY, BENOIT STEINER, PAUL TUCKER, VIJAY VASUDEVAN, PETE WARDEN, MARTIN WICKE, YUAN YU, AND XIAOQIANG ZHENG

SLIDE 2

OVERVIEW

  • Large Scale ML System
  • Distributed Compute and Training
  • Multi-node
  • Heterogeneous Environments
  • Dataflow Graphs
  • Open Source
  • Mathematically Flexible
  • Bespoke Loss & Kernels
  • Fault Tolerant
SLIDE 3

DATAFLOW GRAPHS

[Diagram: Input 1 and Input 2 feed a Multiply vertex; its result and Input 3 feed an Add vertex, which produces the Output. Annotation: “Mutability!”]
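A minimal sketch of the diagrammed graph, using the TF 1.x graph-construction API the paper describes (exposed as tf.compat.v1 in current releases); the placeholder names are illustrative. The “Mutability!” annotation refers to stateful vertices such as tf.Variable, which hold values across executions.

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    # Building the graph only declares vertices; nothing executes yet.
    in1 = tf.placeholder(tf.float32, name="input_1")
    in2 = tf.placeholder(tf.float32, name="input_2")
    in3 = tf.placeholder(tf.float32, name="input_3")
    prod = tf.multiply(in1, in2)  # Multiply vertex
    out = tf.add(prod, in3)       # Add vertex

    # Execution is deferred to a Session, which runs the (sub)graph.
    with tf.Session() as sess:
        print(sess.run(out, feed_dict={in1: 2.0, in2: 3.0, in3: 4.0}))  # 10.0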

SLIDE 4

PRIOR WORK

  • DistBelief
  • Architecture
  • Parameter Server
  • Workers
  • Inflexible Layers
  • Inflexible Training Algorithms
  • RNNs, LSTMs, GCNs challenging
  • Optimized for large clusters
  • Caffe & Theano
  • Similar limitations

[Diagram: a Parameter Server coordinating three Workers.]

TensorFlow is designed to improve flexibility!
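A toy, framework-free sketch of the parameter-server pattern diagrammed above, assuming nothing about DistBelief's actual API: a server holds the shared parameters, and each worker pulls them, computes a gradient on its own data shard, and pushes the update back.

    import numpy as np

    class ParameterServer:
        """Holds the shared model parameters (illustrative, not DistBelief's API)."""
        def __init__(self, dim, lr=0.1):
            self.w = np.zeros(dim)
            self.lr = lr

        def pull(self):
            return self.w.copy()

        def push(self, grad):
            self.w -= self.lr * grad  # apply a worker's gradient update

    def worker_step(ps, x_shard, y_shard):
        w = ps.pull()                                        # fetch current params
        pred = x_shard @ w
        grad = x_shard.T @ (pred - y_shard) / len(y_shard)   # MSE gradient
        ps.push(grad)

    # Usage: three "workers" alternate steps on their own data shards.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(90, 5)), rng.normal(size=90)
    ps = ParameterServer(dim=5)
    for step in range(20):
        for shard in range(3):
            worker_step(ps, X[shard::3], y[shard::3])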

SLIDE 5

[Diagram: side-by-side comparison. In TensorFlow, Input 1 and Input 2 flow through primitive Multiply and Add vertices to the Output; in DistBelief/Keras/etc., the same inputs flow through Dense layers that are treated as atomic units.]
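A sketch of the contrast, with illustrative shapes: in TensorFlow a “dense” layer is not atomic but composed from primitive vertices (MatMul, Add, ReLU), each of which can be rewired, differentiated, or placed on a device independently.

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    x = tf.placeholder(tf.float32, shape=[None, 4])
    W = tf.Variable(tf.zeros([4, 8]))  # weights live in the graph as state
    b = tf.Variable(tf.zeros([8]))

    # "Dense" here is three primitive graph vertices, not one atomic layer.
    dense = tf.nn.relu(tf.matmul(x, W) + b)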

SLIDE 6

ACCELERATOR ABSTRACTION

[Diagram: the same dataflow graph runs unchanged on GPU, TPU, or CPU back ends.]

SLIDE 7

UNITS OF TENSORFLOW

  • Graph
  • Subgraph
  • Edges
  • Tensors: multidimensional arrays
  • Vertices
  • Operations: Add, Multiply, Sigmoid
  • Automatic Partitioning
  • Subgraph distribution

Partitioned subgraphs are distributed to individual compute devices to maximize compute efficiency (see the sketch below).
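A small sketch of these units, with illustrative names: the graph object exposes its vertices (operations) and the tensors flowing along its edges.

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    g = tf.Graph()
    with g.as_default():
        a = tf.constant([1.0, 2.0], name="a")  # Const vertex
        s = tf.sigmoid(a, name="sig")          # Sigmoid vertex

    # Vertices are operations; edges carry multidimensional-array tensors.
    for op in g.get_operations():
        print(op.type, [t.shape.as_list() for t in op.outputs])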

SLIDE 8

CONTROL FLOW & EXECUTION

  • Graph Partitioned and Distributed
  • Send + Recv Replace Split Edges (see the sketch below)
  • Send
  • Pushes a value from one device to another
  • Recv
  • Blocks until the value is available
  • “Deferred execution”
  • Synchronous Execution
  • Classically frowned upon
  • GPUs make it appealing
  • All workers forced to take the same parameters
  • Backup workers stochastically eliminate straggling processes
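A sketch of an edge that crosses devices: when the runtime partitions this graph, it splits the CPU-to-GPU edge and inserts a Send/Recv pair. The "/gpu:0" string is illustrative, and allow_soft_placement falls back to the CPU when no GPU is present.

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    with tf.device("/cpu:0"):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    with tf.device("/gpu:0"):
        b = tf.matmul(a, a)  # consumes a tensor produced on another device

    config = tf.ConfigProto(allow_soft_placement=True,
                            log_device_placement=True)  # logs where ops run
    with tf.Session(config=config) as sess:
        print(sess.run(b))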

SLIDE 9

DIFFERENTIATION & BACKPROP

  • Symbolic representation
  • Automatically computes backprop code
  • As in parameter-server architectures, enables distributed training via additive (+/-) write operations (see the sketch below)
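A sketch of the symbolic approach: tf.gradients walks the graph backward from y to x and emits new operations that compute the derivative, so the gradient itself is just more graph.

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    x = tf.Variable(3.0)
    y = x * x + 2.0 * x            # y = x^2 + 2x
    dy_dx = tf.gradients(y, x)[0]  # builds graph ops for dy/dx = 2x + 2

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(dy_dx))     # 8.0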
SLIDE 10

IMPLEMENTATION

SLIDE 11

SINGLE MACHINE BENCHMARKS

SLIDE 12

SPARSE AND DENSE FETCHES FOR SYNC

SLIDE 13

CNN IMPLEMENTATIONS

SLIDE 14

SYNCHRONOUS AND ASYNCHRONOUS PROCESSES

SLIDE 15

TRAINING LARGE MODELS

SLIDE 16

CRITICISM

  • No actual accuracy comparisons
  • Convergence comparisons in synchrony analysis?
  • Lacks higher-level abstractions for computation
  • This is why Keras runs on top of TF
SLIDE 17

CONCLUSION

  • Built an ML system that is:
  • Robust
  • Distributable
  • Extensible
  • Fast
  • In the ensuing years
  • Used extensively
  • Extended
SLIDE 18

REFERENCES

  • M. Abadi, P. Barham, J. Chen, et al. “TensorFlow: A System for Large-Scale Machine Learning.” In Proceedings of OSDI, 2016.