TensorFlow Flexible, Scalable, Portable Rajat Monga Engineering - - PowerPoint PPT Presentation




SLIDE 1

TensorFlow Flexible, Scalable, Portable

Rajat Monga Engineering Director, TensorFlow

SLIDE 2
SLIDE 3

Released in Nov. 2015

#1 repository for “machine learning” category on GitHub

SLIDE 4

Some Stats

  • 10,000+ commits since Nov. 2015
  • 450+ contributors
  • 1M+ binary downloads
  • #20 most popular repository on GitHub by stars
  • Used in ML classes at many universities: Toronto, Berkeley, Stanford, ...

SLIDE 5
SLIDE 6
SLIDE 7

TensorFlow-powered Cucumber Sorter

From: http://workpiles.com/2016/02/tensorflow-cnn-cucumber/

SLIDE 8

Flexible

SLIDE 9
SLIDE 10

Directed graph

[Figure: dataflow graph with nodes examples, weights, MatMul, biases, Add, Relu, labels, Xent]

SLIDE 11

Async SGD with Parameter Server

[Figure: Model Workers and a Parameter Server exchanging p and ∆p; update rule p’ = p - λ * ∆p]
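The update rule on this slide can be sketched in plain Python. This is a minimal simulation under stated assumptions, not TensorFlow code: the `ParameterServer` class, the toy quadratic loss, and the fixed interleaving of two workers are all hypothetical and chosen only to show that each gradient is applied as soon as it arrives, even if it was computed from stale parameters.

```python
class ParameterServer:
    """Holds the shared parameters p and applies p' = p - lr * dp."""

    def __init__(self, params, learning_rate):
        self.params = list(params)
        self.learning_rate = learning_rate

    def pull(self):
        # Workers get a snapshot; it may be stale by the time they push.
        return list(self.params)

    def push(self, grads):
        # Apply the update immediately, without waiting for other workers.
        self.params = [p - self.learning_rate * g
                       for p, g in zip(self.params, grads)]


def toy_gradient(params, target):
    # Gradient of the toy loss 0.5 * sum((p - t)^2) with respect to p.
    return [p - t for p, t in zip(params, target)]


def run_async_sgd(steps=200):
    server = ParameterServer([0.0, 0.0], learning_rate=0.1)
    target = [3.0, -2.0]
    for _ in range(steps):
        # Both workers pull before either pushes, so the second push
        # uses a gradient computed from parameters that are one update old.
        snapshot_0 = server.pull()
        snapshot_1 = server.pull()
        server.push(toy_gradient(snapshot_0, target))
        server.push(toy_gradient(snapshot_1, target))
    return server.params


async_result = run_async_sgd()
```

In this toy setting the stale gradient still points toward the target, so the parameters converge to it; real asynchronous training offers no such guarantee in general.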

SLIDE 12

Async SGD with TensorFlow

[Figure: Model Workers and PS Workers exchanging p and ∆p; update rule p’ = p - λ * ∆p]

SLIDE 13

Sync SGD with TensorFlow

[Figure: Model Workers and PS Workers exchanging p and ∆p; delta = ∑∆p, then p’ = p - λ * delta]
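The synchronous rule, delta = ∑∆p followed by p’ = p - λ * delta, can be sketched in plain Python. This is an illustration under assumptions rather than TensorFlow code: the data shards, the scalar parameter, and the toy loss are hypothetical.

```python
def worker_gradient(p, shard):
    # Gradient of the toy loss 0.5 * sum((p - x)^2) over this worker's shard.
    return sum(p - x for x in shard)


def sync_sgd_step(p, shards, learning_rate):
    # Every worker computes its gradient from the same parameter value p.
    grads = [worker_gradient(p, shard) for shard in shards]
    # Barrier: all gradients are collected and summed before one update.
    delta = sum(grads)
    return p - learning_rate * delta


def run_sync_sgd(steps=100):
    # Three hypothetical workers, each holding two examples.
    shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    p = 0.0
    for _ in range(steps):
        p = sync_sgd_step(p, shards, learning_rate=0.05)
    return p


sync_result = run_sync_sgd()
```

Because every worker reads the same p in each step, the result matches single-machine gradient descent on the combined data; here it approaches the mean of all examples, 3.5.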

SLIDE 14

Alternate version of Parameter sharing

Workers

SLIDE 15

More ML Algorithms

  • Clustering e.g. k-means
  • Random Forests
  • Logistic Regression
  • Bayesian methods
SLIDE 16

Scalable

SLIDE 17

Deferred Execution

Operations build the dataflow graph; eval() fetches the result.

    import tensorflow as tf

    with tf.Session():
        x = tf.constant([[5, 6], [7, 8]])
        z = tf.matmul(x, x) + tf.matmul(x, [[1, 0], [0, 1]])
        # Run graph to fetch z.
        result = z.eval()

[Figure: dataflow graph for z with nodes const x, const, MatMul, MatMul, +]
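To make the idea of deferred execution concrete without a TensorFlow install, here is a minimal pure-Python sketch. The `Node` class and the `constant` and `matmul` helpers are hypothetical stand-ins, not TensorFlow's API; they only mimic the pattern in which operations record graph nodes and `eval()` triggers the computation.

```python
class Node:
    """A graph node: records an op and its inputs, computes nothing yet."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value

    def __add__(self, other):
        # Overloaded + only adds a node to the graph.
        return Node("add", (self, other))

    def eval(self):
        # Execution happens only here, by walking the graph recursively.
        if self.op == "const":
            return self.value
        args = [node.eval() for node in self.inputs]
        if self.op == "matmul":
            return matmul_lists(*args)
        if self.op == "add":
            a, b = args
            return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
        raise ValueError("unknown op: " + self.op)


def matmul_lists(a, b):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def constant(value):
    return Node("const", value=value)


def matmul(a, b):
    return Node("matmul", (a, b))


x = constant([[5, 6], [7, 8]])
identity = constant([[1, 0], [0, 1]])
z = matmul(x, x) + matmul(x, identity)  # builds nodes only, no arithmetic
deferred_result = z.eval()              # runs the graph
```

Until `eval()` is called, `z` is just a description of the computation, which is what lets a runtime inspect, partition, and schedule the graph before running it.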

SLIDE 18

Parallelism in Op implementations

[Figure: MatMul]

SLIDE 19

Task Parallelism in DataFlow graph

[Figure: dataflow graph with two MatMul nodes]

SLIDE 20

Data Parallelism

[Figure: Input, Param, and four MatMul nodes]
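One common reading of data parallelism is that the input batch is split across replicas that all use the same parameters. The sketch below illustrates that reading in plain Python; the function names and the row-wise split are assumptions, and the slide's figure may depict a different arrangement.

```python
def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def data_parallel_matmul(inputs, param, replicas=2):
    # Split the (non-empty) batch row-wise into at most `replicas` shards.
    size = (len(inputs) + replicas - 1) // replicas  # ceiling division
    shards = [inputs[i:i + size] for i in range(0, len(inputs), size)]
    # Each shard is multiplied by the same parameters; in a real system
    # each of these products could run on a different device.
    outputs = [matmul(shard, param) for shard in shards]
    # Stitch the per-replica outputs back together in batch order.
    return [row for out in outputs for row in out]


batch = [[1, 2], [3, 4], [5, 6]]
param = [[1, 0], [2, 1]]
data_parallel_output = data_parallel_matmul(batch, param)
```

The result equals the single-device product `matmul(batch, param)`, since each output row depends only on its own input row.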

SLIDE 21

Model Parallelism

[Figure: Matrix, Matrix, Split, two MatMul nodes, Concat]
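The Split, MatMul, Concat pattern named on this slide can be sketched in plain Python. This assumes the weight matrix is split by columns, which is one plausible reading of the figure; the helper names are hypothetical.

```python
def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def split_columns(matrix, parts):
    # Assumes the column count is divisible by `parts`.
    step = len(matrix[0]) // parts
    return [[row[i:i + step] for row in matrix]
            for i in range(0, len(matrix[0]), step)]


def concat_columns(blocks):
    # Join blocks side by side, row by row.
    return [sum((block[r] for block in blocks), [])
            for r in range(len(blocks[0]))]


def model_parallel_matmul(x, w, parts=2):
    shards = split_columns(w, parts)                   # Split
    partials = [matmul(x, shard) for shard in shards]  # one MatMul per shard
    return concat_columns(partials)                    # Concat


x_in = [[1, 2], [3, 4]]
weights = [[1, 0, 2, 1], [0, 1, 1, 3]]
model_parallel_output = model_parallel_matmul(x_in, weights)
```

Each shard holds only part of the weights, so a model too large for one device can be spread over several while producing the same output as the unsplit product.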

SLIDE 22

Distribution across Devices

[Figure: ops Add, Mul, biases, learning rate, −= placed across GPU 0 and CPU]

SLIDE 23

Distribution

  • TensorFlow inserts Send/Recv Ops to transport tensors across devices
  • Recv ops pull data from Send ops

[Figure: ops Add, Mul, biases, learning rate, −= placed across GPU 0 and CPU, with one Send/Recv pair]
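The graph rewrite described in the bullets can be sketched as follows. This is an illustration in plain Python, not TensorFlow's partitioning code: the edge list, the device placement, and the node naming scheme are assumptions made up for the example.

```python
def insert_send_recv(edges, placement):
    """Rewrite each cross-device edge as src -> Send -> Recv -> dst.

    edges: list of (src, dst) op names.
    placement: dict mapping op name to device name.
    Returns (new_edges, new_placement).
    """
    new_edges = []
    new_placement = dict(placement)
    for src, dst in edges:
        if placement[src] == placement[dst]:
            # Same device: the edge stays as it is.
            new_edges.append((src, dst))
            continue
        send = "send_{}_to_{}".format(src, dst)
        recv = "recv_{}_to_{}".format(src, dst)
        new_placement[send] = placement[src]  # Send lives with the producer
        new_placement[recv] = placement[dst]  # Recv lives with the consumer
        new_edges += [(src, send), (send, recv), (recv, dst)]
    return new_edges, new_placement


# Hypothetical placement loosely based on the op names in the figure.
placement = {"biases": "CPU", "learning_rate": "CPU", "assign_sub": "CPU",
             "Add": "GPU:0", "Mul": "GPU:0"}
edges = [("biases", "Add"), ("learning_rate", "Mul"),
         ("Mul", "assign_sub"), ("biases", "assign_sub")]
new_edges, new_placement = insert_send_recv(edges, placement)
```

After the rewrite, every edge whose endpoints are ops connects nodes on the same device, and the only cross-device links are the Send to Recv pairs.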

SLIDE 24

Distribution

  • TensorFlow inserts Send/Recv Ops to transport tensors across devices
  • Recv ops pull data from Send ops

[Figure: ops Add, Mul, biases, learning rate, −= placed across GPU 0 and CPU, with four Send/Recv pairs]

SLIDE 25

Scale across machines

SLIDE 26

Portable

SLIDE 27

Platforms

SLIDE 28

Device Abstraction

[Figure: TensorFlow Core Execution Engine on top of CPU, GPU, Android, iOS, ...]

SLIDE 29

Languages

SLIDE 30

Language Abstraction

[Figure: C++, Python, ... front ends on top of the TensorFlow Core Execution Engine, which runs on CPU, GPU, Android, iOS, ...]

SLIDE 31

Now what?

SLIDE 32
SLIDE 33

AutoTrash

SLIDE 34

AutoTrash

SLIDE 35

Rajat Monga

@rajatmonga

Thank You!