TensorFlow: a Framework for Scalable Machine Learning (ACM Learning Center) - PowerPoint PPT Presentation



SLIDE 1

TensorFlow: a Framework for Scalable Machine Learning

ACM Learning Center, 2016

SLIDE 2

You probably want to know...

  • What is TensorFlow?
  • Why did we create TensorFlow?
  • How does TensorFlow work?
  • Code: Linear Regression
  • Code: Convolutional Deep Neural Network
  • Advanced Topics: Queues and Devices
SLIDE 3
  • Fast, flexible, and scalable
  • Open-source machine learning library
  • One system for research and production
  • Runs on CPU, GPU, TPU, and mobile
  • Apache 2.0 license
SLIDE 4

Machine learning gets complex quickly

Modeling complexity

SLIDE 5

Machine learning gets complex quickly

Heterogeneous System. Distributed System.

SLIDE 6

TensorFlow Handles Complexity

Modeling complexity. Heterogeneous System. Distributed System.

SLIDE 7

What’s in a Graph?

Edges are Tensors. Nodes are Ops.

  • Constants
  • Variables
  • Computation
  • Debug code (Print, Assert)
  • Control Flow

(Diagram: tensors a and b flow into an add node, producing c.)

Under the Hood

SLIDE 8

A Tensor: a multidimensional array. A Graph: a graph of operations.

SLIDE 9

The TensorFlow Graph

Computation is defined as a graph

  • Graph is defined in high-level language (Python)
  • Graph is compiled and optimized
  • Graph is executed (in parts or fully) on available low-level devices (CPU, GPU, TPU)

  • Nodes represent computations and state
  • Data (tensors) flow along edges
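The execution model above can be sketched in a few lines of plain Python (a toy illustration, not TensorFlow's implementation): nodes are ops, edges carry values, and only the subgraph needed for a fetched node is evaluated.

```python
# Toy dataflow graph: nodes are ops, edges carry values ("tensors").
# Building the graph computes nothing; run() evaluates only what's needed.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

def const(v):
    return Node("const", value=v)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def run(fetch):
    """Evaluate only the subgraph needed to compute `fetch`."""
    if fetch.op == "const":
        return fetch.value
    args = [run(i) for i in fetch.inputs]
    return args[0] + args[1] if fetch.op == "add" else args[0] * args[1]

a, b = const(1), const(2)
c = add(a, b)           # nothing is computed yet
d = mul(c, const(10))   # d is not evaluated unless fetched
print(run(c))           # 3
```

Fetching `c` never touches the `mul` node, mirroring the "executed in parts" bullet above.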
SLIDE 10

Build a graph; then run it.

...
c = tf.add(a, b)
...
session = tf.Session()
value_of_c = session.run(c, {a: 1, b: 2})

(Diagram: tensors a and b flow into an add node, producing c.)

SLIDE 11

Any Computation is a TensorFlow Graph

(Diagram: examples and weights feed a MatMul; biases are combined via Add; a Relu follows; Xent is computed against labels.)

SLIDE 12

Any Computation is a TensorFlow Graph

(Diagram: the same graph, now with state: the weights and biases are variables that persist across runs.)

SLIDE 13

Automatic Differentiation

(Diagram: a grad op is added, flowing from Xent back to biases.)

Automatically add ops which compute gradients for variables
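What those added gradient ops compute can be worked out by hand for the squared-error loss of the running linear model. A plain-Python sketch (the names and the finite-difference check are illustrative, not TensorFlow API):

```python
# Hand-derived gradients for L = (W*x + b - y)^2, checked against a
# central finite-difference approximation. Illustrative only.

def loss(W, b, x, y):
    return (W * x + b - y) ** 2

def grads(W, b, x, y):
    # Reverse-mode chain rule, starting from the loss:
    # dL/dyhat = 2*(yhat - y); dyhat/dW = x; dyhat/db = 1.
    d_yhat = 2 * (W * x + b - y)
    return d_yhat * x, d_yhat   # (dL/dW, dL/db)

W, b, x, y, eps = 0.5, 0.2, 3.0, 1.0, 1e-6
dW, db = grads(W, b, x, y)
# The finite difference in W should agree with the analytic gradient.
approx_dW = (loss(W + eps, b, x, y) - loss(W - eps, b, x, y)) / (2 * eps)
```

TensorFlow builds exactly this kind of backward chain automatically, one gradient op per forward op.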

SLIDE 14

Any Computation is a TensorFlow Graph

Simple gradient descent:

(Diagram: each gradient is multiplied by the learning rate and subtracted from its variable in place, updating the graph's state.)

SLIDE 15

Any Computation is a TensorFlow Graph

(Diagram: the same graph, partitioned across Device A and Device B.)

Devices: processes, machines, CPUs, GPUs, TPUs, etc.

SLIDE 16

Send and Receive Nodes

(Diagram: the partitioned graph; some edges now cross the device boundary.)

Devices: processes, machines, CPUs, GPUs, TPUs, etc.

SLIDE 17

Send and Receive Nodes

(Diagram: each edge crossing a device boundary is replaced by a Send node on one device and a matching Recv node on the other.)

Devices: processes, machines, CPUs, GPUs, TPUs, etc.

SLIDE 18

Linear Regression

SLIDE 19

Linear Regression

y = Wx + b

(x: input; W, b: parameters; y: result)

SLIDE 20

What are we trying to do?

Mystery equation: y = 0.1 * x + 0.3 + noise
Model: y = W * x + b
Objective: given enough (x, y) samples, figure out the values of W and b.
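Before handing this objective to TensorFlow, it can be checked in plain Python: generate noisy samples of the mystery equation and recover W and b with closed-form least squares (an illustrative sketch; the variable names are my own):

```python
# Recover the "mystery equation" y = 0.1*x + 0.3 + noise with ordinary
# least squares, in plain Python.
import random

random.seed(0)
xs = [random.random() for _ in range(1000)]
ys = [0.1 * x + 0.3 + random.gauss(0, 0.01) for x in xs]

# Closed-form least squares for a single feature:
#   W = cov(x, y) / var(x),  b = mean(y) - W * mean(x)
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
W = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
b = my - W * mx
# W and b come out close to the true 0.1 and 0.3
```

The slides that follow do the same fitting with a TensorFlow graph and gradient descent instead of the closed form.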

SLIDE 21

y = Wx + b in TensorFlow

import tensorflow as tf

SLIDE 22

y = Wx + b in TensorFlow

import tensorflow as tf
x = tf.placeholder(shape=[None], dtype=tf.float32, name="x")

SLIDE 23

y = Wx + b in TensorFlow

import tensorflow as tf
x = tf.placeholder(shape=[None], dtype=tf.float32, name="x")
W = tf.get_variable(shape=[], name="W")

SLIDE 24

y = Wx + b in TensorFlow

import tensorflow as tf
x = tf.placeholder(shape=[None], dtype=tf.float32, name="x")
W = tf.get_variable(shape=[], name="W")
b = tf.get_variable(shape=[], name="b")

SLIDE 25

y = Wx + b in TensorFlow

import tensorflow as tf
x = tf.placeholder(shape=[None], dtype=tf.float32, name="x")
W = tf.get_variable(shape=[], name="W")
b = tf.get_variable(shape=[], name="b")
y = W * x + b

(Diagram: W and x feed a matmul node; its result and b feed an add node, producing y.)

SLIDE 26

Variables Must be Initialized

init_op = tf.initialize_all_variables()   # collects all variable initializers
sess = tf.Session()                       # makes an execution environment
sess.run(init_op)                         # actually initializes the variables

(Diagram: each of W and b gets an initializer op and an assign op; init_op groups them, upstream of the matmul and add producing y.)

SLIDE 27

Running the Computation

x_in = 3
sess.run(y, feed_dict={x: x_in})

  • Only what's used to compute a fetch will be evaluated
  • All Tensors can be fed, but all placeholders must be fed

(Diagram: x is fed; y is fetched.)

SLIDE 28

import tensorflow as tf

x = tf.placeholder(shape=[None], dtype=tf.float32, name='x')
W = tf.get_variable(shape=[], name='W')
b = tf.get_variable(shape=[], name='b')
y = W * x + b

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run(y, feed_dict={x: x_in}))

Putting it all together

Build the graph. Prepare the execution environment. Initialize variables. Run the computation (usually many times).

SLIDE 29

Define a Loss

Given x and y_label, compute a loss, for instance:

# Create an operation that calculates loss.
loss = tf.reduce_mean(tf.square(y - y_label))

SLIDE 30

Minimize loss: optimizers

tf.train.AdadeltaOptimizer
tf.train.AdagradOptimizer
tf.train.AdagradDAOptimizer
tf.train.AdamOptimizer
…

(Diagram: the error as a function of the parameters (weights, biases); an optimizer descends toward the function minimum.)

SLIDE 31

Train

Feed (x, y_label) pairs and adjust W and b to decrease the loss.

# Create an optimizer.
optimizer = tf.train.GradientDescentOptimizer(0.5)

# Create an operation that minimizes loss.
train = optimizer.minimize(loss)

W ← W - learning_rate * (dL/dW)
b ← b - learning_rate * (dL/db)

TensorFlow computes gradients automatically. 0.5 is the learning rate.
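The update rules above can be run by hand in plain Python to see that they recover the mystery equation. This is a sketch of what optimizer.minimize(loss) builds into the graph, not TensorFlow code:

```python
# Full-batch gradient descent on L = mean((W*x + b - y_label)^2),
# applying W <- W - learning_rate * dL/dW by hand.

x_in = [i / 100 for i in range(100)]
y_in = [0.1 * x + 0.3 for x in x_in]   # noiseless "mystery equation" samples

W, b, learning_rate = 0.0, 0.0, 0.5
n = len(x_in)

for _ in range(500):
    # dL/dW = mean(2*(y - y_label)*x); dL/db = mean(2*(y - y_label))
    dW = sum(2 * (W * x + b - y) * x for x, y in zip(x_in, y_in)) / n
    db = sum(2 * (W * x + b - y) for x, y in zip(x_in, y_in)) / n
    W -= learning_rate * dW
    b -= learning_rate * db
# W converges to 0.1 and b to 0.3
```

TensorFlow's GradientDescentOptimizer adds exactly these gradient and update ops to the graph for you.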

SLIDE 32

loss = tf.reduce_mean(tf.square(y - y_label))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for i in range(1000):
        sess.run(train, feed_dict={x: x_in[i], y_label: y_in[i]})

Putting it all together

Define a loss. Create an optimizer. Create an op to minimize the loss. Initialize variables. Iteratively run the training op.

SLIDE 33

TensorBoard

SLIDE 34

Deep Neural Network

SLIDE 35

SLIDE 36

import tensorflow as tf

x = tf.placeholder(shape=[None], dtype=tf.float32, name='x')
W = tf.get_variable(shape=[], name='W')
b = tf.get_variable(shape=[], name='b')
y = W * x + b

loss = tf.reduce_mean(tf.square(y - y_label))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
...

Remember linear regression?

Build the graph

SLIDE 37

Convolutional DNN

x = tf.contrib.layers.conv2d(x, kernel_size=[5,5], ...)
x = tf.contrib.layers.max_pool2d(x, kernel_size=[2,2], ...)
x = tf.contrib.layers.conv2d(x, kernel_size=[5,5], ...)
x = tf.contrib.layers.max_pool2d(x, kernel_size=[2,2], ...)
x = tf.contrib.layers.fully_connected(x, activation_fn=tf.nn.relu)
x = tf.contrib.layers.dropout(x, 0.5)
logits = tf.contrib.layers.linear(x)

(Diagram: x → conv 5x5 (relu) → maxpool 2x2 → conv 5x5 (relu) → maxpool 2x2 → fully_connected (relu) → dropout 0.5 → fully_connected (linear) → logits)

https://github.com/martinwicke/tensorflow-tutorial/blob/master/2_mnist.ipynb

SLIDE 38

Defining Complex Networks

(Diagram: the network produces a loss; gradient ops are generated for all parameters; each parameter is updated by subtracting learning rate times its gradient.)

SLIDE 39

Distributed TensorFlow

SLIDE 40

Data Parallelism

(Diagram: parameter servers hold the model parameters; model replicas each process a shard of the data, fetch the current parameters p’, and send back updates Δp’.)
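The picture above can be sketched synchronously in plain Python: each "replica" computes gradients on its own data shard, and a "parameter server" step averages them and updates the shared parameters. A hypothetical toy, not TensorFlow's implementation:

```python
# Synchronous data parallelism: replicas compute shard gradients,
# a parameter-server step averages and applies them.

def replica_grads(params, shard):
    """Gradients of mean((W*x + b - y)^2) over one data shard."""
    W, b = params
    dW = db = 0.0
    for x, y in shard:
        err = 2 * (W * x + b - y) / len(shard)
        dW += err * x
        db += err
    return dW, db

def ps_apply(params, grad_list, lr=0.5):
    """Parameter-server update: average the replicas' gradients."""
    dW = sum(g[0] for g in grad_list) / len(grad_list)
    db = sum(g[1] for g in grad_list) / len(grad_list)
    return (params[0] - lr * dW, params[1] - lr * db)

data = [(i / 100, 0.1 * (i / 100) + 0.3) for i in range(100)]
shards = [data[i::4] for i in range(4)]   # one shard per model replica
params = (0.0, 0.0)
for _ in range(2000):
    grads = [replica_grads(params, shard) for shard in shards]
    params = ps_apply(params, grads)
```

In TensorFlow the same structure is expressed with device placement: variables on ps tasks, the replicated computation on worker tasks, as the next slides show.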

SLIDE 41

tf.train.ClusterSpec({
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222",
               "worker2.example.com:2222"],
    "ps": ["ps0.example.com:2222",
           "ps1.example.com:2222"]
})

Describe a cluster: ClusterSpec

SLIDE 42

with tf.device("/job:ps/task:0"):
    weights_1 = tf.Variable(...)
    biases_1 = tf.Variable(...)

with tf.device("/job:ps/task:1"):
    weights_2 = tf.Variable(...)
    biases_2 = tf.Variable(...)

with tf.device("/job:worker/task:7"):
    input, labels = ...
    layer_1 = tf.nn.relu(tf.matmul(input, weights_1) + biases_1)
    logits = tf.nn.relu(tf.matmul(layer_1, weights_2) + biases_2)
    train_op = ...

with tf.Session("grpc://worker7.example.com:2222") as sess:
    for _ in range(10000):
        sess.run(train_op)

Share the graph across devices

SLIDE 43

Input Pipelines with Queues

(Diagram: filename queue → readers → decoders → preprocessing → example queue → workers; multiple readers and preprocessors run in parallel.)
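The same decoupling can be sketched with Python threads and a bounded queue.Queue: reader threads produce decoded examples while the worker dequeues them. Illustrative only; TensorFlow's queue ops implement this inside the graph, with its own threading:

```python
# Producer/consumer input pipeline: reader threads fill a bounded example
# queue; the worker dequeues a batch. (Filenames and record counts are
# made-up placeholders.)
import queue
import threading

examples = queue.Queue(maxsize=8)        # bounded, like a TF example queue
filenames = ["file0", "file1", "file2"]

def reader(name):
    for i in range(4):                   # pretend each file holds 4 records
        record = (name, i)               # "read" and "decode" a record
        examples.put(record)             # blocks when the queue is full

threads = [threading.Thread(target=reader, args=(f,)) for f in filenames]
for t in threads:
    t.start()

batch = [examples.get() for _ in range(12)]   # worker: dequeue a batch
for t in threads:
    t.join()
```

The bounded queue is what lets slow I/O and preprocessing overlap with training instead of stalling it.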

SLIDE 44

Tutorials on tensorflow.org:

  • Image recognition: https://www.tensorflow.org/tutorials/image_recognition
  • Word embeddings: https://www.tensorflow.org/versions/word2vec
  • Language modeling: https://www.tensorflow.org/tutorials/recurrent
  • Translation: https://www.tensorflow.org/versions/seq2seq
  • Deep Dream: https://tensorflow.org/code/tensorflow/examples/tutorials/deepdream/deepdream.ipynb

Tutorials & Courses

SLIDE 45

Rajat Monga @rajatmonga

Thank you and have fun!

Martin Wicke @martin_wicke

SLIDE 46

Extras

SLIDE 47

Inception

https://research.googleblog.com/2016/08/improving-inception-and-image.html

An Alaskan Malamute (left) and a Siberian Husky (right). Images from Wikipedia.

SLIDE 48

Show and Tell

https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html

SLIDE 49

Parsey McParseface

https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html

SLIDE 50

Text Summarization

https://research.googleblog.com/2016/08/text-summarization-with-tensorflow.html

Original text:

  • Alice and Bob took the train to visit the zoo. They saw a baby giraffe, a lion, and a flock of colorful tropical birds.

Abstractive summary:

  • Alice and Bob visited the zoo and saw animals and birds.
SLIDE 51

SLIDE 52

Claude Monet - Bouquet of Sunflowers
Images from the Metropolitan Museum of Art (with permission)
Image by @random_forests

SLIDE 53

SLIDE 54

Architecture

(Diagram: Python and C++ front ends sit on top of the TensorFlow Distributed Execution System, which dispatches kernels to CPU, GPU, Android, iOS, TPU, and other devices. Ops such as add, mul, Print, and reshape are exposed through the front-end bindings, plus compound ops.)