

SLIDE 1

Large-Scale Deep Learning With TensorFlow

Jeff Dean, Google Brain team, g.co/brain

In collaboration with many other people at Google

SLIDE 2

What is the Google Brain Team?

  • Research team focused on long-term artificial intelligence research
    ○ Mix of computer systems and machine learning research expertise
    ○ Pure ML research, and research in the context of emerging ML application areas: robotics, language understanding, healthcare, ...

g.co/brain

SLIDE 3

We Disseminate Our Work in Many Ways

  • By publishing our work
    ○ See papers at research.google.com/pubs/BrainTeam.html
  • By releasing TensorFlow, our core machine learning research system, as an open-source project
  • By releasing implementations of our research models in TensorFlow
  • By collaborating with product teams at Google to get our research into real products

SLIDE 4

What Do We Really Want?

  • Build artificial intelligence algorithms and systems that learn from experience
  • Use those to solve difficult problems that benefit humanity
SLIDE 5

What do I mean by understanding?

SLIDE 8

What do I mean by understanding?

Query: [ car parts for sale ]

SLIDE 9

What do I mean by understanding?

Query: [ car parts for sale ]

Document 1: "… car parking available for a small fee. … parts of our floor model inventory for sale."

Document 2: "Selling all kinds of automobile and pickup truck parts, engines, and transmissions."

SLIDE 10

Example Needs of the Future

  • Which of these eye images shows symptoms of diabetic retinopathy?
  • Find me all rooftops in North America
  • Describe this video in Spanish
  • Find me all documents relevant to reinforcement learning for robotics and summarize them in German
  • Find a free time for everyone in the Smart Calendar project to meet and set up a videoconference
  • Robot, please fetch me a cup of tea from the snack kitchen
SLIDE 11

Growing Use of Deep Learning at Google

Across many products/areas: Android, Apps, drug discovery, Gmail, image understanding, Maps, natural language understanding, Photos, robotics research, speech, translation, YouTube, ... many others

[Chart: # of directories containing model description files, growing over time]

SLIDE 12

Important Property of Neural Networks

Results get better with more data + bigger models + more computation (Better algorithms, new insights and improved techniques always help, too!)

SLIDE 13

Aside

Many of the techniques that are successful now were developed 20-30 years ago. What changed? We now have:

  • sufficient computational resources
  • large enough interesting datasets

Use of large-scale parallelism lets us look ahead many generations of hardware improvements, as well.

SLIDE 14

What do you want in a machine learning system?

  • Ease of expression: for lots of crazy ML ideas/algorithms
  • Scalability: can run experiments quickly
  • Portability: can run on wide variety of platforms
  • Reproducibility: easy to share and reproduce research
  • Production readiness: go from research to real products
SLIDE 15

Open, standard software for general machine learning; great for deep learning in particular. First released Nov 2015 under the Apache 2.0 license.

http://tensorflow.org/

and

https://github.com/tensorflow/tensorflow

SLIDE 16

http://tensorflow.org/whitepaper2015.pdf

SLIDE 17

Preprint: arxiv.org/abs/1605.08695 Updated version will appear in OSDI 2016

SLIDE 19

Strong External Adoption

[Chart: GitHub activity since launch - TensorFlow (launched Nov. 2015) vs. frameworks launched Sep. 2013, Jan. 2012, and Jan. 2008]

50,000+ binary installs in 72 hours; 500,000+ since November 2015. Most forked new repo on GitHub in 2015 (despite only being available starting in Nov. '15).

SLIDE 20

http://tensorflow.org/

SLIDE 21

Motivations

  • DistBelief (our 1st system) was the first scalable deep learning system, but it was not as flexible as we wanted for research purposes
  • A better understanding of the problem space allowed us to make some dramatic simplifications
  • Define the industrial standard for machine learning
  • Short-circuit the MapReduce/Hadoop inefficiency
SLIDE 24

TensorFlow: Expressing High-Level ML Computations

  • Core in C++
    ○ Very low overhead
  • Different front ends for specifying/driving the computation
    ○ Python and C++ today, easy to add more

[Diagram: Python and C++ front ends on top of the Core TensorFlow Execution System, which runs on CPU, GPU, Android, iOS, ...]

SLIDE 25

Computation is a dataflow graph

[Diagram: graph of nodes (MatMul, Add, Relu, Xent) with inputs: biases, weights, examples, labels]

Graph of nodes, also called operations or ops.

SLIDE 26

Computation is a dataflow graph - with tensors

[Diagram: the same graph (MatMul, Add, Relu, Xent; biases, weights, examples, labels), highlighting the data flowing along edges]

Edges are N-dimensional arrays: Tensors.

SLIDE 27

Example TensorFlow fragment

  • Build a graph computing a neural net inference:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

x = tf.placeholder("float", shape=[None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

SLIDE 28

Computation is a dataflow graph - with state

[Diagram: Add/Mul ops combining biases and the learning rate; a -= op feeds back into biases]

'Biases' is a variable. Some ops compute gradients; -= updates the biases.

SLIDE 29

Symbolic Differentiation

  • Automatically add ops to calculate symbolic gradients of variables w.r.t. the loss function
  • Apply these gradients with an optimization algorithm

y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

opt = tf.train.GradientDescentOptimizer(0.01)
train_step = opt.minimize(cross_entropy)

SLIDE 30

Define the graph, then execute it repeatedly

  • Launch the graph and run the training ops in a loop:

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

SLIDE 31

Computation is a dataflow graph - distributed

[Diagram: the same graph (Add, Mul, Assign, Sub; biases, learning rate) partitioned across GPU 0 and the CPU]

SLIDE 32

Assign Devices to Ops

[Diagram: the graph partitioned across GPU 0 and the CPU, with one Send/Recv pair on a cut edge]

  • TensorFlow inserts Send/Recv ops to transport tensors across devices
  • Recv ops pull data from Send ops

SLIDE 33

Assign Devices to Ops

[Diagram: the same partitioned graph, now with Send/Recv pairs on every cross-device edge]

  • TensorFlow inserts Send/Recv ops to transport tensors across devices
  • Recv ops pull data from Send ops
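Device placement is exposed directly in the API. A minimal sketch (mine, not from the deck) of pinning ops to devices; TensorFlow inserts the Send/Recv pair on the CPU-to-GPU edge automatically:

import tensorflow as tf

with tf.device("/cpu:0"):
    x = tf.random_normal([1000, 1000])   # produced on the CPU

with tf.device("/gpu:0"):
    y = tf.matmul(x, x)                  # x crosses devices via an implicit Send/Recv

with tf.Session() as sess:
    print(sess.run(y))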

SLIDE 34

November 2015

SLIDE 35

December 2015

SLIDE 36

February 2016

SLIDE 37

April 2016

SLIDE 38

June 2016

SLIDE 39

Activity

SLIDE 40

Experiment Turnaround Time and Research Productivity

  • Minutes, hours:
    ○ Interactive research! Instant gratification!
  • 1-4 days:
    ○ Tolerable
    ○ Interactivity replaced by running many experiments in parallel
  • 1-4 weeks:
    ○ High-value experiments only
    ○ Progress stalls
  • >1 month:
    ○ Don't even try

SLIDE 41
SLIDE 42

Data Parallelism

[Diagram: parameter servers on top; many model replicas below, each reading its own shard of the training data]

Each replica loops: fetch the current parameters p from the parameter servers, compute an update ∆p on a shard of data, and send ∆p back; the parameter servers apply p' = p + ∆p. The next round uses the updated parameters: p'' = p' + ∆p'. Updates may be applied asynchronously (replicas don't wait for one another) or synchronously.
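A conceptual sketch of one such round (mine, not from the deck; numpy arrays stand in for the real distributed system):

import numpy as np

class ParameterServer(object):
    def __init__(self, p):
        self.p = p                  # the shared model parameters

    def fetch(self):
        return self.p.copy()        # a replica downloads the current p

    def apply(self, delta):
        self.p += delta             # p' = p + delta (async: replicas don't wait)

def replica_step(ps, shard, gradient_fn, lr=0.01):
    p = ps.fetch()                        # 1. get current parameters
    delta = -lr * gradient_fn(p, shard)   # 2. compute the update on a data shard
    ps.apply(delta)                       # 3. push delta back to the parameter server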

SLIDE 50

Distributed training mechanisms

Graph structure and low-level graph primitives (queues) allow us to play with synchronous vs. asynchronous update algorithms.
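For instance, synchronous training with backup workers (see the ICLR workshop paper a few slides later) can be layered on a stock optimizer. A hedged sketch using TF's tf.train.SyncReplicasOptimizer wrapper; the replica counts are illustrative, and a real run also needs the wrapper's session hooks:

opt = tf.train.GradientDescentOptimizer(0.01)
sync_opt = tf.train.SyncReplicasOptimizer(
    opt,
    replicas_to_aggregate=50,    # gradients aggregated per global step
    total_num_replicas=53)       # the 3 extra replicas act as backup workers
train_step = sync_opt.minimize(cross_entropy)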

SLIDE 51

Cross-process communication is the same!

[Diagram: the graph partitioned across /job:worker/cpu:0 and /job:ps/gpu:0, with Send/Recv pairs on every cut edge]

  • Communication across machines over the network is abstracted identically to cross-device communication

No specialized parameter server subsystem!
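Concretely, a "parameter server" is just ordinary variables placed on another job. A minimal sketch, assuming a two-process cluster (the addresses and shapes are placeholders):

import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)
# (a ps process would instead call server.join() to serve its variables)

with tf.device("/job:ps/task:0"):
    W = tf.Variable(tf.zeros([784, 10]))   # parameters live in the ps process

with tf.device("/job:worker/task:0"):
    logits = tf.matmul(x, W)   # x as before; network Send/Recv is inserted automatically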

SLIDE 53

Image Model Training Time

[Chart: training time in hours for 1, 10, and 50 GPUs]

2.6 hours (50 GPUs) vs. 79.3 hours (1 GPU): a 30.5x speedup

SLIDE 55

Sync converges faster (time to accuracy)

Synchronous updates (with backup workers) train to higher accuracy faster, and scale better to more workers (less loss of accuracy). 40 hours vs. 50 hours.

Revisiting Distributed Synchronous SGD, Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz, ICLR Workshop 2016, arxiv.org/abs/1604.00981

SLIDE 56

General Computations

Although we originally built TensorFlow for our uses around deep neural networks, it's actually quite flexible: a wide variety of machine learning and other kinds of numeric computations are easily expressible in the computation-graph model.

SLIDE 57

Runs on a Variety of Platforms

phones; single machines (CPU and/or GPUs); distributed systems of 100s of machines and/or GPU cards; custom ML hardware

SLIDE 58

Trend: Much More Heterogeneous hardware

General-purpose CPU performance scaling has slowed significantly. Specialization of hardware for certain workloads will become more important.

SLIDE 59

Tensor Processing Unit

Custom machine learning ASIC. In production use for >16 months: used on every search query, used for the AlphaGo match, ...

See Google Cloud Platform blog: Google supercharges machine learning tasks with TPU custom chip, by Norm Jouppi, May, 2016

SLIDE 60

Long Short-Term Memory (LSTMs): Make Your Memory Cells Differentiable

[Hochreiter & Schmidhuber, 1997]

[Diagram: a memory cell M between input X and output Y, with WRITE (W), READ (R), and FORGET (F) gates controlled by sigmoids]

SLIDE 61

Example: LSTM [Hochreiter et al, 1997][Gers et al, 1999]

Enables long term dependencies to flow
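The next slides drive an LSTMCell helper. A minimal sketch of what such a cell computes (mine, with TF 1.x ops and illustrative sizes; the deck itself doesn't show this code):

import tensorflow as tf

H, D = 1000, 1000   # hidden size and input size (illustrative)

def _var(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.1))

Wi, Wf, Wo, Wg = [_var([D + H, H]) for _ in range(4)]
bi, bf, bo, bg = [_var([H]) for _ in range(4)]

def LSTMCell(x, mprev, cprev):
    # One LSTM step: sigmoid gates decide what to write, read, and forget.
    inp = tf.concat([x, mprev], axis=1)
    i = tf.sigmoid(tf.matmul(inp, Wi) + bi)   # write (input) gate
    f = tf.sigmoid(tf.matmul(inp, Wf) + bf)   # forget gate
    o = tf.sigmoid(tf.matmul(inp, Wo) + bo)   # read (output) gate
    g = tf.tanh(tf.matmul(inp, Wg) + bg)      # candidate cell update
    c = f * cprev + i * g                     # new memory cell contents
    m = o * tf.tanh(c)                        # new hidden state / output
    return m, c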

SLIDE 62

Example: LSTM

for i in range(20):
  m, c = LSTMCell(x[i], mprev, cprev)
  mprev = m
  cprev = c

SLIDE 63

Example: Deep LSTM

for i in range(20):
  for d in range(4):  # d is depth
    input = x[i] if d == 0 else m[d-1]
    m[d], c[d] = LSTMCell(input, mprev[d], cprev[d])
    mprev[d] = m[d]
    cprev[d] = c[d]
SLIDE 65

Example: Deep LSTM

for i in range(20):
  for d in range(4):  # d is depth
    with tf.device("/gpu:%d" % d):
      input = x[i] if d == 0 else m[d-1]
      m[d], c[d] = LSTMCell(input, mprev[d], cprev[d])
      mprev[d] = m[d]
      cprev[d] = c[d]

SLIDE 66

[Diagram: deep LSTM translating the sequence "A B C D" unrolled across GPU1-GPU4, with the output softmax on GPU5-GPU6]

1000 LSTM cells, 2000 dims per timestep; 2000 x 4 = 8k dims per sentence. An 80k softmax by 1000 dims is very big: split the softmax across 4 GPUs.
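A minimal sketch of that vocabulary-partitioned softmax (mine; shapes follow the slide, h stands for the [batch, 1000] LSTM output, and _var is the helper from the LSTM sketch above):

logits_parts = []
for g in range(4):
    with tf.device("/gpu:%d" % g):
        W_part = _var([1000, 20000])               # each GPU owns 80k/4 output columns
        logits_parts.append(tf.matmul(h, W_part))  # partial logits computed on that GPU
logits = tf.concat(logits_parts, axis=1)           # [batch, 80000]
probs = tf.nn.softmax(logits)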

SLIDE 78

What are some ways that deep learning is having a significant impact at Google?

All of the following examples were implemented using TensorFlow or our predecessor system.
SLIDE 79

Speech Recognition

[Diagram: acoustic input → Deep Recurrent Neural Network → text output: "How cold is it outside?"]

Reduced word errors by more than 30%

Google Research Blog - August 2012, August 2015

SLIDE 80

The Inception Architecture (GoogLeNet, 2014)

Going Deeper with Convolutions

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich ArXiv 2014, CVPR 2015

SLIDE 81

Team                             Year  Place  Error (top-5)
XRCE (pre-neural-net explosion)  2011  1st    25.8%
SuperVision (AlexNet)            2012  1st    16.4%
Clarifai                         2013  1st    11.7%
GoogLeNet (Inception)            2014  1st    6.66%
Andrej Karpathy (human)          2014  N/A    5.1%
BN-Inception (Arxiv)             2015  N/A    4.9%
Inception-v3 (Arxiv)             2015  N/A    3.46%

Neural Nets: Rapid Progress in Image Recognition

ImageNet challenge classification task

SLIDE 82

Google Photos Search

[Diagram: your photo → Deep Convolutional Neural Network → automatic tag: "ocean"]

Search personal photos without tags.

Google Research Blog - June 2013

SLIDE 83

Google Photos Search

SLIDE 84

Reuse same model for completely different problems

The same basic model structure, trained on different data, is useful in completely different contexts. Example: given an image → predict interesting pixels.

SLIDE 85

SLIDE 86
SLIDE 87

We have tons of vision problems: image search, StreetView, satellite imagery, translation, robotics, self-driving cars, ...

www.google.com/sunroof

SLIDE 88

Medical Imaging

Very good results using a similar model for detecting diabetic retinopathy in retinal images

SLIDE 89

“Seeing” Go

SLIDE 90

RankBrain in Google Search Ranking

[Diagram: query & document features → Deep Neural Network → score for the (doc, query) pair]

Query: "car parts for sale"; Doc: "Rebuilt transmissions …"

Launched in 2015. Third most important search ranking signal (of 100s).

Bloomberg, Oct 2015: "Google Turning Its Lucrative Web Search Over to AI Machines"

SLIDE 91

Sequence-to-Sequence Model

[Diagram: a deep LSTM reads the input sequence A B C D, then emits the target sequence X Y Z Q]

[Sutskever & Vinyals & Le, NIPS 2014]

SLIDE 92

Sequence-to-Sequence Model: Machine Translation

[Diagram: the model reads the input sentence "Quelle est votre taille ? <EOS>" and emits the target sentence "How tall are you?" one word at a time]

[Sutskever & Vinyals & Le, NIPS 2014]

SLIDE 96

Sequence-to-Sequence Model: Machine Translation

[Diagram: the encoder consumes "Quelle est votre taille ? <EOS>"]

[Sutskever & Vinyals & Le, NIPS 2014]

At inference time: beam search to choose the most probable over possible output sequences
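A toy beam-search sketch in plain Python (mine; step_logprobs is a hypothetical callback that returns (token, log-probability) pairs for the next token under the model):

import heapq

def beam_search(step_logprobs, beam_size=4, max_len=20, eos=0):
    beams = [(0.0, [])]                          # (cumulative log-prob, token list)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq and seq[-1] == eos:
                candidates.append((score, seq))  # finished hypothesis, carry forward
                continue
            for tok, lp in step_logprobs(seq):
                candidates.append((score + lp, seq + [tok]))
        beams = heapq.nlargest(beam_size, candidates, key=lambda b: b[0])
    return max(beams, key=lambda b: b[0])[1]     # most probable output sequence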

SLIDE 97

April 1, 2009: April Fool's Day joke
Nov 5, 2015: launched real product
Feb 1, 2016: >10% of mobile Inbox replies

Smart Reply

SLIDE 99

Smart Reply

[Diagram: Incoming Email → Small Feed-Forward Neural Network → Activate Smart Reply? (yes/no); if yes → Deep Recurrent Neural Network → Generated Replies]

Google Research Blog - Nov 2015
SLIDE 100

Image Captioning

[Diagram: the decoder generates the caption word by word: "A young girl asleep"]

[Vinyals et al., CVPR 2015]

SLIDE 101

Model: A close up of a child holding a stuffed animal.
Human: A young girl asleep on the sofa cuddling a stuffed bear.
Model: A baby is asleep next to a teddy bear.

Image Captions Research

SLIDE 102
SLIDE 103

Combining Vision with Robotics

"Deep Learning for Robots: Learning from Large-Scale Interaction", Google Research Blog, March 2016
"Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection", Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, Arxiv, arxiv.org/abs/1603.02199

SLIDE 104

How Can You Get Started with Machine Learning?

Three ways, with varying complexity:

(1) Use a cloud-based API (Vision, Speech, etc.)
(2) Use an existing model architecture, and retrain it or fine-tune it on your dataset
(3) Develop your own machine learning models for new problems

(Moving down the list: more flexible, but more effort required.)

SLIDE 105

Use Cloud-based APIs

cloud.google.com/translate
cloud.google.com/speech
cloud.google.com/vision
cloud.google.com/text

SLIDE 107

Google Cloud Vision API https://cloud.google.com/vision/
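For example, label detection is one HTTPS call. A hedged sketch against the public images:annotate endpoint (the API key and file name are placeholders):

import base64, requests

with open("photo.jpg", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

body = {"requests": [{
    "image": {"content": content},
    "features": [{"type": "LABEL_DETECTION", "maxResults": 5}],
}]}
resp = requests.post(
    "https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY",
    json=body)
print(resp.json())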

SLIDE 108

Google Cloud ML: scaled service for training and inference with TensorFlow

SLIDE 109

A Few TensorFlow Community Examples

(From more than 2,100 results for 'tensorflow' on GitHub)

  • DQN: github.com/nivwusquorum/tensorflow-deepq
  • NeuralArt: github.com/woodrush/neural-art-tf
  • Char RNN: github.com/sherjilozair/char-rnn-tensorflow
  • Keras ported to TensorFlow: github.com/fchollet/keras
  • Show and Tell: github.com/jazzsaxmafia/show_and_tell.tensorflow
  • Mandarin translation: github.com/jikexueyuanwiki/tensorflow-zh

...

SLIDE 111

github.com/nivwusquorum/tensorflow-deepq

SLIDE 112

github.com/woodrush/neural-art-tf

SLIDE 113

github.com/sherjilozair/char-rnn-tensorflow

SLIDE 114

github.com/fchollet/keras

SLIDE 115

github.com/jazzsaxmafia/show_and_tell.tensorflow

SLIDE 116

github.com/jikexueyuanwiki/tensorflow-zh

SLIDE 117

What Does the Future Hold?

Deep learning usage will continue to grow and accelerate:

  • Across more and more fields and problems:
    ○ robotics, self-driving vehicles, ...
    ○ health care
    ○ video understanding
    ○ dialogue systems
    ○ personal assistance
    ○ ...

SLIDE 118

Conclusions

Deep neural networks are making significant strides in understanding: in speech, vision, language, search, robotics, ... If you're not considering how to use deep neural nets to solve your vision or understanding problems, you almost certainly should be.

SLIDE 119

Further Reading

  • Dean et al., Large Scale Distributed Deep Networks, NIPS 2012, research.google.com/archive/large_deep_networks_nips2012.html
  • Mikolov, Chen, Corrado & Dean, Efficient Estimation of Word Representations in Vector Space, NIPS 2013, arxiv.org/abs/1301.3781
  • Sutskever, Vinyals & Le, Sequence to Sequence Learning with Neural Networks, NIPS 2014, arxiv.org/abs/1409.3215
  • Vinyals, Toshev, Bengio & Erhan, Show and Tell: A Neural Image Caption Generator, CVPR 2015, arxiv.org/abs/1411.4555
  • TensorFlow white paper, tensorflow.org/whitepaper2015.pdf (clickable links in bibliography)

g.co/brain (We're hiring! Also check out the Brain Residency program at g.co/brainresidency)
www.tensorflow.org
research.google.com/people/jeff
research.google.com/pubs/BrainTeam.html

Questions?