

slide-1
SLIDE 1

Large-Scale Deep Learning with TensorFlow for Building Intelligent Systems

Jeff Dean Google Brain Team g.co/brain

In collaboration with many other people at Google

slide-2
SLIDE 2

We can now store and perform computation on large datasets, using things like MapReduce, BigTable, Spanner, Flume, Pregel, or open-source variants like Hadoop, HBase, Cassandra, Giraph, ... But what we really want is not just raw data, but computer systems that understand this data.

slide-3
SLIDE 3

Where are we?

  • Good handle on systems to store and manipulate data
  • What we really care about now is understanding
slide-4
SLIDE 4

What do I mean by understanding?

slide-5
SLIDE 5

What do I mean by understanding?

slide-6
SLIDE 6

What do I mean by understanding?

slide-7
SLIDE 7

What do I mean by understanding?

Query: [ car parts for sale ]

slide-8
SLIDE 8

What do I mean by understanding?

Query: [ car parts for sale ]
Document 1: “… car parking available for a small fee. … parts of our floor model inventory for sale.”
Document 2: “Selling all kinds of automobile and pickup truck parts, engines, and transmissions.”

slide-9
SLIDE 9

Example Queries of the Future

  • Which of these eye images shows symptoms of diabetic retinopathy?
  • Find me all rooftops in North America
  • Describe this video in Spanish
  • Find me all documents relevant to reinforcement learning for robotics and summarize them in German
  • Find a free time for everyone in the Smart Calendar project to meet and set up a videoconference

slide-10
SLIDE 10

Neural Networks

slide-11
SLIDE 11

What is Deep Learning?

  • A powerful class of machine learning model
  • Modern reincarnation of artificial neural networks
  • Collection of simple, trainable mathematical functions
  • Compatible with many variants of machine learning

[Image: photo labeled “cat”]

slide-12
SLIDE 12

What is Deep Learning?

  • Loosely based on (what little) we know about the brain

[Image: photo labeled “cat”]

slide-13
SLIDE 13

Growing Use of Deep Learning at Google

Across many products/areas: Android, Apps, drug discovery, Gmail, image understanding, Maps, natural language understanding, Photos, robotics research, speech, translation, YouTube, ... many others.

[Chart: # of directories containing model description files]

slide-14
SLIDE 14

The Neuron

[Diagram: inputs x1 ... xn with weights w1 ... wn feeding a single neuron with output y]

y = F(w1*x1 + w2*x2 + ... + wn*xn)

F: a non-linear differentiable function
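To make the diagram concrete, here is a minimal NumPy sketch of a single neuron; tanh as F and the particular numbers are arbitrary choices for illustration:

    import numpy as np

    def neuron(x, w, F=np.tanh):
        # y = F(w1*x1 + w2*x2 + ... + wn*xn)
        return F(np.dot(w, x))

    x = np.array([0.5, -1.0, 2.0])   # inputs x1..xn
    w = np.array([0.1, 0.4, -0.2])   # trainable weights w1..wn
    print(neuron(x, w))              # the neuron's output y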

slide-15
SLIDE 15
slide-16
SLIDE 16

ConvNets

slide-17
SLIDE 17

Learning algorithm

While not done:
    Pick a random training example (input, output)
    Run the neural network on input
    Adjust the weights on edges to make the network's output closer to output

slide-18
SLIDE 18

Learning algorithm

While not done:
    Pick a random training example (input, output)
    Run the neural network on input
    Adjust the weights on edges to make the network's output closer to output
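A minimal runnable sketch of this loop for a single linear neuron on a made-up dataset; the squared-error objective and the 0.01 learning rate are assumptions, not from the slides:

    import random
    import numpy as np

    # Toy (input, output) training examples, invented for illustration.
    examples = [(np.array([0.0, 1.0]), 1.0),
                (np.array([1.0, 0.0]), 0.0),
                (np.array([1.0, 1.0]), 1.0)]
    w = np.zeros(2)  # weights on the edges

    for step in range(1000):
        x, target = random.choice(examples)  # pick a random training example
        y = np.dot(w, x)                     # run the network on the input
        # adjust weights to make y closer to target (gradient of squared error)
        w -= 0.01 * (y - target) * x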

slide-19
SLIDE 19

Backpropagation

Use partial derivatives along the paths in the neural net; follow the gradient of the error with respect to the connection weights.

The gradient points in the direction of improvement.

Good description: “Calculus on Computational Graphs: Backpropagation”, http://colah.github.io/posts/2015-08-Backprop/
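As a worked example of following the gradient, here is backpropagation by hand on a tiny two-layer net with one tanh hidden unit and squared error; every number and the 0.1 step size are invented:

    import numpy as np

    x, target = 0.5, 0.8     # one training example
    w1, w2 = 0.3, -0.2       # connection weights

    for step in range(200):
        # Forward pass.
        h = np.tanh(w1 * x)
        y = w2 * h
        # Backward pass: partial derivatives along each path (chain rule).
        dy = y - target                 # d(loss)/dy for loss = 0.5*(y-target)^2
        dw2 = dy * h                    # d(loss)/dw2
        dh = dy * w2                    # d(loss)/dh
        dw1 = dh * (1.0 - h * h) * x    # chain rule through tanh
        # Step against the gradient: the direction of improvement.
        w1 -= 0.1 * dw1
        w2 -= 0.1 * dw2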

slide-20
SLIDE 20
slide-21
SLIDE 21
slide-22
SLIDE 22

Non-convexity

  • Low dimensions => local minima
  • High dimensions => saddle points
  • Most local minima are close to the global minimum

Slide credit: Yoshua Bengio

This shows a function of 2 variables; real neural nets are functions of hundreds of millions of variables!
slide-23
SLIDE 23

Plenty of raw data

  • Text: trillions of words of English + other languages
  • Visual data: billions of images and videos
  • Audio: tens of thousands of hours of speech per day
  • User activity: queries, marking messages spam, etc.
  • Knowledge graph: billions of labelled relation triples
  • ...

How can we build systems that truly understand this data?

slide-24
SLIDE 24

Important Property of Neural Networks

Results get better with more data + bigger models + more computation (Better algorithms, new insights and improved techniques always help, too!)

slide-25
SLIDE 25

Aside

Many of the techniques that are successful now were developed 20-30 years ago. What changed? We now have:
  • sufficient computational resources
  • large enough interesting datasets

Use of large-scale parallelism also lets us look ahead many generations of hardware improvements.

slide-26
SLIDE 26

What are some ways that deep learning is having a significant impact at Google?

slide-27
SLIDE 27

Speech Recognition

Acoustic input (“How cold is it outside?”) → Deep Recurrent Neural Network → text output

Reduced word errors by more than 30%

Google Research Blog - August 2012, August 2015

slide-28
SLIDE 28

ImageNet Challenge

Given an image, predict one of 1000 different classes

Image credit: www.cs.toronto.edu/~fritz/absps/imagenet.pdf

slide-29
SLIDE 29

The Inception Architecture (GoogLeNet, 2014)

Going Deeper with Convolutions

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich ArXiv 2014, CVPR 2015

slide-30
SLIDE 30

Neural Nets: Rapid Progress in Image Recognition

ImageNet challenge classification task:

Team                             Year   Place   Error (top-5)
XRCE (pre-neural-net explosion)  2011   1st     25.8%
Supervision (AlexNet)            2012   1st     16.4%
Clarifai                         2013   1st     11.7%
GoogLeNet (Inception)            2014   1st     6.66%
Andrej Karpathy (human)          2014   N/A     5.1%
BN-Inception (Arxiv)             2015   N/A     4.9%
Inception-v3 (Arxiv)             2015   N/A     3.46%

slide-31
SLIDE 31

Good Fine-Grained Classification

slide-32
SLIDE 32

Good Generalization

Both recognized as “meal”

slide-33
SLIDE 33

Sensible Errors

slide-34
SLIDE 34

Google Photos Search

Your photo → Deep Convolutional Neural Network → automatic tag (“ocean”)

Search personal photos without tags.

Google Research Blog - June 2013

slide-35
SLIDE 35

Google Photos Search

slide-36
SLIDE 36

Google Photos Search

slide-37
SLIDE 37

“Seeing” Go

Mastering the Game of Go with Deep Neural Networks and Tree Search, Silver et al., Nature, vol. 529 (2016), pp. 484-489

slide-38
SLIDE 38

Reuse same model for completely different problems

Same basic model structure (e.g. given image, predict interesting parts of image) trained on different data, useful in completely different contexts

slide-39
SLIDE 39
slide-40
SLIDE 40
slide-41
SLIDE 41

We have tons of vision problems: image search, StreetView, satellite imagery, translation, robotics, self-driving cars, ...

slide-42
SLIDE 42

Medical Imaging

Very good results using a similar model for detecting diabetic retinopathy in retinal images

slide-43
SLIDE 43

Language Understanding

Query: [ car parts for sale ]
Document 1: “… car parking available for a small fee. … parts of our floor model inventory for sale.”
Document 2: “Selling all kinds of automobile and pickup truck parts, engines, and transmissions.”

slide-44
SLIDE 44

How to deal with Sparse Data?

Usually use many more than 3 dimensions (e.g. 100D, 1000D)

slide-45
SLIDE 45

Embeddings Can be Trained With Backpropagation

Mikolov, Sutskever, Chen, Corrado and Dean. Distributed Representations of Words and Phrases and Their Compositionality, NIPS 2013.

slide-46
SLIDE 46

Nearest Neighbors are Closely Related Semantically

Trained language model on Wikipedia:

shark: tiger shark, bull shark, blacktip shark, oceanic whitetip shark, sandbar shark, dusky shark, blue shark, requiem shark, great white shark, lemon shark

car: cars, muscle car, sports car, compact car, autocar, automobile, pickup truck, racing car, passenger car, dealership

new york: new york city, brooklyn, long island, syracuse, manhattan, washington, bronx, yonkers, poughkeepsie, new york state

* 5.7M docs, 5.4B terms, 155K unique terms, 500-D embeddings

slide-47
SLIDE 47

Directions are Meaningful

Solve analogies with vector arithmetic!

V(queen) - V(king) ≈ V(woman) - V(man)
V(queen) ≈ V(king) + (V(woman) - V(man))
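A toy sketch of that arithmetic; the four 3-D vectors are invented for the example (real embeddings are hundreds of dimensions):

    import numpy as np

    # Hypothetical trained embeddings, invented for illustration.
    V = {
        "king":  np.array([0.8, 0.3, 0.1]),
        "queen": np.array([0.8, 0.9, 0.1]),
        "man":   np.array([0.2, 0.3, 0.5]),
        "woman": np.array([0.2, 0.9, 0.5]),
    }

    def nearest(v, exclude):
        """Word whose embedding is most cosine-similar to v."""
        def cos(a, b):
            return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return max((w for w in V if w not in exclude), key=lambda w: cos(V[w], v))

    # V(king) + (V(woman) - V(man)) lands near V(queen).
    print(nearest(V["king"] + V["woman"] - V["man"], {"king", "woman", "man"}))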

slide-48
SLIDE 48

RankBrain in Google Search Ranking

Query & document features → Deep Neural Network → score for the (doc, query) pair

Query: “car parts for sale”, Doc: “Rebuilt transmissions …”

Launched in 2015. Third most important search ranking signal (of 100s).

Bloomberg, Oct 2015: “Google Turning Its Lucrative Web Search Over to AI Machines”

slide-49
SLIDE 49

A Simple Model of Memory

Instructions: WRITE X, M / READ M, Y / FORGET M

[Diagram: a memory cell M with input X, output Y, and WRITE? / READ? / FORGET? control signals]

slide-50
SLIDE 50

Long Short-Term Memory (LSTMs): Make Your Memory Cells Differentiable

[Hochreiter & Schmidhuber, 1997]

[Diagram: the memory cell M with input X and output Y; the discrete WRITE? / READ? / FORGET? controls become differentiable sigmoid gates W, R, F]

slide-51
SLIDE 51

Example: LSTM [Hochreiter & Schmidhuber, 1997][Gers et al., 1999]

Enables long-term dependencies to flow

slide-52
SLIDE 52

Sequence-to-Sequence Model

[Diagram: a deep LSTM reads the input sequence A B C D, then emits the target sequence X Y Z Q]

Deep LSTM [Sutskever & Vinyals & Le, NIPS 2014]

slide-53
SLIDE 53

Sequence-to-Sequence Model: Machine Translation

Input sentence → Target sentence

[Sutskever & Vinyals & Le, NIPS 2014]

Quelle est votre taille? <EOS> → How

slide-54
SLIDE 54

Sequence-to-Sequence Model: Machine Translation

Input sentence → Target sentence

[Sutskever & Vinyals & Le, NIPS 2014]

Quelle est votre taille? <EOS> → How tall

slide-55
SLIDE 55

Sequence-to-Sequence Model: Machine Translation

Input sentence → Target sentence

[Sutskever & Vinyals & Le, NIPS 2014]

Quelle est votre taille? <EOS> → How tall are

slide-56
SLIDE 56

Sequence-to-Sequence Model: Machine Translation

Input sentence → Target sentence

[Sutskever & Vinyals & Le, NIPS 2014]

Quelle est votre taille? <EOS> → How tall are you?

slide-57
SLIDE 57

Sequence-to-Sequence Model: Machine Translation

Input sentence

[Sutskever & Vinyals & Le, NIPS 2014]

At inference time: beam search over possible output sequences to choose the most probable

Quelle est votre taille? <EOS>

slide-58
SLIDE 58

Sequence-to-Sequence Model: Machine Translation

Input sentence → Target sentence

[Sutskever & Vinyals & Le, NIPS 2014]

Quelle est votre taille? <EOS> → How tall are you?

slide-59
SLIDE 59

Sequence-to-Sequence Model: Machine Translation

Input sentence → Target sentence

[Sutskever & Vinyals & Le, NIPS 2014]

Input sentence <EOS> → Word w2 w3 w4 <EOS>

slide-60
SLIDE 60
slide-61
SLIDE 61
slide-62
SLIDE 62

Smart Reply

April 1, 2009: April Fool’s Day joke
Nov 5, 2015: launched as a real product
Feb 1, 2016: >10% of mobile Inbox replies

slide-63
SLIDE 63

Smart Reply

Incoming Email → Small Feed-Forward Neural Network → Activate Smart Reply? (yes/no)

Google Research Blog - Nov 2015
slide-64
SLIDE 64

Smart Reply

Incoming Email → Small Feed-Forward Neural Network → Activate Smart Reply? (yes/no)
If yes: Deep Recurrent Neural Network → Generated Replies

Google Research Blog - Nov 2015
slide-65
SLIDE 65

Sequence-to-Sequence

  • Translation: [Kalchbrenner et al., EMNLP 2013][Cho et al., EMNLP 2014][Sutskever & Vinyals & Le, NIPS 2014][Luong et al., ACL 2015][Bahdanau et al., ICLR 2015]
  • Image captions: [Mao et al., ICLR 2015][Vinyals et al., CVPR 2015][Donahue et al., CVPR 2015][Xu et al., ICML 2015]
  • Speech: [Chorowski et al., NIPS DL 2014][Chan et al., arxiv 2015]
  • Language Understanding: [Vinyals & Kaiser et al., NIPS 2015][Kiros et al., NIPS 2015]
  • Dialogue: [Shang et al., ACL 2015][Sordoni et al., NAACL 2015][Vinyals & Le, ICML DL 2015]
  • Video Generation: [Srivastava et al., ICML 2015]
  • Algorithms: [Zaremba & Sutskever, arxiv 2014][Vinyals & Fortunato & Jaitly, NIPS 2015][Kaiser & Sutskever, arxiv 2015][Zaremba et al., arxiv 2015]

slide-66
SLIDE 66

Image Captioning

[Diagram: an LSTM decoder emits the caption word by word: “A young girl asleep”]

[Vinyals et al., CVPR 2015]

slide-67
SLIDE 67

Model: “A close up of a child holding a stuffed animal.”
Human: “A young girl asleep on the sofa cuddling a stuffed bear.”
Model: “A baby is asleep next to a teddy bear.”

Image Captioning

slide-68
SLIDE 68
slide-69
SLIDE 69

Combined Vision + Translation

slide-70
SLIDE 70

Turnaround Time and Effect on Research

  • Minutes, hours:
    ○ Interactive research! Instant gratification!
  • 1-4 days:
    ○ Tolerable
    ○ Interactivity replaced by running many experiments in parallel
  • 1-4 weeks:
    ○ High-value experiments only
    ○ Progress stalls
  • >1 month:
    ○ Don’t even try

slide-71
SLIDE 71

Train in a day what would take a single GPU card 6 weeks

slide-72
SLIDE 72

How Can We Train Large, Powerful Models Quickly?

  • Exploit many kinds of parallelism:
    ○ Model parallelism
    ○ Data parallelism

slide-73
SLIDE 73

Model Parallelism

slide-74
SLIDE 74

Model Parallelism

slide-75
SLIDE 75

Model Parallelism

slide-76
SLIDE 76

Data Parallelism

[Diagram, animated across slides 76-83: parameter servers hold the model parameters. Many model replicas, each processing its own shard of the data, repeatedly fetch the current parameters p, compute an update ∆p, and send it back; the parameter servers apply p' = p + ∆p. The next round fetches p', computes ∆p', and applies p'' = p' + ∆p', and so on.]

slide-84
SLIDE 84

Data Parallelism Choices

Can do this synchronously:

  • N replicas is equivalent to an N times larger batch size
  • Pro: no noise in gradients
  • Con: less fault tolerant (requires some recovery if any single machine fails)

Can do this asynchronously:

  • Con: noise in gradients
  • Pro: relatively fault tolerant (a failure in one model replica doesn't block the other replicas)

(Or hybrid: M asynchronous groups of N synchronous replicas)
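A toy sketch of the asynchronous variant in plain Python; the shared params array plays the parameter server, and compute_gradient with its objective is invented for the example:

    import threading
    import numpy as np

    params = np.zeros(10)          # parameters p, held by the "parameter server"
    lock = threading.Lock()

    def compute_gradient(p, shard):
        # Toy objective: pull the parameters toward the shard's mean example.
        return shard.mean(axis=0) - p

    def replica(shard, steps=100):
        for _ in range(steps):
            p = params.copy()                   # fetch current parameters p
            delta = compute_gradient(p, shard)  # compute ∆p on this replica's data
            with lock:                          # apply p' = p + learning_rate * ∆p
                params += 0.01 * delta          # p may be stale by now: gradient noise

    # Four replicas, each with its own shard of (random) data.
    shards = [np.random.randn(50, 10) for _ in range(4)]
    threads = [threading.Thread(target=replica, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()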

slide-85
SLIDE 85

Image Model Training Time

[Chart: training time in hours for 1 GPU, 10 GPUs, and 50 GPUs]

slide-86
SLIDE 86

Image Model Training Time

[Chart: 2.6 hours (50 GPUs) vs. 79.3 hours (1 GPU), a 30.5x speedup]

slide-87
SLIDE 87

What do you want in a machine learning system?

  • Ease of expression: for lots of crazy ML ideas/algorithms
  • Scalability: can run experiments quickly
  • Portability: can run on wide variety of platforms
  • Reproducibility: easy to share and reproduce research
  • Production readiness: go from research to real products
slide-88
SLIDE 88

Open, standard software for general machine learning
Great for deep learning in particular
First released Nov 2015
Apache 2.0 license

http://tensorflow.org/

and

https://github.com/tensorflow/tensorflow

slide-89
SLIDE 89

http://tensorflow.org/whitepaper2015.pdf

slide-90
SLIDE 90

Strong External Adoption

[Chart: GitHub activity over time for ML frameworks launched Nov. 2015 (TensorFlow), Sep. 2013, Jan. 2012, and Jan. 2008]

50,000+ binary installs in 72 hours, 500,000+ since November, 2015

slide-91
SLIDE 91

Strong External Adoption

[Chart: GitHub activity over time for ML frameworks launched Nov. 2015 (TensorFlow), Sep. 2013, Jan. 2012, and Jan. 2008]

50,000+ binary installs in 72 hours, 500,000+ since November, 2015
Most forked repository on GitHub in 2015 (despite only being available in Nov, ‘15)

slide-92
SLIDE 92

http://tensorflow.org/

slide-93
SLIDE 93
slide-94
SLIDE 94

Motivations

DistBelief (1st system) was great for scalability and production training of basic kinds of models
Not as flexible as we wanted for research purposes
Better understanding of the problem space allowed us to make some dramatic simplifications

slide-95
SLIDE 95

TensorFlow: Expressing High-Level ML Computations

  • Core in C++

○ Very low overhead

[Diagram: Core TensorFlow Execution System running on CPU, GPU, Android, iOS, ...]

slide-96
SLIDE 96

TensorFlow: Expressing High-Level ML Computations

  • Core in C++

○ Very low overhead

  • Different front ends for specifying/driving the computation

○ Python and C++ today, easy to add more

[Diagram: Core TensorFlow Execution System running on CPU, GPU, Android, iOS, ...]

slide-97
SLIDE 97

TensorFlow: Expressing High-Level ML Computations

  • Core in C++

○ Very low overhead

  • Different front ends for specifying/driving the computation

○ Python and C++ today, easy to add more

[Diagram: Core TensorFlow Execution System on CPU, GPU, Android, iOS, ..., with C++ and Python front ends (and more) on top]

slide-98
SLIDE 98

Computation is a dataflow graph

[Graph: examples and weights feed a MatMul node; Add combines with biases; then Relu; Xent compares against labels]

Graph of nodes, also called operations or ops.
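A minimal sketch of building that graph through the (1.x-era) Python front end; the layer sizes and the optimizer choice are invented for the example:

    import tensorflow as tf

    examples = tf.placeholder(tf.float32, [None, 784])  # batch of examples
    labels   = tf.placeholder(tf.float32, [None, 10])   # one-hot labels

    weights = tf.Variable(tf.random_normal([784, 10]))
    biases  = tf.Variable(tf.zeros([10]))

    # MatMul -> Add -> Relu, then Xent against the labels, mirroring the slide.
    hidden = tf.nn.relu(tf.add(tf.matmul(examples, weights), biases))
    xent = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=hidden))

    # minimize() adds gradient ops and an update op to the same graph.
    train_op = tf.train.GradientDescentOptimizer(0.5).minimize(xent)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # sess.run(train_op, feed_dict={examples: ..., labels: ...})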

slide-99
SLIDE 99

Computation is a dataflow graph ... with tensors

[Graph: the same MatMul / Add / Relu / Xent graph]

Edges are N-dimensional arrays: Tensors

slide-100
SLIDE 100

Computation is a dataflow graph ... with state

[Graph: a Mul node scales gradients by the learning rate; a −= node applies the update to biases]

'Biases' is a variable. Some ops compute gradients; −= updates biases.

slide-101
SLIDE 101

Computation is a dataflow graph ... distributed

[Graph: the same update graph split across Device A and Device B]

Devices: processes, machines, GPUs, etc.
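A small sketch of explicit device placement with the 1.x API; the two-process cluster, the addresses, and the shapes are all invented:

    import tensorflow as tf

    # Hypothetical cluster: one parameter-server process, one worker process.
    cluster = tf.train.ClusterSpec({"ps": ["localhost:2222"],
                                    "worker": ["localhost:2223"]})
    # Each process would also start a tf.train.Server(cluster, job_name=..., task_index=0).

    with tf.device("/job:ps/task:0"):        # variables live on the parameter server
        weights = tf.Variable(tf.zeros([784, 10]))

    with tf.device("/job:worker/task:0"):    # the math runs on the worker
        examples = tf.placeholder(tf.float32, [None, 784])
        logits = tf.matmul(examples, weights)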

slide-102
SLIDE 102

Automatically runs models on a range of platforms: from phones, to single machines (CPU and/or GPUs), to distributed systems of many 100s of GPU cards

TensorFlow: Expressing High-Level ML Computations

slide-103
SLIDE 103

Trend: Much More Heterogeneous hardware

General-purpose CPU performance scaling has slowed significantly. Specialization of hardware for certain workloads will become more important.

slide-104
SLIDE 104

Tensor Processing Unit

Custom machine learning ASIC
In production use for >14 months: used on every search query, used for the AlphaGo match, ...

slide-105
SLIDE 105

Using TensorFlow for Parallelism

Trivial to express both model parallelism and data parallelism

  • Very minimal changes to single-device model code
slide-106
SLIDE 106

Example: LSTM

for i in range(20):
    m, c = LSTMCell(x[i], mprev, cprev)
    mprev = m
    cprev = c
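LSTMCell is a helper the slide assumes. A minimal NumPy sketch of what one such step might compute; the gate layout, the explicit weight argument W, and the absence of biases are simplifications for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def LSTMCell(x, mprev, cprev, W):
        """One LSTM step: input/forget/output gates i, f, o and candidate g."""
        z = np.dot(np.concatenate([x, mprev]), W)  # W: (inputs+hidden) x 4*hidden
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * cprev + sigmoid(i) * np.tanh(g)  # new cell memory
        m = sigmoid(o) * np.tanh(c)                       # new output
        return m, c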

slide-107
SLIDE 107

Example: Deep LSTM

for i in range(20):
    for d in range(4):  # d is depth
        input = x[i] if d == 0 else m[d - 1]
        m[d], c[d] = LSTMCell(input, mprev[d], cprev[d])
        mprev[d] = m[d]
        cprev[d] = c[d]

slide-108
SLIDE 108

Example: Deep LSTM

for i in range(20):
    for d in range(4):  # d is depth
        input = x[i] if d == 0 else m[d - 1]
        m[d], c[d] = LSTMCell(input, mprev[d], cprev[d])
        mprev[d] = m[d]
        cprev[d] = c[d]

slide-109
SLIDE 109

Example: Deep LSTM

for i in range(20):
    for d in range(4):  # d is depth
        with tf.device("/gpu:%d" % d):  # each depth on its own GPU, so timesteps pipeline across GPUs
            input = x[i] if d == 0 else m[d - 1]
            m[d], c[d] = LSTMCell(input, mprev[d], cprev[d])
            mprev[d] = m[d]
            cprev[d] = c[d]

slide-110
SLIDE 110

[Diagram, animated across slides 110-121: a model-parallel seq2seq translation model reading input A B C D and emitting output A B C D. 1000 LSTM cells, 2000 dims per timestep, 2000 x 4 = 8k dims per sentence. The 80k softmax by 1000 dims is very big, so it is split across 4 GPUs; the LSTM layers run on further GPUs (GPU1-GPU6 in total).]
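A hedged sketch of how such a softmax split might be written with tf.device, using the sizes from the slide; the variable names and the plain random initialization are invented:

    import tensorflow as tf

    num_gpus, vocab, dims = 4, 80000, 1000
    h = tf.placeholder(tf.float32, [None, dims])  # LSTM output for one timestep

    shard_logits = []
    for d in range(num_gpus):
        with tf.device("/gpu:%d" % d):
            # Each GPU holds a 20k-word slice of the 80k x 1000 softmax weights.
            w = tf.Variable(tf.random_normal([dims, vocab // num_gpus]))
            shard_logits.append(tf.matmul(h, w))
    logits = tf.concat(shard_logits, axis=1)      # full 80k-way logits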

slide-122
SLIDE 122

Interesting Open Problems

ML:
  • unsupervised learning
  • reinforcement learning
  • highly multi-task and transfer learning
  • automatic learning of model structures
  • privacy-preserving techniques in ML
  • ...

slide-123
SLIDE 123

Interesting Open Problems

Systems:
  • Use high-level descriptions of ML computations and map them efficiently onto a wide variety of different hardware
  • Integration of ML into more traditional data processing systems
  • Automated splitting of computations across mobile devices and datacenters
  • Use learning in lieu of traditional heuristics in systems
  • ...

slide-124
SLIDE 124

What Does the Future Hold?

Deep learning usage will continue to grow and accelerate:

  • Across more and more fields and problems:

    ○ robotics, self-driving vehicles, ...
    ○ health care
    ○ video understanding
    ○ dialogue systems
    ○ personal assistance
    ○ ...

slide-125
SLIDE 125

Combining Vision with Robotics

“Deep Learning for Robots: Learning from Large-Scale Interaction”, Google Research Blog, March, 2016

“Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection”, Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, arxiv.org/abs/1603.02199
slide-126
SLIDE 126

Conclusions

Deep neural networks are making significant strides in understanding: in speech, vision, language, search, … If you’re not considering how to apply deep neural nets to your data, you almost certainly should be. TensorFlow makes it easy for everyone to experiment with these techniques:

  • Highly scalable design allows faster experiments, accelerates research
  • Easy to share models and to publish code to give reproducible results
  • Ability to go from research to production within same system
slide-127
SLIDE 127

Further Reading

  • Dean et al., Large Scale Distributed Deep Networks, NIPS 2012, research.google.com/archive/large_deep_networks_nips2012.html
  • Mikolov, Chen, Corrado & Dean, Efficient Estimation of Word Representations in Vector Space, 2013, arxiv.org/abs/1301.3781
  • Sutskever, Vinyals & Le, Sequence to Sequence Learning with Neural Networks, NIPS 2014, arxiv.org/abs/1409.3215
  • Vinyals, Toshev, Bengio & Erhan, Show and Tell: A Neural Image Caption Generator, CVPR 2015, arxiv.org/abs/1411.4555
  • TensorFlow white paper, tensorflow.org/whitepaper2015.pdf (clickable links in bibliography)

g.co/brain (We’re hiring! Also check out the Brain Residency program at g.co/brainresidency)
research.google.com/people/jeff
research.google.com/pubs/BrainTeam.html

Questions?