Applications: Lecture slides for Chapter 12 of Deep Learning (PowerPoint presentation)



SLIDE 1

Applications

Lecture slides for Chapter 12 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2018-10-25

SLIDE 2

(Goodfellow 2018)

Disclaimer

  • Details of applications change much faster than the underlying conceptual ideas
  • A printed book is updated on the scale of years; state-of-the-art results come out constantly
  • These slides are somewhat more up to date
  • Applications involve much more specific knowledge; the limitations of my own knowledge will be much more apparent in these slides than in others

SLIDE 3

Large Scale Deep Learning

[Plot: number of neurons in artificial networks (logarithmic scale) versus year, 1950 projected to 2056, with biological reference points ranging from sponge, roundworm, leech, ant, bee, frog, and octopus up to human.]

Figure 1.11

SLIDE 4

Fast Implementations

  • CPU
  • Exploit fixed point arithmetic in CPU families where this offers a speedup
  • Cache-friendly implementations
  • GPU
  • High memory bandwidth
  • No cache
  • Warps must be synchronized
  • TPU
  • Similar to GPU in many respects but faster
  • Often requires larger batch size
  • Sometimes requires reduced precision
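The fixed-point idea from the CPU bullet above can be sketched in a few lines: real values are stored as integers with an implicit scale factor, so multiply-accumulate runs entirely in integer units. The scale choice and helper names here are illustrative, not any particular library's API.

```python
SCALE = 2 ** 8  # 8 fractional bits (an arbitrary illustrative choice)

def to_fixed(x):
    """Round a real number to the nearest fixed-point integer."""
    return int(round(x * SCALE))

def fixed_mul(a, b):
    """Multiply two fixed-point numbers; rescale to keep 8 fractional bits."""
    return (a * b) // SCALE

def to_float(a):
    """Convert a fixed-point integer back to a real number."""
    return a / SCALE

a, b = to_fixed(1.5), to_fixed(-0.25)
print(to_float(fixed_mul(a, b)))  # -0.375
```

On hardware where integer arithmetic is cheaper than floating point, this representation is where the speedup comes from.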
SLIDE 5

Distributed Implementations

  • Distributed
  • Multi-GPU
  • Multi-machine
  • Model parallelism
  • Data parallelism
  • Trivial at test time
  • Synchronous or asynchronous SGD at train time
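The data-parallel, synchronous-SGD case above can be sketched as follows: each worker computes a gradient on its own shard of the batch, the gradients are averaged at a synchronization barrier, and every replica applies the same update. The linear model, toy data, and worker count are illustrative assumptions.

```python
import numpy as np

w = np.zeros(3)                       # parameters, identical on all replicas
X = np.tile(np.eye(3), (3, 1))[:8]    # toy inputs: 8 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])    # synthetic targets from known weights

def local_gradient(w, X_shard, y_shard):
    """Mean-squared-error gradient on one worker's shard."""
    err = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ err / len(y_shard)

n_workers, lr = 4, 0.5
for _ in range(100):
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]
    w -= lr * np.mean(grads, axis=0)  # barrier: all workers wait, then average

print(np.round(w, 3))  # recovers [1., -2., 0.5]
```

Asynchronous SGD drops the barrier: each worker applies its gradient as soon as it is ready, trading gradient staleness for throughput.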
SLIDE 6

Synchronous SGD

TensorFlow tutorial

SLIDE 7

Example: ImageNet in 18 minutes for $40

Blog post

SLIDE 8

Model Compression

  • Large models often have lower test error
  • Very large model trained with dropout
  • Ensemble of many models
  • Want small model for low resource use at test time
  • Train a small model to mimic the large one
  • Obtains better test error than directly training a small model
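The mimic idea above (often called distillation) can be sketched as: the small model is trained to match the large model's softened output distribution rather than hard labels. The temperature and toy logits below are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T gives a smoother distribution."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([5.0, 2.0, 0.1])
T = 4.0
soft_targets = softmax(teacher_logits, T)   # the large model's "dark knowledge"

def distill_loss(student_logits, soft_targets, T):
    """Cross-entropy between the teacher's soft targets and the student's
    tempered predictions: the training signal for the small model."""
    p = softmax(student_logits, T)
    return -np.sum(soft_targets * np.log(p))

# The loss is minimized when the student reproduces the teacher's distribution.
print(distill_loss(teacher_logits, soft_targets, T))
```

The soft targets carry more information per example than one-hot labels (relative probabilities of wrong classes), which is one intuition for why the mimic beats direct training.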

SLIDE 9

Quantization

Important for mobile deployment (e.g. TensorFlow Lite)
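A minimal sketch of the affine quantization scheme used for mobile deployment: real-valued weights are mapped to 8-bit integers via a scale and zero point, shrinking storage roughly 4x relative to float32. The weight values below are illustrative, and this is a simplified stand-in, not TensorFlow Lite's actual implementation.

```python
import numpy as np

def quantize(w, num_bits=8):
    """Affine quantization: map the range [w.min(), w.max()] onto integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(round(qmin - w.min() / scale))  # integer that represents 0.0
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate real values from the quantized integers."""
    return scale * (q.astype(np.int32) - zero_point)

w = np.array([-1.0, -0.2, 0.0, 0.7, 1.5])
q, s, z = quantize(w)
print(np.round(dequantize(q, s, z), 2))  # close to w, up to rounding error
```

The maximum reconstruction error is half the scale, so quantization hurts accuracy most when a layer's weight range is wide.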

SLIDE 10

Dynamic Structure: Cascades

(Viola and Jones, 2001)
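The cascade idea can be sketched as a sequence of increasingly expensive classifiers in which each stage may reject early, so most negatives never reach the costly stages. The stages and thresholds below are hypothetical stand-ins, not the actual Viola-Jones features.

```python
def cascade(x, stages):
    """Run stages in order; reject as soon as one stage scores below threshold."""
    for score_fn, threshold in stages:
        if score_fn(x) < threshold:
            return False          # cheap early rejection
    return True                   # survived every stage: accept

# Toy stages: each is (scoring function, acceptance threshold),
# ordered from cheapest to most expensive.
stages = [
    (lambda x: x["brightness"], 0.2),   # very cheap feature
    (lambda x: x["edge_score"], 0.5),   # moderately expensive
    (lambda x: x["full_model"], 0.9),   # expensive classifier
]

print(cascade({"brightness": 0.8, "edge_score": 0.7, "full_model": 0.95}, stages))  # True
print(cascade({"brightness": 0.1, "edge_score": 0.7, "full_model": 0.95}, stages))  # False
```

Because faces are rare among candidate windows, the average cost per window is dominated by the cheap first stage.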

SLIDE 11

Dynamic Structure

Outrageously Large Neural Networks (Shazeer et al, 2017)

SLIDE 12

Dataset Augmentation for Computer Vision

Affine distortion, noise, elastic deformation, horizontal flip, random translation, hue shift
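Two of the augmentations listed above, sketched on a tiny toy "image": horizontal flip and random translation. Real pipelines also add noise, hue shifts, and elastic or affine distortions; the array sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
img = np.arange(9).reshape(3, 3)   # toy 3x3 grayscale image

flipped = img[:, ::-1]             # horizontal flip: reverse the columns

def random_translate(img, max_shift=1):
    """Shift the image by a random (dy, dx) offset, zero-padding the border."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(img)
    src = img[max(0, -dy):img.shape[0] - max(0, dy),
              max(0, -dx):img.shape[1] - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

print(flipped[0])            # [2 1 0]
print(random_translate(img))
```

Each transform preserves the label for most vision tasks (a flipped cat is still a cat), which is why augmentation effectively enlarges the training set for free.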

SLIDE 13

Generative Modeling: Sample Generation

[Training data (CelebA) alongside generated samples (Karras et al, 2017)]

Covered in Part III. Progressed rapidly after the book was written. Underlies many graphics and speech applications.

SLIDE 14

Graphics

(Table by Augustus Odena)

SLIDE 15

Video Generation

(Wang et al, 2018)

SLIDE 16

Everybody Dance Now!

(Chan et al 2018)

SLIDE 17

Model-Based Optimization

(Killoran et al, 2017)

SLIDE 18

Designing Physical Objects

(Hwang et al 2018)

SLIDE 19

Attention Mechanisms

[Figure 12.6: an attention mechanism forms a context c as a weighted average of hidden states h(t), using weights α(t).]

Important in many vision, speech, and NLP applications. Improved rapidly after the book was written.
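The mechanism in Figure 12.6 can be sketched in a few lines: attention weights α(t) come from a softmax over per-step scores, and the context c is the α-weighted average of the hidden states h(t). The dot-product scoring and toy vectors below are illustrative assumptions (the figure itself does not fix a scoring function).

```python
import numpy as np

def attention(query, hidden_states):
    """Compute weights α(t) and context c = Σ_t α(t) h(t)."""
    scores = hidden_states @ query                   # one score per time step
    scores -= scores.max()                           # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()    # α(t): softmax weights
    context = alpha @ hidden_states                  # weighted average of h(t)
    return alpha, context

# Toy hidden states standing in for h(t−1), h(t), h(t+1).
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
alpha, c = attention(np.array([2.0, 0.0]), h)
print(alpha.round(2), c.round(2))  # weights sum to 1; c leans toward states aligned with the query
```

Because the weights are differentiable, the whole mechanism trains end-to-end with the rest of the network.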

SLIDE 20

Attention for Images

Attention mechanism from Wang et al 2018 Image model from Zhang et al 2018

SLIDE 21

Generating Training Data

(Bousmalis et al, 2017)

SLIDE 22

Generating Training Data

(Bousmalis et al, 2017)

SLIDE 23

Natural Language Processing

  • An important predecessor to deep NLP is the family of models based on n-grams:

P(x₁, …, x_τ) = P(x₁, …, x_{n−1}) ∏_{t=n}^{τ} P(x_t | x_{t−n+1}, …, x_{t−1})   (12.5)

P(THE DOG RAN AWAY) = P₃(THE DOG RAN) P₃(DOG RAN AWAY) / P₂(DOG RAN)   (12.7)

Improve with:

  • Smoothing
  • Backoff
  • Word categories
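Equation 12.7 in miniature: trigram probabilities estimated by maximum likelihood from counts on a toy corpus. The corpus is illustrative, and no smoothing or backoff is applied, so unseen n-grams get probability zero (exactly the problem the improvements above address).

```python
from collections import Counter

corpus = "the dog ran away the dog ran home the cat ran away".split()

def counts(n):
    """Count every length-n window in the corpus."""
    return Counter(tuple(corpus[i:i + n]) for i in range(len(corpus) - n + 1))

c3, c2 = counts(3), counts(2)

def p_next(w1, w2, w3):
    """P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2)."""
    return c3[(w1, w2, w3)] / c2[(w1, w2)]

print(p_next("dog", "ran", "away"))  # 0.5: "dog ran" continues with "away" once out of twice
```

Smoothing replaces the raw ratio with one that reserves probability mass for unseen continuations; backoff falls back to the bigram or unigram estimate when the trigram count is zero.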
SLIDE 24

Word Embeddings in Neural Language Models

[Embedding scatter plots: related words cluster in the learned space, e.g. country and region names (Canada, Europe, France, China, Africa) near each other and consecutive years (1995–2009) near each other.]

Figure 12.3

SLIDE 25

High-Dimensional Output Layers for Large Vocabularies

  • Short list
  • Hierarchical softmax
  • Importance sampling
  • Noise contrastive estimation
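The hierarchical softmax option can be sketched as follows: each word is a leaf of a binary tree (addressed by a bit string such as (1,0,1)), and its probability is a product of sigmoid branch decisions along the path, so scoring one word costs O(log V) instead of O(V). The node parameters and context vector below are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
depth, dim = 3, 4                                       # 2**3 = 8 words in the vocabulary
node_vectors = rng.normal(size=(2 ** depth - 1, dim))   # one vector per internal node

def word_prob(bits, h):
    """P(word | context h): product of branch probabilities down the tree."""
    p, node = 1.0, 0
    for b in bits:                       # bits like (1, 0, 1) pick the branch at each level
        p_right = sigmoid(node_vectors[node] @ h)
        p *= p_right if b else (1.0 - p_right)
        node = 2 * node + 1 + b          # heap-style child index
    return p

h = rng.normal(size=dim)
total = sum(word_prob(tuple(int(c) for c in f"{w:03b}"), h) for w in range(8))
print(round(total, 6))  # 1.0: the leaf probabilities form a valid distribution
```

The product structure guarantees normalization without ever summing over the full vocabulary, which is the whole point for large vocabularies.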
SLIDE 26

A Hierarchy of Words and Word Categories

[Binary tree over words w0…w7: each word is a leaf reached by a path of binary decisions, labeled by bit strings from (0,0,0) to (1,1,1), e.g. w5 by path (1,0,1).]

Figure 12.4

SLIDE 27

Neural Machine Translation

Decoder Output object (English sentence) Intermediate, semantic representation Source object (French sentence or image) Encoder

Figure 12.5

SLIDE 28

Google Neural Machine Translation

Wu et al 2016

SLIDE 29

Speech Recognition

Graphic from Chan et al 2015, “Listen, Attend and Spell.” Current speech recognition is based on seq2seq with attention.

SLIDE 30

Speech Synthesis

WaveNet (van den Oord et al, 2016)

SLIDE 31

Deep RL for Atari game playing

(Mnih et al 2013) Convolutional network estimates the value function (future rewards) used to guide the game-playing agent.

(Note: deep RL didn’t really exist when we started the book, became a success while we were writing it, extremely hot topic by the time the book was printed)
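The "future rewards" the network estimates can be made concrete with the one-step Q-learning target: the value of an action is the immediate reward plus the discounted value of the best next action. The discount factor and reward numbers below are illustrative, and this is only the target computation, not the full DQN training loop.

```python
GAMMA = 0.99   # discount factor for future rewards (illustrative value)

def td_target(reward, next_q_values, done):
    """One-step bootstrapped target: r + γ · max_a' Q(s', a').
    When the episode is over, only the immediate reward remains."""
    if done:
        return reward
    return reward + GAMMA * max(next_q_values)

print(td_target(1.0, [0.2, 0.5, 0.1], done=False))  # 1.0 + 0.99 * 0.5 = 1.495
```

The convolutional network is trained by regressing its Q-value predictions toward these targets, which is how "estimating future rewards" becomes an ordinary supervised loss.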

SLIDE 32

Superhuman Go Performance

(Silver et al, 2016) Monte Carlo tree search, with convolutional networks for value function and policy

SLIDE 33

Robotics

(Google Brain)

SLIDE 34

Healthcare and Biosciences

(Google Brain)

SLIDE 35

Autonomous Vehicles

(WayMo)

SLIDE 36

Questions