Applications: Lecture slides for Chapter 12 of Deep Learning (PowerPoint presentation)



SLIDE 1

Applications

Lecture slides for Chapter 12 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2018-10-25

SLIDE 2

(Goodfellow 2018)

Disclaimer

  • Details of applications change much faster than the underlying conceptual ideas
  • A printed book is updated on the scale of years; state-of-the-art results come out constantly
  • These slides are somewhat more up to date
  • Applications involve much more specific knowledge; the limitations of my own knowledge will be much more apparent in these slides than in others

SLIDE 3

Large Scale Deep Learning

[Plot: number of neurons in artificial networks (logarithmic scale) versus year, 1950 projected to 2056, with biological reference points ranging from sponge, roundworm, leech, ant, bee, frog, and octopus up to human.]

Figure 1.11

SLIDE 4

Fast Implementations

  • CPU
  • Exploit fixed point arithmetic in CPU families where this offers a speedup
  • Cache-friendly implementations
  • GPU
  • High memory bandwidth
  • No cache
  • Warps must be synchronized
  • TPU
  • Similar to GPU in many respects but faster
  • Often requires larger batch size
  • Sometimes requires reduced precision
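The fixed-point idea from the CPU bullet above can be sketched in a few lines: real values are stored as integers with an implicit scale factor, so multiply-accumulate runs entirely in integer units. The scale choice and helper names here are illustrative, not any particular library's API.

```python
SCALE = 2 ** 8  # 8 fractional bits (an arbitrary illustrative choice)

def to_fixed(x):
    """Round a real number to the nearest fixed-point integer."""
    return int(round(x * SCALE))

def fixed_mul(a, b):
    """Multiply two fixed-point numbers; rescale to keep 8 fractional bits."""
    return (a * b) // SCALE

def to_float(a):
    """Convert a fixed-point integer back to a real number."""
    return a / SCALE

a, b = to_fixed(1.5), to_fixed(-0.25)
print(to_float(fixed_mul(a, b)))  # -0.375
```

On hardware where integer arithmetic is cheaper than floating point, this representation is where the speedup comes from.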
SLIDE 5

Distributed Implementations

  • Distributed
  • Multi-GPU
  • Multi-machine
  • Model parallelism
  • Data parallelism
  • Trivial at test time
  • Synchronous or asynchronous SGD at train time
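The data-parallel, synchronous-SGD case above can be sketched as follows: each worker computes a gradient on its own shard of the batch, the gradients are averaged at a synchronization barrier, and every replica applies the same update. The linear model, toy data, and worker count are illustrative assumptions.

```python
import numpy as np

w = np.zeros(3)                       # parameters, identical on all replicas
X = np.tile(np.eye(3), (3, 1))[:8]    # toy inputs: 8 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])    # synthetic targets from known weights

def local_gradient(w, X_shard, y_shard):
    """Mean-squared-error gradient on one worker's shard."""
    err = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ err / len(y_shard)

n_workers, lr = 4, 0.5
for _ in range(100):
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]
    w -= lr * np.mean(grads, axis=0)  # barrier: all workers wait, then average

print(np.round(w, 3))  # recovers [1., -2., 0.5]
```

Asynchronous SGD drops the barrier: each worker applies its gradient as soon as it is ready, trading gradient staleness for throughput.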
SLIDE 6

Synchronous SGD

TensorFlow tutorial

SLIDE 7

Example: ImageNet in 18 minutes for $40

Blog post

SLIDE 8

Model Compression

  • Large models often have lower test error
  • Very large model trained with dropout
  • Ensemble of many models
  • Want small model for low resource use at test time
  • Train a small model to mimic the large one
  • Obtains better test error than directly training a small model
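The mimic idea above (often called distillation) can be sketched as: the small model is trained to match the large model's softened output distribution rather than hard labels. The temperature and toy logits below are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T gives a smoother distribution."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([5.0, 2.0, 0.1])
T = 4.0
soft_targets = softmax(teacher_logits, T)   # the large model's "dark knowledge"

def distill_loss(student_logits, soft_targets, T):
    """Cross-entropy between the teacher's soft targets and the student's
    tempered predictions: the training signal for the small model."""
    p = softmax(student_logits, T)
    return -np.sum(soft_targets * np.log(p))

# The loss is minimized when the student reproduces the teacher's distribution.
print(distill_loss(teacher_logits, soft_targets, T))
```

The soft targets carry more information per example than one-hot labels (relative probabilities of wrong classes), which is one intuition for why the mimic beats direct training.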

SLIDE 9

Quantization

Important for mobile deployment (e.g. TensorFlow Lite)
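A minimal sketch of the affine quantization scheme used for mobile deployment: real-valued weights are mapped to 8-bit integers via a scale and zero point, shrinking storage roughly 4x relative to float32. The weight values below are illustrative, and this is a simplified stand-in, not TensorFlow Lite's actual implementation.

```python
import numpy as np

def quantize(w, num_bits=8):
    """Affine quantization: map the range [w.min(), w.max()] onto integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(round(qmin - w.min() / scale))  # integer that represents 0.0
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate real values from the quantized integers."""
    return scale * (q.astype(np.int32) - zero_point)

w = np.array([-1.0, -0.2, 0.0, 0.7, 1.5])
q, s, z = quantize(w)
print(np.round(dequantize(q, s, z), 2))  # close to w, up to rounding error
```

The maximum reconstruction error is half the scale, so quantization hurts accuracy most when a layer's weight range is wide.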

SLIDE 10

Dynamic Structure: Cascades

(Viola and Jones, 2001)
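The cascade idea can be sketched as a sequence of increasingly expensive classifiers in which each stage may reject early, so most negatives never reach the costly stages. The stages and thresholds below are hypothetical stand-ins, not the actual Viola-Jones features.

```python
def cascade(x, stages):
    """Run stages in order; reject as soon as one stage scores below threshold."""
    for score_fn, threshold in stages:
        if score_fn(x) < threshold:
            return False          # cheap early rejection
    return True                   # survived every stage: accept

# Toy stages: each is (scoring function, acceptance threshold),
# ordered from cheapest to most expensive.
stages = [
    (lambda x: x["brightness"], 0.2),   # very cheap feature
    (lambda x: x["edge_score"], 0.5),   # moderately expensive
    (lambda x: x["full_model"], 0.9),   # expensive classifier
]

print(cascade({"brightness": 0.8, "edge_score": 0.7, "full_model": 0.95}, stages))  # True
print(cascade({"brightness": 0.1, "edge_score": 0.7, "full_model": 0.95}, stages))  # False
```

Because faces are rare among candidate windows, the average cost per window is dominated by the cheap first stage.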

SLIDE 11

Dynamic Structure

Outrageously Large Neural Networks (Shazeer et al, 2017)

SLIDE 12

Dataset Augmentation for Computer Vision

Affine distortion, noise, elastic deformation, horizontal flip, random translation, hue shift
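Two of the augmentations listed above, sketched on a tiny toy "image": horizontal flip and random translation. Real pipelines also add noise, hue shifts, and elastic or affine distortions; the array sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
img = np.arange(9).reshape(3, 3)   # toy 3x3 grayscale image

flipped = img[:, ::-1]             # horizontal flip: reverse the columns

def random_translate(img, max_shift=1):
    """Shift the image by a random (dy, dx) offset, zero-padding the border."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(img)
    src = img[max(0, -dy):img.shape[0] - max(0, dy),
              max(0, -dx):img.shape[1] - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

print(flipped[0])            # [2 1 0]
print(random_translate(img))
```

Each transform preserves the label for most vision tasks (a flipped cat is still a cat), which is why augmentation effectively enlarges the training set for free.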

SLIDE 13

Generative Modeling: Sample Generation

[Training data (CelebA) alongside generated samples (Karras et al, 2017)]

Covered in Part III. Progressed rapidly after the book was written. Underlies many graphics and speech applications.

SLIDE 14

Graphics

(Table by Augustus Odena)

SLIDE 15

Video Generation

(Wang et al, 2018)

SLIDE 16

Everybody Dance Now!

(Chan et al 2018)

SLIDE 17

Model-Based Optimization

(Killoran et al, 2017)

SLIDE 18

Designing Physical Objects

(Hwang et al 2018)

SLIDE 19

Attention Mechanisms

[Figure 12.6: an attention mechanism forms a context c as a weighted average of hidden states h(t), using weights α(t).]

Important in many vision, speech, and NLP applications. Improved rapidly after the book was written.
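The mechanism in Figure 12.6 can be sketched in a few lines: attention weights α(t) come from a softmax over per-step scores, and the context c is the α-weighted average of the hidden states h(t). The dot-product scoring and toy vectors below are illustrative assumptions (the figure itself does not fix a scoring function).

```python
import numpy as np

def attention(query, hidden_states):
    """Compute weights α(t) and context c = Σ_t α(t) h(t)."""
    scores = hidden_states @ query                   # one score per time step
    scores -= scores.max()                           # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()    # α(t): softmax weights
    context = alpha @ hidden_states                  # weighted average of h(t)
    return alpha, context

# Toy hidden states standing in for h(t−1), h(t), h(t+1).
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
alpha, c = attention(np.array([2.0, 0.0]), h)
print(alpha.round(2), c.round(2))  # weights sum to 1; c leans toward states aligned with the query
```

Because the weights are differentiable, the whole mechanism trains end-to-end with the rest of the network.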

SLIDE 20

Attention for Images

Attention mechanism from Wang et al 2018 Image model from Zhang et al 2018

SLIDE 21

Generating Training Data

(Bousmalis et al, 2017)

SLIDE 22

Generating Training Data

(Bousmalis et al, 2017)

SLIDE 23

Natural Language Processing

  • An important predecessor to deep NLP is the family of models based on n-grams:

P(x₁, …, x_τ) = P(x₁, …, x_{n−1}) ∏_{t=n}^{τ} P(x_t | x_{t−n+1}, …, x_{t−1})   (12.5)

P(THE DOG RAN AWAY) = P₃(THE DOG RAN) P₃(DOG RAN AWAY) / P₂(DOG RAN)   (12.7)

Improve with:

  • Smoothing
  • Backoff
  • Word categories
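Equation 12.7 in miniature: trigram probabilities estimated by maximum likelihood from counts on a toy corpus. The corpus is illustrative, and no smoothing or backoff is applied, so unseen n-grams get probability zero (exactly the problem the improvements above address).

```python
from collections import Counter

corpus = "the dog ran away the dog ran home the cat ran away".split()

def counts(n):
    """Count every length-n window in the corpus."""
    return Counter(tuple(corpus[i:i + n]) for i in range(len(corpus) - n + 1))

c3, c2 = counts(3), counts(2)

def p_next(w1, w2, w3):
    """P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2)."""
    return c3[(w1, w2, w3)] / c2[(w1, w2)]

print(p_next("dog", "ran", "away"))  # 0.5: "dog ran" continues with "away" once out of twice
```

Smoothing replaces the raw ratio with one that reserves probability mass for unseen continuations; backoff falls back to the bigram or unigram estimate when the trigram count is zero.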
SLIDE 24

Word Embeddings in Neural Language Models

[Embedding scatter plots: related words cluster in the learned space, e.g. country and region names (Canada, Europe, France, China, Africa) near each other and consecutive years (1995–2009) near each other.]

Figure 12.3

SLIDE 25

High-Dimensional Output Layers for Large Vocabularies

  • Short list
  • Hierarchical softmax
  • Importance sampling
  • Noise contrastive estimation
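The hierarchical softmax option can be sketched as follows: each word is a leaf of a binary tree (addressed by a bit string such as (1,0,1)), and its probability is a product of sigmoid branch decisions along the path, so scoring one word costs O(log V) instead of O(V). The node parameters and context vector below are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
depth, dim = 3, 4                                       # 2**3 = 8 words in the vocabulary
node_vectors = rng.normal(size=(2 ** depth - 1, dim))   # one vector per internal node

def word_prob(bits, h):
    """P(word | context h): product of branch probabilities down the tree."""
    p, node = 1.0, 0
    for b in bits:                       # bits like (1, 0, 1) pick the branch at each level
        p_right = sigmoid(node_vectors[node] @ h)
        p *= p_right if b else (1.0 - p_right)
        node = 2 * node + 1 + b          # heap-style child index
    return p

h = rng.normal(size=dim)
total = sum(word_prob(tuple(int(c) for c in f"{w:03b}"), h) for w in range(8))
print(round(total, 6))  # 1.0: the leaf probabilities form a valid distribution
```

The product structure guarantees normalization without ever summing over the full vocabulary, which is the whole point for large vocabularies.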
SLIDE 26

A Hierarchy of Words and Word Categories

[Binary tree over words w0…w7: each word is a leaf reached by a path of binary decisions, labeled by bit strings from (0,0,0) to (1,1,1), e.g. w5 by path (1,0,1).]

Figure 12.4

SLIDE 27

Neural Machine Translation

Decoder Output object (English sentence) Intermediate, semantic representation Source object (French sentence or image) Encoder

Figure 12.5

SLIDE 28

Google Neural Machine Translation

Wu et al 2016

SLIDE 29

Speech Recognition

Graphic from Chan et al 2015, “Listen, Attend and Spell.” Current speech recognition is based on seq2seq with attention.

SLIDE 30

Speech Synthesis

WaveNet (van den Oord et al, 2016)

SLIDE 31

Deep RL for Atari game playing

(Mnih et al 2013) Convolutional network estimates the value function (future rewards) used to guide the game-playing agent.

(Note: deep RL didn’t really exist when we started the book, became a success while we were writing it, extremely hot topic by the time the book was printed)
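The "future rewards" the network estimates can be made concrete with the one-step Q-learning target: the value of an action is the immediate reward plus the discounted value of the best next action. The discount factor and reward numbers below are illustrative, and this is only the target computation, not the full DQN training loop.

```python
GAMMA = 0.99   # discount factor for future rewards (illustrative value)

def td_target(reward, next_q_values, done):
    """One-step bootstrapped target: r + γ · max_a' Q(s', a').
    When the episode is over, only the immediate reward remains."""
    if done:
        return reward
    return reward + GAMMA * max(next_q_values)

print(td_target(1.0, [0.2, 0.5, 0.1], done=False))  # 1.0 + 0.99 * 0.5 = 1.495
```

The convolutional network is trained by regressing its Q-value predictions toward these targets, which is how "estimating future rewards" becomes an ordinary supervised loss.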

SLIDE 32

Superhuman Go Performance

(Silver et al, 2016) Monte Carlo tree search, with convolutional networks for value function and policy

SLIDE 33

Robotics

(Google Brain)

SLIDE 34

Healthcare and Biosciences

(Google Brain)

SLIDE 35

Autonomous Vehicles

(WayMo)

SLIDE 36

Questions