Deep Learning: Review & Discussion Chiyuan Zhang, CSAIL, CBMM - PowerPoint PPT Presentation

Deep Learning: Review & Discussion Chiyuan Zhang, CSAIL, CBMM 2015.07.22

Overview • What has been done? • Applications • Main Challenges • Empirical Analysis • Theoretical Analysis • What is to be done?

Deep Learning What has been done?

Applications • Computer Vision • ConvNets, dominating • Speech Recognition • Deep Nets, Recurrent Neural Networks (RNNs), dominating, industrial deployment • Natural Language Processing • Matched previous state-of-the-art, but no revolutionized results yet • Reinforcement Learning, Structured Prediction, Graphical Models, Unsupervised Learning, … • “Unrolling” iteration as NN layer

Image Classification • Imagenet Large Scale Visual Recognition Challenge (ILSVRC)   http://image-net.org/challenges/LSVRC/ • Tasks • Classification: 1000-way multiclass learning • Detection: classify and locate (bounding box) • State-of-the-art • ConvNets since 2012 Olga Russakovsky, . . ., Andrej Karpathy and Li Fei-Fei et.al. • ImageNet Large Scale Visual Recognition Challenge. arXiv: 1409.0575 [cs.CV].

Surpassing “Human Level” Performance • Try it yourself: http://cs.stanford.edu/people/karpathy/ilsvrc/ • For human • Difficult & painful task (1000 classes) • One guy trained himself with 500 images and tested on 1500 (!!) images • ~ 1 minute to classify 1 images: ~ 25 hours… • ~ 5% error, the so-called “human level” performance • Human and machines are making different kinds of errors, for details see   http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

• Models pre-trained on Imagenet turns out to be very good feature extractor or initialization model for many • Typically takes ~1 week to train on a descent GPU node • Imagenet challenge training set ~1.2M images (p > N) • e.g. google “Inception”, 27 layers, ~7M parameters; VGG ~100M parameters (table 2, arXiv:1409.1556). ConvNets on ImageNet softmax2 other vision related tasks even on different datasets; popular in both academia and industry (startups) SoftmaxActivation FC AveragePool 7x7+1(V) DepthConcat Conv Conv Conv Conv 1x1+1(S) 3x3+1(S) 5x5+1(S) 1x1+1(S) Conv Conv MaxPool 1x1+1(S) 1x1+1(S) 3x3+1(S) DepthConcat Conv Conv Conv Conv softmax1 1x1+1(S) 3x3+1(S) 5x5+1(S) 1x1+1(S) Conv Conv MaxPool SoftmaxActivation 1x1+1(S) 1x1+1(S) 3x3+1(S) MaxPool FC 3x3+2(S) DepthConcat FC Conv Conv Conv Conv Conv 1x1+1(S) 3x3+1(S) 5x5+1(S) 1x1+1(S) 1x1+1(S) Conv Conv MaxPool AveragePool 1x1+1(S) 1x1+1(S) 3x3+1(S) 5x5+3(V) DepthConcat Conv Conv Conv Conv 1x1+1(S) 3x3+1(S) 5x5+1(S) 1x1+1(S) Conv Conv MaxPool 1x1+1(S) 1x1+1(S) 3x3+1(S) DepthConcat softmax0 Conv Conv Conv Conv SoftmaxActivation 1x1+1(S) 3x3+1(S) 5x5+1(S) 1x1+1(S) Conv Conv MaxPool FC 1x1+1(S) 1x1+1(S) 3x3+1(S) DepthConcat FC Conv Conv Conv Conv Conv 1x1+1(S) 3x3+1(S) 5x5+1(S) 1x1+1(S) 1x1+1(S) Conv Conv MaxPool AveragePool 1x1+1(S) 1x1+1(S) 3x3+1(S) 5x5+3(V) DepthConcat Conv Conv Conv Conv 1x1+1(S) 3x3+1(S) 5x5+1(S) 1x1+1(S) Conv Conv MaxPool 1x1+1(S) 1x1+1(S) 3x3+1(S) MaxPool 3x3+2(S) DepthConcat Conv Conv Conv Conv 1x1+1(S) 3x3+1(S) 5x5+1(S) 1x1+1(S) Conv Conv MaxPool 1x1+1(S) 1x1+1(S) 3x3+1(S) DepthConcat Conv Conv Conv Conv 1x1+1(S) 3x3+1(S) 5x5+1(S) 1x1+1(S) Conv Conv MaxPool 1x1+1(S) 1x1+1(S) 3x3+1(S) MaxPool 3x3+2(S) LocalRespNorm Conv 3x3+1(S) Conv 1x1+1(V) LocalRespNorm MaxPool 3x3+2(S) Conv 7x7+2(S) input

Andrej Karpathy and Li Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. CVPR 2015. Kelvin Xu et. al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015. Remi Lebret et. al. Phrase-based Image Captioning. ICML 2015. … Fancier applications Image Captioning

Zheng, Shuai et al. “Conditional Random Fields as Recurrent Neural Networks.” arXiv.org cs.CV (2015).

Unrolling Iterative Algorithms as Layers of Deep Nets • Zheng, Shuai et al. “Conditional Random Fields as Recurrent Neural Networks.” arXiv.org cs.CV (2015).

Unrolling Multiplicative NMF Iterations Jonathan Le Roux et. al. Deep NMF for Speech Separation. ICASSP 2015.

Speech Recognition • RNNs: Non-fixed-length input, using context / memory for current prediction • Very deep neural network when unfolded in time, hard to train Image source: Li Deng and Dong Yu. Deep Learning – Methods and Applications. Realtime conversation translation

Video Pinball Boxing Breakout Star Gunner Robotank Reinforcement Learning & more Atlantis Crazy Climber Gopher Demon Attack Name This Game Krull Assault Road Runner Kangaroo James Bond Tennis Pong Space Invaders Beam Rider Tutankham Kung-Fu Master Freeway Time Pilot Enduro Fishing Derby Up and Down Ice Hockey Q*bert At human-level or above H.E.R.O. Asterix Below human-level Battle Zone Wizard of Wor Chopper Command Centipede Bank Heist River Raid Zaxxon Amidar Alien Venture Seaquest Double Dunk • Google Deep Mind. Human-level control through deep Bowling Ms. Pac-Man reinforcement learning. Nature, Feb. 2015. Asteroids Frostbite Gravitar DQN Private Eye Best linear learner • Google Deep Mind. Neural Turing Machines. ArXiv 2014. Montezuma's Revenge 0 100 200 300 400 500 600 1,000 4,500%

Deep Learning What are the challenges?

Convergence of Optimization • Gradients diminishing, lower layers hard to train • ReLU, empirically faster convergence • Gradients explode or diminish • Clever initialization (preserve variance / scale in each layer) • Xaiver and variants : Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. ArXiv 2015. • Identity : Q. V. Le, N. Jaitly, G. E. Hinton. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. ArXiv 2015. • Memory gates: LSTM, Highway Networks (Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber. Highway Networks. ArXiv 2015), etc. • Batch normalization: Sergey Ioffe, Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML 2015. • Many more tricks out there…

Regularization Overfitting problems do exist in deep learning • “Baidu Overfitting Imagenet”: http://www.image-net.org/challenges/LSVRC/announcement-June-2-2015 • Data augmentation commonly used in • computer vision (random translation, rotation, cropping, mirroring…) • speech recognition • e.g. Andrew Y. Ng et. al. Deep Speech: Scaling up end-to-end speech recognition. ArXiv 2015. 100,1000 hours (~11 years) of augmented speech data

Regularization Overfitting problems do exist in deep learning • Dropout • Intuition: forced to be robust; model averaging. • Justification • Wager, Stefan, Sida Wang, and Percy S Liang. “Dropout Training as Adaptive Regularization.” NIPS 2013. • David McAllester. A PAC-Bayesian Tutorial MNIST TIMIT with A Dropout Bound. ArXiv 2013. figure source: http://winsty.net/talks/dropout.pptx • Variations: DropConnect, DropLabel…

Regularization Overfitting problems do exist in deep learning • (Structured) sparsity comes into play • Computer vision: ConvNets — sparse connection with weight sharing • Speech recognition: RNNs — time index correspondence, weight sharing • Unrolling: structure from algorithms • Behnam Neyshabur, Ryota Tomioka, Nathan Srebro. Norm-Based Capacity Control in Neural Networks. COLT 2015. • Q : is the sparsity pattern learnable?

Computation • Hashing • e.g. K.Q. Weinberger et. al. Compressing Neural Networks with the Hashing Trick. ICML 2015. • Limited numerical precision computing with stochastic rounding • Suyog Gupta et. al. Deep Learning with Limited Numerical Precision. ICML 2015.

Deep Learning Existing Empirical Analysis

Network Visualization • Visualizing the learned filters • Visualizing high-response input images • Adversarial images • Reconstruction (what kind of information is perserved) Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.

Matthew D. Zeiler, Rob Fergus. Visualizing and Understanding Convolutional Networks. ECCV 2014. the top 9 activations in a random subset of feature maps across the validation data, projected down to pixel space using our deconvolutional network approach.

Adversarial images for a trained CNN (or any classifier) 1st column: • original images. 2nd column: • perturbations. 3rd column: • perturbed images, all classified as “ostrich, Struthio camelus”. Christian Szegedy, …, Rob Fergus. Intriguing properties of neural networks. ICLR 2014.

Anh Nguyen, Jason Yosinski, Jeff Clune. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. CVPR 2015. http://www.evolvingai.org/ fooling see also Supernormal Stimuli for human and animals: https://imgur.com/ a/ibMUn

Reconstruction from each layer of a CNN • Aravindh Mahendran, Andrea Vedaldi. Understanding Deep Image Representations by Inverting Them. CVPR 2015. • Jonathan Long, Ning Zhang, Trevor Darrell. Do Convnets Learn Correspondence? NIPS 2014.

Deep Learning: Review & Discussion Chiyuan Zhang, CSAIL, CBMM - PowerPoint PPT Presentation

Deep Learning: Review & Discussion Chiyuan Zhang, CSAIL, CBMM 2015.07.22 Overview What has been done? Applications Main Challenges Empirical Analysis Theoretical Analysis What is to be done? Deep Learning What has

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Deep Learning: Theory and Practice Deep Learning - Practical 02-04-2020 Considerations

Presentation about Deep Learning --- Zhongwu xie Contents 1.Brief introduction of Deep learning.

Deep Learning on GPUs March 2016 What is Deep Learning? GPUs and DL AGENDA DL in practice

Deep learning Deep reinforcement learning Hamid Beigy Sharif university of technology December

Differen'able Func'onal Programming Noel Welsh @noelwelsh underscore Goals Deep learning

DSC 102 Systems for Scalable Analytics Arun Kumar Topic 6: Deep Learning Systems 1 Outline

ACCELERATE DEEP LEARNING WITH NVIDIA'S DEEP LEARNING PLATFORM | STEPHEN JONES | GTC16 DEEP

Deep learning for natural language processing A short primer on deep learning Benoit Favre <

Relational Deep Learning: A Deep Latent Variable Model for Link Prediction Hao Wang, Xingjian

Medical Imaging Elisa Sayrol Medical Imaging Interest in this area in Deep Learning: DeepDeep

Deep learning Optimization and Regularization in deep networks Hamid Beigy Sharif university of

Minjie Wang Deep Learning Deep Learning trend in the past 10 years Caffe State-of-art DL

Beach Guide for Dogs and Their Owners 2 3L www.thecornishcoast.co.uk 4 7 9 5 8 6 10 Dogs

Web security model Deian Stefan Some slides adopted from Nadia Heninger, Zakir Durumeric, Dan

Stanford CS193p Developing Applications for iOS Winter 2017 CS193p Winter 2017 Today

Wayfinding Robert W. Lindeman Worcester Polytechnic Institute Department of Computer Science

The Keynesian multiplier, news and fiscal policy rules in a DSGE model Authors: George Perendia

Introduction to Survival Analysis Kan Ren Apex Data and Knowledge Management Lab Shanghai Jiao

Time-dependent covariates Rasmus Waagepetersen November 17, 2020 1 / 11 Martingale approach to

Parametric Survival Analysis So far, we have focused primarily on nonparametric and

Deep Learning: Review & Discussion Chiyuan Zhang, CSAIL, CBMM - PowerPoint PPT Presentation

Deep Learning: Review & Discussion Chiyuan Zhang, CSAIL, CBMM 2015.07.22 Overview What has been done? Applications Main Challenges Empirical Analysis Theoretical Analysis What is to be done? Deep Learning What has

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Deep Learning: Theory and Practice Deep Learning - Practical 02-04-2020 Considerations

Presentation about Deep Learning --- Zhongwu xie Contents 1.Brief introduction of Deep learning.

Deep Learning on GPUs March 2016 What is Deep Learning? GPUs and DL AGENDA DL in practice

Deep learning Deep reinforcement learning Hamid Beigy Sharif university of technology December

Differen'able Func'onal Programming Noel Welsh @noelwelsh underscore Goals Deep learning

DSC 102 Systems for Scalable Analytics Arun Kumar Topic 6: Deep Learning Systems 1 Outline

ACCELERATE DEEP LEARNING WITH NVIDIA'S DEEP LEARNING PLATFORM | STEPHEN JONES | GTC16 DEEP

Deep learning for natural language processing A short primer on deep learning Benoit Favre &lt;

Relational Deep Learning: A Deep Latent Variable Model for Link Prediction Hao Wang, Xingjian

Medical Imaging Elisa Sayrol Medical Imaging Interest in this area in Deep Learning: DeepDeep

Deep learning Optimization and Regularization in deep networks Hamid Beigy Sharif university of

Minjie Wang Deep Learning Deep Learning trend in the past 10 years Caffe State-of-art DL

Beach Guide for Dogs and Their Owners 2 3L www.thecornishcoast.co.uk 4 7 9 5 8 6 10 Dogs

Web security model Deian Stefan Some slides adopted from Nadia Heninger, Zakir Durumeric, Dan

Stanford CS193p Developing Applications for iOS Winter 2017 CS193p Winter 2017 Today

Wayfinding Robert W. Lindeman Worcester Polytechnic Institute Department of Computer Science

The Keynesian multiplier, news and fiscal policy rules in a DSGE model Authors: George Perendia

Introduction to Survival Analysis Kan Ren Apex Data and Knowledge Management Lab Shanghai Jiao

Time-dependent covariates Rasmus Waagepetersen November 17, 2020 1 / 11 Martingale approach to

Parametric Survival Analysis So far, we have focused primarily on nonparametric and

Deep learning for natural language processing A short primer on deep learning Benoit Favre <