Deep Learning
Jiseob Kim (jkim@bi.snu.ac.kr) Artificial Intelligence Class of 2016 Spring
- Dept. of Computer Science and Engineering
Seoul National University
1986: Back propagation. Neural networks promised to solve general learning problems and were tied to the biological system, but the approach was eventually given up.
Until 2006, hand-crafted features dominated (GMM-HMM for speech; SIFT, LBP, HOG for vision) [Kruger et al., TPAMI'13].
2006: Deep belief nets (Science). Layer-wise training revived deep neural networks, helped by better training techniques (normalization, nonlinearity, dropout) and more computing power (GPUs, multi-core computer systems).
2011-2012: breakthrough deep learning results in speech recognition (2011) and object recognition (2012).
How Many Computers to Identify a Cat? 16000 CPU cores
Object recognition over 1,000,000 images and 1,000 categories (ImageNet 2012):
Rank 1: error rate 0.15315, Deep Conv Net (trained on 2 GPUs)
Rank 2: error rate 0.26172, hand-crafted features and learning models (the bottleneck)
Rank 3: error rate 0.26979
Rank 4: Xerox/INRIA, error rate 0.27058
History of Neural Network Research
Slides from Wanli Ouyang wlouyang@ee.cuhk.edu.hk
Neural Networks
Convolutional Neural Networks (CNN)
Slides by Jiseob Kim jkim@bi.snu.ac.kr
Pattern recognition: the problem of finding a label y given data x.
Examples: y is a person's name, a diagnosis of diabetes, or the sentence corresponding to a voice recording.
Typically x is a D-dimensional vector and y is a discrete integer label. Famous pattern recognition algorithms include the perceptron, described next.
The Perceptron: a linear unit with inputs x1, x2 (plus a constant 1), weights w1, w2, and bias b. Its decision boundary is the line w1*x1 + w2*x2 + b = 0: inputs on the > 0 side are classified as positive, those on the < 0 side as negative.

Perceptron learning algorithm:
start: the weight vector w is generated randomly.
test: a vector x ∈ P ∪ N is selected randomly;
  if x ∈ P and w·x > 0, go to test;
  if x ∈ P and w·x ≤ 0, go to add;
  if x ∈ N and w·x < 0, go to test;
  if x ∈ N and w·x ≥ 0, go to subtract.
add: set w = w + x, go to test.
subtract: set w = w - x, go to test.
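A minimal sketch of this perceptron rule in Python/NumPy; the toy data arrays, the iteration cap, and the bias-as-extra-feature trick are illustrative assumptions, not part of the slides.

```python
import numpy as np

def train_perceptron(P, N, max_iters=1000, seed=0):
    """Perceptron rule from the slide: add misclassified positives, subtract misclassified negatives."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=P.shape[1])     # start: w generated randomly
    for _ in range(max_iters):
        updated = False
        for x in P:                     # x in P should satisfy w.x > 0
            if np.dot(w, x) <= 0:
                w = w + x               # add
                updated = True
        for x in N:                     # x in N should satisfy w.x < 0
            if np.dot(w, x) >= 0:
                w = w - x               # subtract
                updated = True
        if not updated:                 # every sample already classified correctly
            break
    return w

# Toy 2-D example; the constant 1 appended to each vector plays the role of the bias b.
P = np.array([[2.0, 1.0, 1.0], [1.5, 2.0, 1.0]])
N = np.array([[-1.0, -0.5, 1.0], [-2.0, -1.0, 1.0]])
w = train_perceptron(P, N)
```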
From the classic perceptron to the sigmoid unit. The sigmoid function is differentiable:
∂σ(x)/∂x = σ(x)(1 - σ(x))
Loss function (d: target, f: unit output):
L = (1/2)(d - f)^2
Gradient descent update (learning rate η, input vector X):
W ← W + η (d - f) f (1 - f) X
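A small sketch of one gradient-descent step for a single sigmoid unit with the squared loss above; the learning rate, inputs, and target below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_unit_step(w, b, x, d, lr=0.1):
    """One update of a sigmoid unit for target d and input x, using L = 0.5*(d - f)^2."""
    f = sigmoid(np.dot(w, x) + b)
    delta = (d - f) * f * (1.0 - f)     # (d - f) times the sigmoid derivative
    w = w + lr * delta * x              # gradient descent on the weights
    b = b + lr * delta                  # and on the bias
    return w, b

w, b = np.zeros(2), 0.0
w, b = sigmoid_unit_step(w, b, x=np.array([1.0, 0.5]), d=1.0)
```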
Why multiple units and multiple layers? A single unit gives one linear boundary; multiple boundaries are needed (e.g., for the XOR problem), which requires multiple units, and more complex regions (e.g., polygons) require multiple layers.

Structure of the Multilayer Perceptron (MLP; Artificial Neural Network): an input layer, one or more hidden layers, and an output layer, with a weight for each layer and each unit.

Loss function (d: target, f: unit output):
L = (1/2)(d - f)^2
We need to compute the gradient of this loss for all the parameters.
Recursive Computation of Gradients: compute the loss-gradient of the lower-layer weights recursively (Back Propagation).

Gradients of top-layer weights and update rule. Store the intermediate value delta for later use in the chain rule:
δ = (d - f) f (1 - f)
W_i ← W_i + η δ X_i   (X_i: the input feeding that weight)

Gradients of lower-layer weights. Applying the chain rule gives a recursive relation between the delta's: the delta of a lower-layer unit is its local gradient (the sigmoid derivative) times the weighted sum of the delta's in the layer above,
δ_i(lower) = f_i (1 - f_i) Σ_l W_il δ_l(upper)
and the gradient descent update rule for lower-layer weights has the same form,
W_ij ← W_ij + η δ_i X_j
Algorithm: Back Propagation
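A compact sketch of the back-propagation recursion above for a one-hidden-layer MLP with sigmoid units and squared loss; the array shapes, learning rate, and toy training loop are assumptions (biases are omitted for brevity).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(W1, W2, x, d, lr=0.5):
    """One forward + backward pass; the deltas follow the recursive rule in the slides."""
    h = sigmoid(W1 @ x)                     # hidden activations
    f = sigmoid(W2 @ h)                     # output activations
    delta2 = (d - f) * f * (1 - f)          # top-layer local gradient
    delta1 = (W2.T @ delta2) * h * (1 - h)  # recursion: weighted sum of upper deltas
    W2 = W2 + lr * np.outer(delta2, h)      # gradient-descent updates
    W1 = W1 + lr * np.outer(delta1, x)
    return W1, W2

# Toy training loop (illustrative data only)
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
for x, d in [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])] * 500:
    W1, W2 = backprop_step(W1, W2, np.array(x, float), np.array(d, float))
```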
Back Propagation applies to almost all classification problems.

Limitations: training deep networks with Back Propagation suffers from the Vanishing Gradient problem: the error signal decays as it is propagated back through many layers.

Breakthrough: unsupervised, layer-wise pre-training of the parameters (2006).

[Figure: input x is fed forward to produce output y'; the errors between y' and the target y are back-propagated through the layers.]
Slides by Jiseob Kim jkim@bi.snu.ac.kr
Restricted Boltzmann Machine (RBM)
Idea: a two-layer, energy-based model in which the conditional distributions factorize:
P(x|h) = Π_j P(x_j|h),   P(h|x) = Π_i P(h_i|x)

Energy-Based Model with energy function E(x, h):
P(x, h) = e^(-E(x,h)) / Σ_{x,h} e^(-E(x,h))
[Figure: bipartite graph with visible units x1, x2, x3 and hidden units h1 ... h5.]
Marginal probability of the visible units:
P(x) = Σ_h e^(-E(x,h)) / Σ_{x,h} e^(-E(x,h))

Conditional probabilities:
P(x_j = 1|h) = σ(b_j + W'_{·j} · h),   P(h_i = 1|x) = σ(c_i + W_{i·} · x)
The joint (x, h) probability, the marginal (x) probability, and the likelihood are all defined by this energy function. Remark: the conditional probability has the same form as a Neural Network unit (a sigmoid of a weighted sum).
Maximum Likelihood. The likelihood of a dataset X = {x^(1), ..., x^(K)} is
L(X; θ) = Π_k [ Σ_h e^(-E(x^(k),h)) / Σ_{x,h} e^(-E(x,h)) ]
and the gradient of the log-likelihood with respect to a weight w_ij is
∂ log L(X; θ) / ∂w_ij = <x_i h_j>_data - <x_i h_j>_model
i.e., the average of x_i h_j over the dataset minus its expectation under the model distribution. The data term is easy to compute; the model term requires running a Gibbs chain to equilibrium (t = ∞), so in practice it is approximated after a single step, <x_i h_j>_model ≈ <x_i h_j>_1.
[Figure: alternating Gibbs sampling between the visible and hidden units; <x_i h_j> is measured at t = 0 (the data), t = 1, t = 2, ..., and t = ∞ (a "fantasy" sample from the model).]
Contrastive Divergence (CD) Learning: the k-step Contrastive Divergence trick. It is too costly to run many Gibbs sampling steps, so the model expectation is approximated after only k steps; in practice, k = 1 is sufficient.
Take this as a sample of Model distribution
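A minimal NumPy sketch of CD-1 for a binary RBM, using the conditionals above and the difference <x_i h_j>_data - <x_i h_j>_reconstruction as the weight update; the hyper-parameters, sizes, and toy data are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(W, b, c, X, lr=0.05, rng=np.random.default_rng(0)):
    """One CD-1 update on a batch X (rows are binary visible vectors)."""
    ph = sigmoid(X @ W.T + c)                       # P(h=1 | x)
    h = (rng.random(ph.shape) < ph).astype(float)   # sample h ~ P(h|x)
    px = sigmoid(h @ W + b)                         # P(x=1 | h)
    x1 = (rng.random(px.shape) < px).astype(float)  # reconstruction after 1 Gibbs step
    ph1 = sigmoid(x1 @ W.T + c)                     # P(h=1 | x1)
    # CD-1 gradient: <x h> over the data minus <x h> over the reconstructions
    W += lr * (ph.T @ X - ph1.T @ x1) / len(X)
    b += lr * (X - x1).mean(axis=0)
    c += lr * (ph - ph1).mean(axis=0)
    return W, b, c

n_vis, n_hid = 6, 3
rng = np.random.default_rng(1)
W = 0.01 * rng.normal(size=(n_hid, n_vis))
b, c = np.zeros(n_vis), np.zeros(n_hid)
X = (rng.random((20, n_vis)) < 0.5).astype(float)   # toy binary data
for _ in range(100):
    W, b, c = cd1_step(W, b, c, X)
```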
Unsupervised training lets the RBM successfully capture the essential patterns. RBM trained on MNIST hand-written digit data: each cell shows the pattern that one hidden node encodes.
Deep Belief Network (DBN; a deep Bayesian network): similar in structure to a Neural Network. It can also be used as a generative model (auto-encoder), but we only consider the classifier here.

DBN as a stack of RBMs:
1. Regard each pair of adjacent layers as an RBM.
2. Layer-wise pre-train each RBM in an unsupervised way.
3. Attach the classifier and fine-tune the whole network in a supervised way.
[Figure: a stack of RBMs (x-h1, h1-h2, h2-h3) with weights W, pre-trained layer by layer and then combined into a DBN with a classifier on top.]
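A sketch of the greedy layer-wise pre-training in steps 1-2 above; the layer sizes, epochs, learning rate, and toy data are assumptions, and the supervised fine-tuning of step 3 is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(X, n_hid, epochs=50, lr=0.05, rng=None):
    """Train one RBM with CD-1; return its parameters and the hidden code of X."""
    rng = rng or np.random.default_rng(0)
    W = 0.01 * rng.normal(size=(n_hid, X.shape[1]))
    b, c = np.zeros(X.shape[1]), np.zeros(n_hid)
    for _ in range(epochs):
        ph = sigmoid(X @ W.T + c)
        h = (rng.random(ph.shape) < ph).astype(float)
        x1 = (rng.random(X.shape) < sigmoid(h @ W + b)).astype(float)
        ph1 = sigmoid(x1 @ W.T + c)
        W += lr * (ph.T @ X - ph1.T @ x1) / len(X)
        b += lr * (X - x1).mean(axis=0)
        c += lr * (ph - ph1).mean(axis=0)
    return W, c, sigmoid(X @ W.T + c)    # mean hidden activities feed the next layer

def pretrain_dbn(X, layer_sizes):
    """Greedy layer-wise pre-training: treat each pair of adjacent layers as an RBM."""
    params, H = [], X
    for n_hid in layer_sizes:
        W, c, H = train_rbm(H, n_hid)
        params.append((W, c))
    return params                         # step 3 (attach classifier, fine-tune) omitted

X = (np.random.default_rng(2).random((100, 20)) < 0.5).astype(float)  # toy data
dbn = pretrain_dbn(X, layer_sizes=[16, 8, 4])
```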
Erhan et al., AISTATS 2009.
[Figure: features learned with pre-training vs. without pre-training.]
Higher layers have more abstract representations: the learned features look less natural in lower layers, but natural in higher layers.
Bengio et al., ICML 2013
As the DBN is a generative model, we can also regenerate the data: given data samples, the network can generate data. [Figure: occluded face images and their regenerated completions (Lee, Ng et al., ICML 2009).]
Nowadays, the CNN outperforms the DBN for image or speech data. However, if there is no topological information, the DBN is still a good choice. Also, if a generative model is needed, the DBN is used.
Generate Face patches Tang, Srivastava, Salakhutdinov, NIPS 2014
Slides by Jiseob Kim jkim@bi.snu.ac.kr
Convolutional Neural Network (CNN)
Idea: exploit the topological structure between features, as in image data or voice data (spectrograms). To a DBN, a shifted copy of a pattern is different data; to a CNN, it is the same data. [Figure: Image 1 and Image 2, the same pattern at different positions.]
Structure of Convolutional Neural Network (CNN)
Higher-level features are formed by repeated Convolution and Pooling (subsampling). Convolution obtains a certain feature from a local area; Pooling reduces the dimension while obtaining a translation-invariant feature.
http://parse.ele.tue.nl/education/cluster2
The kernel detects a pattern; the resulting value indicates how strongly that pattern is present at each region. [Figure: an example kernel of 1s and its response map.]

The pooling layer summarizes the results of the convolution layer: each local region of responses is reduced to 1 cell (e.g., by taking the maximum), and the result of the pooling layer is translation-invariant.

Moving to higher layers, features become more specific and abstract; lower layers capture general patterns.
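A small sketch of one convolution + max-pooling stage on a 2-D input, illustrating how pooling makes the detected feature robust to a small shift; the kernel and image below are made up for illustration.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """'Valid' 2-D convolution (implemented as cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)   # kernel response at (i, j)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling: each size x size block collapses to 1 cell."""
    H, W = fmap.shape[0] - fmap.shape[0] % size, fmap.shape[1] - fmap.shape[1] % size
    return fmap[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))

kernel = np.ones((2, 2))                       # detects a bright 2x2 blob
img = np.zeros((6, 6)); img[1:3, 1:3] = 1.0
img_shifted = np.roll(img, 1, axis=1)          # same pattern, shifted by one pixel
p1 = max_pool(conv2d_valid(img, kernel))
p2 = max_pool(conv2d_valid(img_shifted, kernel))
print(p1.max(), p2.max())                      # the blob is detected equally well in both
```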
A CNN is just another Neural Network with sparse connections. Learning algorithm: Back Propagation.
ImageNet (1000-class, 1 million images): all of the top-performing entries are CNNs (from Kyunghyun Cho's dnn tutorial). Krizhevsky et al., the winner of the ImageNet 2012 competition, reached a top-5 test error rate of 15.3% on this 1000-class problem. [Figure: the network architecture, with convolutional layers followed by fully connected layers.]

Speech: with a spectrogram of speech as input, the Convolutional Neural Network outperforms all previous methods that use GMMs of MFCC features.
Slides from Wanli Ouyang wlouyang@ee.cuhk.edu.hk
References and acknowledgements.
Webpages:
www.cs.toronto.edu/~hinton/csc2515/deeprefs.html
videolectures.net/mlss2010au_frean_deepbeliefnets/
Many slides are borrowed from other people's tutorials and lectures (e.g., Hinton's and Frean's). Sorry for not listing them in full detail.
Dumitru Erhan, Aaron Courville, Yoshua Bengio. Understanding Representations Learned in Deep Architectures. Technical Report.
http://www.eecs.qmul.ac.uk/~norman/BBNs/Independence_and_conditional_independence.htm
[Figure: a Bayesian network over three nodes (A, B, C): Smoker?, Has lung cancer, Has bronchitis.]
Directed graphical model (Bayesian network): nodes are conditionally independent given the values of their parents. Example: P(A,B,C,D) = P(D|A,B) P(B|C) P(A|C) P(C). [Figure: the corresponding directed graphs over A, B, C, D.]
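A tiny worked example of assembling the joint P(A,B,C,D) = P(D|A,B) P(B|C) P(A|C) P(C) for binary variables; the conditional-probability tables below are made-up numbers, used only to show how the factorization is evaluated.

```python
# Hypothetical CPTs for binary variables (values are illustrative, not from the slides)
P_C = {1: 0.3, 0: 0.7}
P_A_given_C = {(1, 1): 0.8, (1, 0): 0.1}          # P(A=1 | C)
P_B_given_C = {(1, 1): 0.6, (1, 0): 0.2}          # P(B=1 | C)
P_D_given_AB = {(1, 1, 1): 0.9, (1, 1, 0): 0.7,   # P(D=1 | A, B)
                (1, 0, 1): 0.5, (1, 0, 0): 0.05}

def bernoulli(p1, value):
    return p1 if value == 1 else 1.0 - p1

def joint(a, b, c, d):
    """P(A,B,C,D) = P(D|A,B) P(B|C) P(A|C) P(C)"""
    return (bernoulli(P_D_given_AB[(1, a, b)], d)
            * bernoulli(P_B_given_C[(1, c)], b)
            * bernoulli(P_A_given_C[(1, c)], a)
            * P_C[c])

# Sanity check: the joint sums to 1 over all 16 configurations
total = sum(joint(a, b, c, d) for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(round(total, 10))   # 1.0
```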
Undirected graphical model (Markov Random Field). Probability:
P(x; θ) = (1/Z) Π_m f_m(x_m; θ),   Z = Σ_x Π_m f_m(x_m; θ)   (Z: the partition function)
Example with three binary variables A, B, C (is smoker?, is healthy?, has lung cancer?):
P(A, B, C) ∝ f(B, C) f(A, C) = exp(w1·B·C) exp(w2·A·C)
P(A, B, C) = exp(w1·B·C + w2·A·C) / Z(w1, w2),   Z(w1, w2) = Σ_{A,B,C} exp(w1·B·C + w2·A·C)
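A brute-force sketch of the example above: unnormalized potentials exp(w1·B·C) and exp(w2·A·C) over binary A, B, C, with the partition function computed by enumeration; the weight values are illustrative assumptions.

```python
import itertools, math

w1, w2 = 1.5, -0.5          # illustrative potential weights

def unnormalized(a, b, c):
    """Product of clique potentials f(B,C) * f(A,C) = exp(w1*B*C) * exp(w2*A*C)."""
    return math.exp(w1 * b * c) * math.exp(w2 * a * c)

# Partition function Z: sum of the unnormalized score over all configurations
Z = sum(unnormalized(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))

def P(a, b, c):
    return unnormalized(a, b, c) / Z

print(sum(P(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3)))  # 1.0
```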
[Figures: a Markov random field in 2D (nodes A through I) and a Hidden Markov Model (hidden states h1, h2, h3 with observations y1, y2, y3).]
Examples of directed factorizations:
P(A,B,C,D) = P(A) P(B) P(C|B) P(D|A,B,C)
Hidden Markov Model: P(y1, y2, y3, h1, h2, h3) = P(h1) P(h2|h1) P(h3|h2) P(y1|h1) P(y2|h2) P(y3|h3)
[Figure: graphical models of (a) an RBM with visible units x and hidden units h (weights W) and (b) a DBN with layers x, h1, h2, h3 and weights W0, W1, W2, shown alongside an HMM.]
Zoubin Ghahramani ‘s video lecture on graphical models: http://videolectures.net/mlss07_ghahramani_grafm/
Writing each potential as an exponential of an energy term:
P(x; θ) = (1/Z) Π_m f_m(x_m; θ) = (1/Z) Π_m e^(-E(x_m; θ)) = (1/Z) e^(-Σ_m E(x_m; θ))
E: energy function, Z: partition function.
[Figure: a Markov random field in 2D over nodes A through I.]
Example (the 2-D MRF above, one potential per factor):
P(x) = (1/Z) Π_{i=1}^{15} f_i(x_i; u_i),   e.g. f_i(x_i; u_i) = exp(u_i' x_i)
P(x; θ) = Π_m f_m(x_m; θ) / Σ_x Π_m f_m(x_m; θ)
Maximum Likelihood and gradient descent. For a dataset X = {x^(1), ..., x^(K)}, the average log-likelihood is
L(X; θ) = (1/K) Σ_k log P(x^(k); θ) = (1/K) Σ_k log f(x^(k); θ) - log Z(θ)
where f(x; θ) = Π_m f_m(x_m; θ) and Z(θ) = Σ_x f(x; θ). Learning follows the gradient:
θ^(t+1) = θ^(t) + η ∂L(X; θ)/∂θ
Gradient of the Likelihood:
∂L(X; θ)/∂θ = (1/K) Σ_k ∂ log f(x^(k); θ)/∂θ - ∫ p(x; θ) ∂ log f(x; θ)/∂θ dx
The first (data) term is tractable and easy to compute. The second (model) term is intractable: it requires sampling from p(z1, z2, ..., zM), e.g. by Gibbs sampling, and fast Contrastive Divergence uses only T = 1 sampling step.
Update rule: θ^(t+1) = θ^(t) + η ∂L(X; θ)/∂θ.
Maximum Likelihood gives an accurate but slow gradient; Contrastive Divergence gives an approximate but fast gradient, and the CD minimum need not coincide with the ML minimum.
[Figure: an RBM with visible units x1, x2, x3 and hidden units h1 ... h5.]
More information on Gibbs sampling: Pattern Recognition and Machine Learning (PRML).
The fixed points of ML learning are not fixed points of CD learning, and vice versa; ML learning can be used for fine-tuning. It is not clear whether CD learning converges (to a stable fixed point); as of 2005, a proof was not available. Further theoretical results? Please inform us.
Boltzmann Machine: an undirected graphical model with hidden nodes.
Energy function over the units x, with pairwise weights and biases θ = {w_ij, b_i}:
E(x; θ) = Σ_ij w_ij x_i x_j + Σ_i b_i x_i
Probability: P(x; θ) = (1/Z) Π_m f_m(x_m; θ), Z: the partition function.
Splitting the units into visible x and hidden h, the (full) Boltzmann machine has energy
E(x,h) = b'x + c'h + h'Wx + x'Ux + h'Vh
with within-layer connections U (visible-visible) and V (hidden-hidden).
Restricted Boltzmann Machine (RBM): undirected, loopy, and layered, with no within-layer connections, so the energy reduces to
E(x,h) = b'x + c'h + h'Wx
Because the graph is bipartite, the conditionals factorize:
P(x|h) = Π_j P(x_j|h),   P(h|x) = Π_i P(h_i|x)
Joint and marginal probabilities (the denominator is the partition function):
P(x, h) = e^(b'x + c'h + h'Wx) / Σ_{x,h} e^(b'x + c'h + h'Wx)
P(x) = Σ_h e^(b'x + c'h + h'Wx) / Σ_{x,h} e^(b'x + c'h + h'Wx)
[Figure: bipartite graph with visible units x1, x2, x3 and hidden units h1 ... h5.]
Read the manuscript for details
Parameter learning. With E(x,h) = b'x + c'h + h'Wx, x = [x1 x2 ...]', h = [h1 h2 ...]', maximum likelihood training is
max_θ L(X; θ) = max_θ Π_k P(x^(k))   ⟺   min_θ -Σ_k log P(x^(k))
where
P(x) = Σ_h e^(b'x + c'h + h'Wx) / Σ_{x,h} e^(b'x + c'h + h'Wx)
Geoffrey E. Hinton, “Training Products of Experts by Minimizing Contrastive Divergence.” Neural Computation 14, 1771–1800 (2002)
Contrastive Divergence for the RBM is very fast. Writing
P(x; θ) = f(x; θ) / Z = Σ_h e^(b'x + c'h + h'Wx) / Σ_{x,h} e^(b'x + c'h + h'Wx)
the CD weight update is
w_ij^(t+1) = w_ij^(t) + η ( (1/K) Σ_k x_i^(k) h_j^(k) - <x_i h_j>_p )
where the model expectation <x_i h_j>_p is approximated with a single Gibbs step using the conditionals
P(x_j = 1|h) = σ(b_j + W'_{·j} · h),   P(h_i = 1|x) = σ(c_i + W_{i·} · x)
In practice (CD-1):
∂L(X; θ)/∂w_ij ≈ <x_i h_j>_0 - <x_i h_j>_1
computed with one step of alternating Gibbs sampling: starting from a training vector x, sample h from P(h|x) using P(h_i = 1|x) = σ(c_i + W_{i·} · x); sample a reconstruction x' from P(x|h) using P(x_j = 1|h) = σ(b_j + W'_{·j} · h); then compute P(h_i = 1|x') for the reconstruction term.
y: classification label
Hugo Larochelle and Yoshua Bengio, Classification using Discriminative Restricted Boltzmann Machines, ICML 2008.
Applications of RBMs: multiclass classification, collaborative filtering, motion capture modeling, information retrieval, modeling natural images, segmentation.
Y Li, D Tarlow, R Zemel, Exploring compositional high order pattern potentials for structured output learning, CVPR 2013
Larochelle, H., & Bengio, Y. (2008). Classification using discriminative restricted Boltzmann machines. ICML 2008.
Salakhutdinov, R., Mnih, A., & Hinton, G. E. (2007). Restricted Boltzmann machines for collaborative filtering. ICML 2007.
Salakhutdinov, R., & Hinton, G. E. (2009). Replicated softmax: an undirected topic model. NIPS 2009.
Osindero, S., & Hinton, G. E. (2008). Modeling image patches with a directed hierarchy of Markov random fields. NIPS 2008.
A belief net is a directed acyclic graph composed of random variables: hidden causes at the top generate visible effects at the bottom.
[Figure: a deep belief net with visible layer x and hidden layers h1, h2, h3.]
Pixels => edges => local shapes => object parts.
The mammal brain is organized in a deep architecture, with a given input percept represented at multiple levels of abstraction, each level corresponding to a different area of cortex. An architecture with insufficient depth can require many more computational elements, potentially exponentially more (with respect to input size), than an architecture whose depth is matched to the task. Since the number of computational elements one can afford depends on the number of training examples available to tune or select them, the consequences are not just computational but also statistical: poor generalization may be expected when using an insufficiently deep architecture for representing some functions.
Neuroscience: Theoretical Insights into Brain Function, vol. 165, pp. 33–56, 2007. Yoshua Bengio, “Learning Deep Architectures for AI,” Foundations and Trends in Machine Learning, 2009.
Linear regression, logistic regression: depth 1. Kernel SVM: depth 2. Decision tree: depth 2. Boosting: depth 2. The basic conclusion that these results suggest is that when a function can be compactly represented by a deep architecture, it might need a very large architecture to be represented by an insufficiently deep one (example: logic gates, a multi-layer NN with linear threshold units and positive weights).
Yoshua Bengio, “Learning Deep Architectures for AI,” Foundations and Trends in Machine Learning, 2009.
[Figure: a function of variables X1 ... X5 that needs on the order of 2^(N-1) terms (N·2^(N-1) parameters) when represented with a shallow architecture, but only O(N) parameters with a deep one.]
Many popular learners are shallow: Boosting (2 layers); decision trees, LLE, KNN, and kernel SVMs (2 layers). The brain uses roughly 5-10 layers. A kernel SVM, for example, has the depth-2 form
f(x) = b + Σ_i α_i K(x, x_i)
(one layer of kernel evaluations, one layer that linearly combines them), as sketched below.
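A minimal sketch of that depth-2 decision function; the support vectors, dual coefficients, and bias below are made up, since the point is only the two-stage structure.

```python
import numpy as np

def rbf_kernel(x, xi, gamma=0.5):
    return np.exp(-gamma * np.sum((x - xi) ** 2))

# Made-up "support vectors", coefficients alpha_i, and bias b (illustrative only)
support_vectors = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
alphas = np.array([0.7, -0.4, 0.9])
b = -0.1

def decision_function(x):
    """Depth 2: layer 1 computes K(x, x_i) for each support vector; layer 2 is a weighted sum plus bias."""
    return b + sum(a * rbf_kernel(x, xi) for a, xi in zip(alphas, support_vectors))

print(decision_function(np.array([0.5, 0.5])))
```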
Why is inference hard in a belief net? Hidden causes become dependent once their common effect is observed ("explaining away"): with a common parent C we have P(A,B|C) = P(A|C)P(B|C), but with a common child x1 we have P(h11, h12 | x1) ≠ P(h11| x1) P(h12 | x1) (an example from the manuscript). Solution: complementary priors.
The same inference problem (the problem of explaining away) appears in deep networks. [Figure: a deep network with hidden layers of size 2000, 1000, 500, and 30.] Solution: complementary priors.
Greedy layer-wise learning (pre-training) and fine-tuning. [Figure: the deep belief net over x, h1, h2, h3, decomposed into a stack of RBMs (x-h1, h1-h2, h2-h3) that are trained one layer at a time.]
Why does greedy layer-wise learning work? It optimizes a lower bound: when we fix the parameters for layer 1 and optimize the parameters for layer 2, we are optimizing the prior P(h1) in (1):
log P(x) ≥ Σ_{h1} Q(h1|x) [ log P(h1) + log P(x|h1) ] - Σ_{h1} Q(h1|x) log Q(h1|x)    (1)
An RBM can be considered as a DBN with infinitely many layers: unrolling the RBM gives an infinite directed network over x0, h0, x1, h1, x2, ... whose weights alternate between W and W' (tied weights), and this infinite net collapses back to the single RBM over x0 and h0. [Figure: the unrolled infinite net and the equivalent RBM.]
Fine-tuning the whole network with Back Propagation (BP).
Pretraining Fine-tuning
There might be no universally right depth
Copied from http://videolectures.net/mlss09uk_hinton_dbn/
[1] Sutskever, I. and Hinton, G. E., Deep Narrow Sigmoid Belief Networks are Universal Approximators. Neural Computation, 2007
Erhan et al., AISTATS 2009.
[Figure: results with pre-training vs. without (w/o) pre-training.]
Why does unsupervised pre-training make sense? [Figure: two ways image-label pairs could be generated: directly, image to label, or via underlying "stuff" that generates both the image (high-bandwidth path) and the label (low-bandwidth path).] If image-label pairs were generated the first way, it would make sense to try to go straight from images to labels, for example: do the pixels have even parity? If image-label pairs are generated the second way, it makes sense to first learn to recover the stuff that caused the image by inverting the high-bandwidth pathway.
Layer-wise pre-training is efficient but not optimal. It is possible to train the parameters of all layers jointly using a wake-sleep style (up-down) algorithm.
After learning many layers of features, we can fine-tune the features to improve generation: do a stochastic bottom-up pass and adjust the top-down (generative) weights to be good at reconstructing the feature activities in the layer below; do a stochastic top-down pass and adjust the bottom-up (recognition) weights to be good at reconstructing the feature activities in the layer above.
The RBM has no connections within a layer. This can be generalized: lateral connections for the first (visible) layer [1], or lateral connections at multiple layers [2].
[1]B. A. Olshausen and D. J. Field, “Sparse coding with an overcomplete basis set: a strategy employed by V1?,” Vision Research, vol. 37, pp. 3311–3325, December 1997. [2]S. Osindero and G. E. Hinton, “Modeling image patches with a directed hierarchy of Markov random field,” in NIPS, 2007.
Handling non-binary data: map it into [0, 1] linearly, x ← ax + b, or use another distribution for the visible units.
Static data vs. temporal (sequential) data.
Applications of the DBN: hand-written digit recognition, dimensionality reduction, information retrieval, segmentation, denoising, phone recognition, object recognition, object detection, ...
Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science.
Welling, M., et al. (2004). Exponential family harmoniums with an application to information retrieval. NIPS 2004.
Nair, V., & Hinton, G. E. (2009). 3-D object recognition with deep belief nets. NIPS 2009.
...
3-D object recognition results: kernel SVM 11.6%, convolutional neural net 6.0%, convolutional net + SVM hybrid 5.9%, DBN 6.5%. Using additional unlabeled data (with the same labeled data as before), the DBN achieves 5.2%.
Experiment: 11,000 unlabeled cases; 100, 500, or 1000 labeled cases; test on face patches from new people.
Models compared: a GP (Gaussian process) on the pixels, a GP on the top-level features, and a GP on the top-level features with fine-tuning.
Deep autoencoders. They always looked like a really nice way to do non-linear dimensionality reduction, and we now have a much better way to train them: layer-wise RBM pre-training followed by fine-tuning. [Figure: encoder 28x28 to 1000 to 500 to 250 to 30 neurons (the 30-dimensional code layer uses linear units), decoder 30 to 250 to 500 to 1000 to 28x28, with decoder weights tied to the transposes of the encoder weights W1 ... W4.]
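A structural sketch of that autoencoder: the encoder layers above, a linear 30-D code, and a decoder that reuses the transposed encoder weights. Only the forward pass is shown; the weights are random placeholders, and the RBM pre-training and fine-tuning are not implemented here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

layer_sizes = [784, 1000, 500, 250, 30]                    # 28x28 input, 30-D code
rng = np.random.default_rng(0)
weights = [0.01 * rng.normal(size=(layer_sizes[i + 1], layer_sizes[i]))
           for i in range(len(layer_sizes) - 1)]           # W1 ... W4 (encoder)

def encode(x):
    h = x
    for i, W in enumerate(weights):
        z = W @ h
        h = z if i == len(weights) - 1 else sigmoid(z)     # code layer uses linear units
    return h

def decode(code):
    h = code
    for W in reversed(weights):                            # decoder reuses W4', ..., W1'
        h = sigmoid(W.T @ h)
    return h

x = rng.random(784)                                        # stand-in for a 28x28 image
reconstruction = decode(encode(x))
print(reconstruction.shape)                                # (784,)
```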
[Figure: reconstructions of real data by the 30-D deep autoencoder and by 30-D PCA.]
[Figure: reconstructions of real data by the 30-D deep autoencoder, 30-D logistic PCA, and 30-D PCA.]
Generative vs. discriminative models. Generative models have representational power: they explicitly or implicitly model the distribution of inputs as well as outputs. Discriminative models model the posterior probabilities directly.
Which one is better? A very controversial topic; the comparison can be made at the level of the model (the DBN is generative, while a standard NN/CNN is discriminative), the application, and the learning. Hinton: the superior classification performance of discriminative learning methods holds only for domains in which it is not possible to learn a good generative model; this set of domains is being eroded by Moore's law.