SLIDE 1

Deep learning: Autoencoders

Hamid Beigy
Sharif University of Technology
November 11, 2019

SLIDE 2

Table of contents

1. Introduction
2. Autoencoders
3. Undercomplete Autoencoder
4. Regularized Autoencoders
5. Denoising Autoencoders
6. Contractive Autoencoder
7. Reading


SLIDE 4

Introduction

1. In previous sessions, we considered deep learning models with the following characteristics:
   - Input layer: a (possibly vectorized) quantitative representation
   - Hidden layer(s): apply transformations with nonlinearity
   - Output layer: the result for classification, regression, translation, segmentation, etc.
2. These models were used for supervised learning.

SLIDE 5

Introduction

1. In this session, we study unsupervised learning with neural networks.
2. In this setting, we don't have labels for the data samples.


SLIDE 7

Autoencoders

1. An autoencoder is a feed-forward neural net whose job is to take an input x and predict x.
2. In other words, autoencoders are neural networks that are trained to copy their inputs to their outputs.
3. It consists of
   - an encoder h = f(x)
   - a decoder r = g(h)
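
A minimal sketch of this encoder/decoder pair in PyTorch (the layer sizes and activations are illustrative assumptions, not taken from the slides):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Encoder h = f(x) compresses the input; decoder r = g(h) reconstructs it."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())     # encoder
        self.g = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())  # decoder

    def forward(self, x):
        return self.g(self.f(x))   # r = g(f(x)), trained to match x

model = Autoencoder()
r = model(torch.rand(1, 784))      # reconstruction of a dummy input
```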

SLIDE 8

Autoencoders

1. Autoencoders consist of an encoder h = f(x), taking an input x to the hidden representation h, and a decoder x̂ = g(h), mapping the hidden representation h back to a reconstruction x̂.
2. The goal is
   $$\min_{f,g} \sum (\hat{x} - x)^2$$

SLIDE 9

Autoencoder architecture

1. An autoencoder is a data-compression algorithm.
2. A hidden layer describes the code used to represent the input: it maps input to output through a compressed representation code.

SLIDE 10

Autoencoders

1. PCA can be described as
   $$\min_{W} \sum (\hat{x} - x)^2, \qquad W W^\top = I,$$
   which, with the linear encoder h = W x and decoder $\hat{x} = W^\top h$, becomes
   $$\min_{W} \sum (W^\top W x - x)^2$$
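
This view of PCA can be checked numerically. A NumPy sketch (the function and variable names are mine, not from the slides):

```python
import numpy as np

def pca_reconstruct(X, k):
    """Encode each row x as h = W x, decode as x_hat = W^T h, rows of W orthonormal."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k]                          # k x d matrix of principal directions, W @ W.T = I_k
    return (X - mu) @ W.T @ W + mu      # x_hat = W^T W (x - mu) + mu
```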

SLIDE 11

Autoencoders

1. Autoencoders can be thought of as a nonlinear PCA:
   $$\min_{f,g} \sum (\hat{x} - x)^2 = \min_{f,g} \sum (g(f(x)) - x)^2$$

SLIDE 12

Autoencoder vs PCA

1. Nonlinear autoencoders can learn more powerful codes for a given dimensionality than linear autoencoders (PCA).

SLIDE 13

Autoencoder architecture

[Figure: autoencoder architecture — input, encoding, code, decoding, output]

SLIDE 14

Autoencoder architecture

1. Encoder + decoder structure.

SLIDE 15

Autoencoder architecture

1. Autoencoders are data-specific.
   - They are able to compress only data similar to what they have been trained on.
2. This is different from, say, the MP3 or JPEG compression algorithms.
   - These make general assumptions about sounds/images, but not about specific types of sounds/images.
   - An autoencoder trained on pictures of cats would do poorly at compressing pictures of trees, because the features it learns would be cat-specific.
3. Autoencoders are lossy.
   - The decompressed outputs will be degraded compared to the original inputs (similar to MP3 or JPEG compression). This differs from lossless arithmetic compression.

SLIDE 16

Stochastic Autoencoders

1. Autoencoders have been part of the neural-network landscape for decades.
2. Traditionally, they were used for dimensionality reduction and feature learning.
3. Modern autoencoders also generalize to stochastic mappings:
   $$p_{\text{encoder}}(h \mid x) = p_{\text{model}}(h \mid x), \qquad p_{\text{decoder}}(x \mid h) = p_{\text{model}}(x \mid h)$$
4. These distributions are called stochastic encoders and decoders, respectively.
5. Recent theoretical connections between autoencoders and latent-variable models have brought them to the forefront of generative modeling.

SLIDE 17

Distribution View of Autoencoders

1. Consider the stochastic decoder g(h) as a generative model and its relationship to the joint distribution:
   $$p_{\text{model}}(x, h) = p_{\text{model}}(h)\, p_{\text{model}}(x \mid h)$$
   $$\log p_{\text{model}}(x, h) = \log p_{\text{model}}(h) + \log p_{\text{model}}(x \mid h)$$
2. If h is given by the encoding network, then we want the most likely x as output.
3. Finding the MLE of (x, h) amounts to maximizing $p_{\text{model}}(x, h)$.
4. $p_{\text{model}}(h)$ is a prior over latent-space values; this term can act as a regularizer.

SLIDE 18

Meaning of Generative

1. By assuming a prior over the latent space, we can sample values from the underlying probability distribution.

SLIDE 19

Linear factor models

1. Many of the research frontiers in deep learning involve building a probabilistic model of the input, $p_{\text{model}}(x)$.
2. Many probabilistic models have latent variables h, with $p_{\text{model}}(x) = \mathbb{E}_h\,[\,p_{\text{model}}(x \mid h)\,]$.
3. Latent variables provide another means of representing the data.
4. More advanced deep models extend latent variables further; the simplest of these are linear factor models.
5. A linear factor model is defined by the use of a stochastic, linear decoder function that generates x by adding noise to a linear transformation of h (see the equations below).
6. Idea: distributed representations based on latent variables can obtain all of the advantages of learning that we have seen with deep networks.
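
Concretely, a linear factor model first samples the latent factors and then generates x (the standard formulation, as in Chapter 13 of the Deep Learning book):

$$h \sim p(h), \qquad x = W h + b + \text{noise},$$

where $p(h)$ is typically a factorial distribution and the noise is usually Gaussian.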

SLIDE 20

Autoencoder training using a loss function

1. Encoder f and decoder g:
   $$f : \mathcal{X} \to \mathcal{H}, \qquad g : \mathcal{H} \to \mathcal{X}$$
   $$\operatorname*{arg\,min}_{f,g} \; \| x - (g \circ f)(x) \|^2$$
2. One hidden layer:
   - A nonlinear encoder takes an input $x \in \mathbb{R}^d$ and maps it to $h \in \mathbb{R}^p$:
     $$h = \sigma_1(W x + b), \qquad \hat{x} = \sigma_2(W' h + b')$$
   - It is trained to minimize a reconstruction error such as $L(x, \hat{x}) = \|x - \hat{x}\|^2$ (see the sketch below).
   - This provides a compressed representation of the input x.
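
The one-hidden-layer recipe above as a PyTorch sketch (the dimensions d = 784, p = 64, the sigmoid activations, and the dummy data are assumptions):

```python
import torch
import torch.nn as nn

d, p = 784, 64                                            # input and code dimensions (assumed)
encoder = nn.Sequential(nn.Linear(d, p), nn.Sigmoid())    # h = sigma1(W x + b)
decoder = nn.Sequential(nn.Linear(p, d), nn.Sigmoid())    # x_hat = sigma2(W' h + b')

x = torch.rand(32, d)                                     # dummy mini-batch standing in for data
loss = ((x - decoder(encoder(x))) ** 2).mean()            # reconstruction error L(x, x_hat)
```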

SLIDE 21

Training autoencoder

1. An autoencoder is a feed-forward, non-recurrent neural network with an input layer, an output layer, and one or more hidden layers.
2. It can be trained by computing gradients using back-propagation, followed by mini-batch gradient descent (see the sketch below).
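
A minimal training loop matching this recipe (the optimizer, learning rate, batch size, and dummy data are assumptions; the model is the toy autoencoder from the previous sketch):

```python
import torch
import torch.nn as nn

d, p = 784, 64
encoder = nn.Sequential(nn.Linear(d, p), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(p, d), nn.Sigmoid())
opt = torch.optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), lr=0.1)

data = torch.rand(1024, d)                   # dummy training set standing in for real data
for epoch in range(10):
    for x in data.split(32):                 # mini-batches of 32 examples
        x_hat = decoder(encoder(x))
        loss = ((x - x_hat) ** 2).mean()     # reconstruction error
        opt.zero_grad()
        loss.backward()                      # gradients via back-propagation
        opt.step()                           # mini-batch gradient descent step
```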


SLIDE 23

Undercomplete Autoencoder

1. An autoencoder whose code dimension is less than the input dimension is called undercomplete.
2. Learning an undercomplete representation forces the autoencoder to capture the most salient features of the training data.
3. The learning process is described simply as minimizing a loss function L(x, g(f(x))), where L is a loss function penalizing g(f(x)) for being dissimilar from x, such as the mean squared error.

SLIDE 24

Undercomplete Autoencoder

1. Assume that the autoencoder has only one hidden layer.
2. What is the difference between this network and PCA?
3. When the decoder g is linear and L is the mean squared error, an undercomplete autoencoder learns to span the same subspace as PCA.
4. In this case, an autoencoder trained to perform the copying task has learned the principal subspace of the training data as a side effect.
5. If the encoder and decoder functions f and g are nonlinear, a more powerful nonlinear generalization of PCA is obtained.


SLIDE 26

Regularized Autoencoders

1. Consider an encoder f and decoder g with $x \in \mathbb{R}^d$ and $h \in \mathbb{R}^k$.
2. When k > d, the autoencoder is called overcomplete.
3. Regularized autoencoders use a loss function that encourages the model to have some properties besides reproducing its inputs:
   - sparse representations (sparse autoencoders)
   - small derivatives of the representation (contractive autoencoders)
   - robustness to noise or to missing inputs (denoising autoencoders)

SLIDE 27

Sparse Autoencoders

1. Sparse autoencoders try to minimize (see the sketch below)
   $$L(x, g(f(x))) + \Omega(h)$$
2. The first term is the loss for copying the inputs.
3. The second term is a sparsity penalty.
4. In an ordinary neural network, we try to find the maximum-likelihood parameters for $p(x \mid \theta)$.
5. We often work with the log, $\log p(x \mid \theta)$, for simplification, from which we get the loss function without regularization.
6. What about MAP (maximum a posteriori) estimation?
   $$p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$$
7. Maximizing the log of this function yields
   $$\max_\theta \; \big(\log p(x \mid \theta) + \log p(\theta)\big)$$
8. The first term is the loss function and the second term is the regularization penalty.
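
A sketch of the sparse objective with an L1 penalty, $\Omega(h) = \lambda \sum_i |h_i|$ (the choice of L1 and the value of λ are assumptions; the model is the same toy autoencoder as in the earlier sketches):

```python
import torch
import torch.nn as nn

d, p = 784, 64
encoder = nn.Sequential(nn.Linear(d, p), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(p, d), nn.Sigmoid())

x = torch.rand(32, d)                    # dummy mini-batch
lam = 1e-3                               # sparsity weight (hyperparameter)
h = encoder(x)
loss = ((x - decoder(h)) ** 2).mean() + lam * h.abs().sum()  # L(x, g(f(x))) + Omega(h)
```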


SLIDE 29

Denoising Autoencoders

1. The denoising autoencoder (DAE) is an autoencoder that receives a corrupted data point as input and is trained to predict the original, uncorrupted data point as its output.
2. Traditional autoencoders minimize $L(x, g(f(x)))$, where L is a loss function penalizing g(f(x)) for being dissimilar from x, such as the squared L2 norm of the difference (mean squared error).
3. A DAE instead minimizes $L(x, g(f(\tilde{x})))$, where $\tilde{x}$ is a copy of x that has been corrupted by some form of noise. The autoencoder must undo this corruption rather than simply copy its input.
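
One DAE training step as a sketch; Gaussian corruption is just one possible choice of noise, and the noise scale and toy model are assumptions:

```python
import torch
import torch.nn as nn

d, p = 784, 64
encoder = nn.Sequential(nn.Linear(d, p), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(p, d), nn.Sigmoid())

x = torch.rand(32, d)                                   # clean mini-batch
x_tilde = x + 0.3 * torch.randn_like(x)                 # corrupted copy x~ of x
loss = ((x - decoder(encoder(x_tilde))) ** 2).mean()    # L(x, g(f(x~))): target is the CLEAN x
```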

SLIDE 30

Denoising Autoencoders

1. By having to remove the noise, the model must learn the difference between the noise and the actual image.

SLIDE 31

Example of Noise in a DAE

1. An autoencoder with high capacity can end up learning an identity function (also called a null function), where input = output.
2. A DAE can solve this problem by corrupting the input data.
3. How much noise should be added?
4. Corrupt the input by setting 30-50% of randomly chosen input nodes to zero (see the sketch below).

[Figure: original input, corrupted data, reconstructed data]
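
The masking corruption from item 4 as a sketch (the 30% rate is one point in the suggested range):

```python
import torch

x = torch.rand(32, 784)                     # clean mini-batch
keep = (torch.rand_like(x) > 0.3).float()   # zero each input node with probability 0.3
x_tilde = x * keep                          # corrupted input for the DAE
```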

SLIDE 32

Training DAE

1. The DAE training procedure:

   [Figure: DAE training graph — the input x is corrupted to x̃ by C(x̃ | x), encoded as h = f(x̃), decoded by g, and the loss L compares the reconstruction with x]

2. We introduce a corruption process C(x̃ | x).

SLIDE 33

Training DAE

1. We introduce a corruption process C(x̃ | x), the conditional distribution of the corrupted sample x̃ given the original sample x.
2. The autoencoder then learns a reconstruction distribution $p_{\text{reconstruct}}(x \mid \tilde{x})$ estimated from training pairs (x, x̃) as follows:
   - Sample a training example $x_i$ from the training data.
   - Sample a corrupted version $\tilde{x}_i$ from $C(\tilde{x} \mid x = x_i)$.
   - Use $(x_i, \tilde{x}_i)$ as a training example for estimating the autoencoder reconstruction distribution $p_{\text{reconstruct}}(x \mid \tilde{x}) = p_{\text{decoder}}(x \mid h)$, with h the output of the encoder $f(\tilde{x})$ and $p_{\text{decoder}}$ typically defined by a decoder g(h).
   - Typically we can simply perform gradient-based approximate minimization of the negative log-likelihood $-\log p_{\text{decoder}}(x \mid h)$.
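
The sampling steps above as a sketch, using a Gaussian decoder so that $-\log p_{\text{decoder}}(x \mid h)$ reduces to squared error up to constants (the corruption process and toy model are assumptions carried over from the earlier sketches):

```python
import torch
import torch.nn as nn

d, p = 784, 64
encoder = nn.Sequential(nn.Linear(d, p), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(p, d), nn.Sigmoid())

data = torch.rand(1024, d)                    # training set (dummy)
xi = data[torch.randint(len(data), (1,))]     # 1) sample a training example x_i
xi_tilde = xi + 0.3 * torch.randn_like(xi)    # 2) sample x_i~ from C(x~ | x = x_i)
h = encoder(xi_tilde)                         # h = f(x~)
nll = ((xi - decoder(h)) ** 2).sum()          # 3) -log p_decoder(x | h), Gaussian case
```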

SLIDE 34

Results of DAE

[Figure: DAE reconstruction results]


SLIDE 36

Contractive Autoencoder (CAE)

1. Contractive autoencoders are explicitly encouraged to learn a manifold through their loss function.
2. Desirable property: points close to each other in input space maintain that property in the latent space.
3. A method to avoid uninteresting solutions is to add an explicit term in the loss that penalizes them.
4. We wish to extract features that reflect only the variations observed in the training set.
5. We would like to be invariant to other variations.

SLIDE 37

Contractive Autoencoder (CAE)

1. A contractive autoencoder has an explicit regularizer on h = f(x), encouraging the derivatives of f to be as small as possible.
2. This will be true if f(x) = h is continuous and has small derivatives.
3. We can use the squared Frobenius norm of the Jacobian matrix as a regularization term (see the sketch below):
   $$\Omega(f, x) = \lambda \left\| \frac{\partial f(x)}{\partial x} \right\|_F^2$$
4. These autoencoders are called contractive because they contract a neighborhood of the input space into a smaller, localized group in the latent space.
5. Exercise: what is the difference between a DAE and a CAE?
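
A sketch of the contractive penalty computed with autograd for a single example (practical only for small models; the toy encoder and λ are assumptions):

```python
import torch
import torch.nn as nn

d, p = 784, 64
encoder = nn.Sequential(nn.Linear(d, p), nn.Sigmoid())

def contractive_penalty(f, x, lam=1e-4):
    J = torch.autograd.functional.jacobian(f, x)  # Jacobian of h = f(x) w.r.t. x
    return lam * (J ** 2).sum()                   # lambda * ||J||_F^2

x = torch.rand(d)                                 # a single example
penalty = contractive_penalty(encoder, x)
```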


SLIDE 39

Reading

Please read Chapter 14 of the Deep Learning book (Goodfellow, Bengio, and Courville).