SLIDE 1

Neural Network Part 5: Unsupervised Models

CS 760@UW-Madison

SLIDE 2

Goals for the lecture

You should understand the following concepts:

  • autoencoder
  • restricted Boltzmann machine (RBM)
  • Nash equilibrium
  • minimax game
  • generative adversarial network (GAN)


SLIDE 3

Autoencoder

SLIDE 4

Autoencoder

  • A neural network trained to copy its input to its output
  • Contains two parts (see the sketch below):
  • Encoder: maps the input to a hidden representation
  • Decoder: maps the hidden representation to the output
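
To make the structure concrete, here is a minimal PyTorch sketch (not from the slides; the layer sizes, nonlinearity, and names are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Minimal autoencoder sketch: encoder f maps input x to the code h,
# decoder g maps h to the reconstruction r = g(f(x)).
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        self.decoder = nn.Linear(code_dim, input_dim)

    def forward(self, x):
        h = self.encoder(x)     # hidden representation (the code)
        return self.decoder(h)  # reconstruction

model = Autoencoder()
x = torch.randn(16, 784)              # a batch of 16 dummy inputs
loss = ((model(x) - x) ** 2).mean()   # reconstruction error L(x, r)
```
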
SLIDE 5

Autoencoder

[Diagram: input $x$ → hidden representation $h$ (the code) → reconstruction $r$]

SLIDE 6

Autoencoder

[Diagram: encoder $f(\cdot)$ maps input $x$ to the code $h$; decoder $g(\cdot)$ maps $h$ to the reconstruction $r$]

$h = f(x), \qquad r = g(h) = g(f(x))$

SLIDE 7

Why copy the input to the output?

  • We do not really care about the copying itself
  • Interesting case: the network is NOT able to copy exactly but strives to do so
  • The autoencoder is forced to select which aspects of the input to preserve, and thus can hopefully learn useful properties of the data
  • Historical note: goes back to (LeCun, 1987; Bourlard and Kamp, 1988; Hinton and Zemel, 1994)

SLIDE 8

Undercomplete autoencoder

  • Constrain the code to have smaller dimension than the input
  • Training: minimize a loss function

$L(x, r) = L(x, g(f(x)))$

SLIDE 9

Undercomplete autoencoder

  • Constrain the code to have smaller dimension than the input
  • Training: minimize a loss function

$L(x, r) = L(x, g(f(x)))$

  • Special case: $f, g$ linear, $L$ the mean squared error
  • Reduces to Principal Component Analysis (PCA), as illustrated below
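
A quick way to see the PCA connection (a NumPy sketch, not slide material, assuming centered data): the optimal rank-$k$ linear reconstruction under mean squared error is the projection onto the top-$k$ principal components, which is what a trained linear autoencoder converges to.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X -= X.mean(axis=0)          # center the data (assumed throughout)

k = 3
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T               # top-k principal directions

H = X @ V_k                  # linear "encoder": k-dimensional code h
R = H @ V_k.T                # linear "decoder": reconstruction r
print("rank-%d reconstruction MSE:" % k, np.mean((X - R) ** 2))
```
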
SLIDE 10

Undercomplete autoencoder

  • What about a nonlinear encoder and decoder?
  • Capacity should not be too large
  • Suppose we are given data $x^{(1)}, x^{(2)}, \dots, x^{(n)}$
  • Encoder maps $x^{(i)}$ to $i$
  • Decoder maps $i$ back to $x^{(i)}$
  • A one-dimensional code $h$ then suffices for perfect reconstruction, without learning anything useful about the data
SLIDE 11

Regularization

  • Typically NOT achieved by
  • keeping the encoder/decoder shallow, or
  • using a small code size
  • Regularized autoencoders: add a regularization term that encourages the model to have other properties
  • Sparsity of the representation (sparse autoencoder)
  • Robustness to noise or to missing inputs (denoising autoencoder)
SLIDE 12

Sparse autoencoder

  • Constrain the code to be sparse
  • Training: minimize a loss function (a code sketch follows below)

$L_R = L(x, g(f(x))) + R(h)$
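
In code, the only change from a plain autoencoder is the extra penalty term. A minimal sketch with an assumed $\ell_1$ regularizer $R(h) = \lambda \|h\|_1$ (the weight `lam` and all sizes are illustrative):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Linear(64, 784)

x = torch.randn(16, 784)
h = encoder(x)                 # code
r = decoder(h)                 # reconstruction

lam = 1e-3                                 # illustrative sparsity weight
recon = ((r - x) ** 2).mean()              # L(x, g(f(x)))
penalty = lam * h.abs().sum(dim=1).mean()  # R(h) = lam * ||h||_1
loss = recon + penalty
```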

SLIDE 13

Probabilistic view of regularizing $h$

  • Suppose we have a probabilistic model $p(h, x)$
  • MLE on $x$:

$\log p(x) = \log \sum_{h'} p(h', x)$

  • Hard to sum over $h'$
SLIDE 14

Probabilistic view of regularizing $h$

  • Suppose we have a probabilistic model $p(h, x)$
  • MLE on $x$:

$\max \log p(x) = \max \log \sum_{h'} p(h', x)$

  • Approximation: suppose $h = f(x)$ gives the most likely hidden representation, and $\sum_{h'} p(h', x)$ can be approximated by $p(h, x)$

SLIDE 15

Probabilistic view of regularizing $h$

  • Suppose we have a probabilistic model $p(h, x)$
  • Approximate MLE on $x$, with $h = f(x)$:

$\max \log p(h, x) = \max\,[\log p(x|h) + \log p(h)]$

where $\log p(x|h)$ corresponds to the reconstruction loss and $\log p(h)$ to the regularization term

SLIDE 16

Sparse autoencoder

  • Constrain the code to be sparse
  • Laplacian prior: $p(h) = \prod_i \frac{\lambda}{2} \exp(-\lambda |h_i|)$
  • Training: minimize a loss function (see the derivation below)

$L_R = L(x, g(f(x))) + \lambda \|h\|_1$
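
Spelling out the step the slide leaves implicit (standard algebra; only the additive constant is glossed over): taking the negative log of the Laplacian prior turns the $\log p(h)$ regularization term into exactly the $\ell_1$ penalty above.

```latex
-\log p(h) = -\sum_i \log\!\Big(\frac{\lambda}{2}\, e^{-\lambda |h_i|}\Big)
           = \lambda \sum_i |h_i| + \text{const}
           = \lambda \|h\|_1 + \text{const}
```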

SLIDE 17

Denoising autoencoder

  • Traditional autoencoder: encouraged to learn $g(f(\cdot))$ to be the identity
  • Denoising: minimize a loss function (a code sketch follows below)

$L(x, r) = L(x, g(f(\tilde{x})))$, where $\tilde{x}$ is $x$ + noise
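
A minimal sketch of the training step (not slide material; the Gaussian corruption and all sizes are assumptions): corrupt the input, then score the reconstruction against the clean input.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Linear(64, 784)

x = torch.randn(16, 784)
x_tilde = x + 0.1 * torch.randn_like(x)   # x~ = x + noise
r = decoder(encoder(x_tilde))             # g(f(x~))
loss = ((r - x) ** 2).mean()              # L(x, g(f(x~))): reconstruct the CLEAN x
```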

SLIDE 18

Boltzmann Machine

SLIDE 19

Boltzmann machine

  • Introduced by Ackley et al. (1985)
  • General "connectionist" approach to learning arbitrary probability distributions over binary vectors
  • Special case of energy model: $p(x) = \frac{\exp(-E(x))}{Z}$

SLIDE 20

Boltzmann machine

  • Energy model:

$p(x) = \frac{\exp(-E(x))}{Z}$

  • Boltzmann machine: special case of energy model with

$E(x) = -x^\top U x - b^\top x$

where $U$ is the weight matrix and $b$ is the bias parameter

SLIDE 21

Boltzmann machine with latent variables

  • Some variables are not observed

$x = (x_v, x_h)$, with $x_v$ visible and $x_h$ hidden

$E(x) = -x_v^\top R x_v - x_v^\top W x_h - x_h^\top S x_h - b^\top x_v - c^\top x_h$

  • Universal approximator of probability mass functions
SLIDE 22

Maximum likelihood

  • Suppose we are given data $X = \{x_v^{(1)}, x_v^{(2)}, \dots, x_v^{(n)}\}$
  • Maximum likelihood is to maximize

$\log p(X) = \sum_i \log p(x_v^{(i)})$

where

$p(x_v) = \sum_{x_h} p(x_v, x_h) = \sum_{x_h} \frac{1}{Z} \exp(-E(x_v, x_h))$

  • $Z = \sum \exp(-E(x_v, x_h))$: the partition function, difficult to compute (see the sketch below)
SLIDE 23

Restricted Boltzmann machine

  • Invented under the name harmonium (Smolensky, 1986)
  • Popularized by Hinton and collaborators under the name restricted Boltzmann machine

SLIDE 24

Restricted Boltzmann machine

  • Special case of Boltzmann machine with latent variables:

$p(v, h) = \frac{\exp(-E(v, h))}{Z}$

where the energy function is

$E(v, h) = -v^\top W h - b^\top v - c^\top h$

with weight matrix $W$ and biases $b$, $c$

  • Partition function:

$Z = \sum_v \sum_h \exp(-E(v, h))$

SLIDE 25

Restricted Boltzmann machine

Figure from Deep Learning, Goodfellow, Bengio and Courville

SLIDE 26

Restricted Boltzmann machine

  • Conditional distribution is factorial:

$p(h|v) = \frac{p(v, h)}{p(v)} = \prod_j p(h_j|v)$

and

$p(h_j = 1|v) = \sigma(c_j + v^\top W_{:,j})$

where $\sigma$ is the logistic function

SLIDE 27

Restricted Boltzmann machine

  • Similarly,

$p(v|h) = \frac{p(v, h)}{p(h)} = \prod_i p(v_i|h)$

and

$p(v_i = 1|h) = \sigma(b_i + W_{i,:}\, h)$

where $\sigma$ is the logistic function (a sampling sketch follows below)
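
Because both conditionals factorize, all hidden units can be sampled at once given $v$, and vice versa. A minimal NumPy sketch of one block-Gibbs step (not from the slides; parameter values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_v, n_h = 6, 4
W = rng.normal(scale=0.1, size=(n_v, n_h))  # weight matrix W
b = rng.normal(scale=0.1, size=n_v)         # visible bias b
c = rng.normal(scale=0.1, size=n_h)         # hidden bias c

v = rng.integers(0, 2, size=n_v).astype(float)

p_h = sigmoid(c + v @ W)                    # p(h_j = 1 | v) = sigma(c_j + v^T W_:,j)
h = (rng.random(n_h) < p_h).astype(float)   # sample all hidden units in parallel

p_v = sigmoid(b + W @ h)                    # p(v_i = 1 | h) = sigma(b_i + W_i,: h)
v = (rng.random(n_v) < p_v).astype(float)   # sample all visible units in parallel
```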

SLIDE 28

Generative Adversarial Networks (GAN)

See Ian Goodfellow’s tutorial slides: http://www.iangoodfellow.com/slides/2018-06-22-gan_tutorial.pdf
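
For reference (the standard formulation from Goodfellow et al., 2014, summarized here rather than taken from this slide): the generator $G$ and discriminator $D$ play the minimax game

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)]
  + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
```

At the Nash equilibrium of this game, the generator's distribution matches $p_{\text{data}}$ and the optimal discriminator outputs $D(x) = 1/2$ everywhere.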

SLIDE 29

THANK YOU

Some of the slides in these lectures have been adapted/borrowed from materials developed by Mark Craven, David Page, Jude Shavlik, Tom Mitchell, Nina Balcan, Matt Gormley, Elad Hazan, Tom Dietterich, Pedro Domingos, Geoffrey Hinton, and Ian Goodfellow.