On the Fine-Tuning Parameters in Deep Boltzmann Machines Using Quaternions

João Paulo Papa
papa@fc.unesp.br
March 28, 2016

Talk Outline

1. Restricted Boltzmann Machines
2. Harmony Search
3. Quaternions
4. Methodology and Experiments
5. Conclusions and Future Works

Restricted Boltzmann Machines: Main concepts

RBMs are probabilistic models composed of two layers: a visible (input) layer $v \in \{0, 1\}^m$ and a hidden layer $h \in \{0, 1\}^n$, which are connected by a weight matrix $W_{m \times n}$. Additionally, bias units are attached to each layer: $a$ to the visible layer and $b$ to the hidden layer.

[Figure: RBM architecture with visible units $v_1, \ldots, v_m$ (biases $a_1, \ldots, a_m$) and hidden units $h_1, \ldots, h_n$ (biases $b_1, \ldots, b_n$), connected by weights $w_{ij}$.]

The energy of an RBM is given by:

$$E(v, h) = -\sum_{i=1}^{m}\sum_{j=1}^{n} v_i h_j w_{ij} - \sum_{i=1}^{m} a_i v_i - \sum_{j=1}^{n} b_j h_j, \tag{1}$$

and the probability of a given configuration $(v, h)$ is computed as follows:

$$P(v, h) = \frac{e^{-E(v, h)}}{Z}, \tag{2}$$

where $Z$ is the so-called normalizing constant (partition function), given by:

$$Z = \sum_{v, h} e^{-E(v, h)}. \tag{3}$$
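For a very small RBM, Equations 1 to 3 can be checked by brute force. The following is a minimal sketch, not the author's code: the function names and the toy values of `W`, `a` and `b` are hypothetical, and enumerating every configuration is only feasible for tiny models because the sum in Equation 3 has $2^{m+n}$ terms.

```python
import itertools
import math


def energy(v, h, W, a, b):
    """Equation 1: E(v, h) for binary vectors v (length m) and h (length n)."""
    interaction = sum(v[i] * h[j] * W[i][j]
                      for i in range(len(v)) for j in range(len(h)))
    visible_bias = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_bias = sum(b_j * h_j for b_j, h_j in zip(b, h))
    return -interaction - visible_bias - hidden_bias


def partition_function(W, a, b):
    """Equation 3: sum of exp(-E) over all 2^(m+n) configurations."""
    m, n = len(a), len(b)
    return sum(math.exp(-energy(v, h, W, a, b))
               for v in itertools.product((0, 1), repeat=m)
               for h in itertools.product((0, 1), repeat=n))


def joint_probability(v, h, W, a, b):
    """Equation 2: P(v, h) = exp(-E(v, h)) / Z."""
    return math.exp(-energy(v, h, W, a, b)) / partition_function(W, a, b)


# Hypothetical toy model with m = 3 visible and n = 2 hidden units.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
a = [0.0, 0.1, -0.1]
b = [0.2, -0.3]
print(joint_probability((1, 0, 1), (1, 0), W, a, b))
```

Summing `joint_probability` over all configurations should give 1, which is a quick sanity check on the implementation.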

The probability of a data point $v$ (visible layer) is defined as follows:

$$P(v) = \sum_{h} P(v, h) = \frac{\sum_{h} e^{-E(v, h)}}{Z}. \tag{4}$$

Let $V = \{v_1, v_2, \ldots, v_M\}$ be a training set. In short, the RBM training algorithm aims at decreasing the energy of each training sample $v_k \in \Re^m$ in order to increase its probability.

[Figure: energy $E$ as a function of $v$, before and after a training step.]

The log-likelihood of the training data (using just one training point for the sake of simplicity) is given by:

$$\phi = \log P(v) = \phi^{+} - \phi^{-}, \tag{5}$$

where

$$\phi^{+} = \log \sum_{h} e^{-E(v, h)} \tag{6}$$

and

$$\phi^{-} = \log Z = \log \sum_{v, h} e^{-E(v, h)}. \tag{7}$$

Now, the question is: how can we train an RBM?
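The decomposition in Equations 5 to 7 can be verified numerically on a toy model. This is a sketch under stated assumptions: the helper names and parameter values are hypothetical, and the exhaustive sums are only practical for a handful of units.

```python
import itertools
import math


def energy(v, h, W, a, b):
    """Equation 1, repeated here so the block is self-contained."""
    interaction = sum(v[i] * h[j] * W[i][j]
                      for i in range(len(v)) for j in range(len(h)))
    return (-interaction
            - sum(a_i * v_i for a_i, v_i in zip(a, v))
            - sum(b_j * h_j for b_j, h_j in zip(b, h)))


def phi_plus(v, W, a, b):
    """Equation 6: log of the sum over hidden states with v clamped."""
    return math.log(sum(math.exp(-energy(v, h, W, a, b))
                        for h in itertools.product((0, 1), repeat=len(b))))


def phi_minus(W, a, b):
    """Equation 7: log of the partition function Z."""
    return math.log(sum(math.exp(-energy(v, h, W, a, b))
                        for v in itertools.product((0, 1), repeat=len(a))
                        for h in itertools.product((0, 1), repeat=len(b))))


def log_likelihood(v, W, a, b):
    """Equation 5: log P(v) = phi_plus - phi_minus."""
    return phi_plus(v, W, a, b) - phi_minus(W, a, b)


# Hypothetical toy parameters (3 visible units, 2 hidden units).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
a = [0.0, 0.1, -0.1]
b = [0.2, -0.3]
print(log_likelihood((1, 0, 1), W, a, b))
```

Exponentiating `log_likelihood` and summing over all visible vectors recovers a total probability of 1, consistent with Equation 4.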

Basically, the training step updates $W$ in order to maximize the log-likelihood of the training data until a convergence criterion is met (usually a maximum number of iterations/epochs). Stochastic gradient ascent on the log-likelihood (equivalently, descent on the negative log-likelihood) is usually employed for this purpose, i.e.:

$$W^{t+1} \leftarrow W^{t} + \eta\left(\frac{\partial \phi^{+}}{\partial W} - \frac{\partial \phi^{-}}{\partial W}\right), \tag{8}$$

where $\eta$ is the learning rate and the positive gradient, which is easy to compute, is given by:

$$\frac{\partial \phi^{+}}{\partial W} = v^{T} P(h \mid v). \tag{9}$$

The right-hand term of Equation 9 can be computed as follows:

$$P(h \mid v) = \prod_{j=1}^{n} P(h_j = 1 \mid v), \tag{10}$$

where

$$P(h_j = 1 \mid v) = \sigma\left(\sum_{i=1}^{m} w_{ij} v_i + b_j\right). \tag{11}$$

In this case, $\sigma(x) = 1 / (1 + \exp(-x))$. However, the main problem concerns the negative gradient, which is given by:

$$\frac{\partial \phi^{-}}{\partial W} = \tilde{v}^{T} P(\tilde{v} \mid h), \tag{12}$$

where $\tilde{v}$ denotes the estimate (model) of the input data $v$, and $P(\tilde{v} \mid h)$ is given by:

$$P(\tilde{v} \mid h) = \prod_{i=1}^{m} P(\tilde{v}_i = 1 \mid h), \tag{13}$$

and

$$P(\tilde{v}_i = 1 \mid h) = \sigma\left(\sum_{j=1}^{n} w_{ij} h_j + a_i\right). \tag{14}$$

The problem is to obtain a proper approximation of the model term, i.e., $\partial \phi^{-} / \partial W$, which requires a large number of iterations to compute.
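The two unit-wise conditionals (Equations 11 and 14) are cheap to evaluate, as the sketch below illustrates. The function names and the toy parameters are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def p_h_given_v(v, W, b):
    """Equation 11: P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(sum(W[i][j] * v[i] for i in range(len(v))) + b[j])
            for j in range(len(b))]


def p_v_given_h(h, W, a):
    """Equation 14: P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(sum(W[i][j] * h[j] for j in range(len(h))) + a[i])
            for i in range(len(a))]


# Hypothetical toy parameters (3 visible units, 2 hidden units).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
a = [0.0, 0.1, -0.1]
b = [0.2, -0.3]
print(p_h_given_v([1, 0, 1], W, b))
print(p_v_given_h([1, 0], W, a))
```

Each call costs one matrix-vector product, which is why the positive gradient is easy to compute; the expensive part is obtaining a good model sample $\tilde{v}$.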

Usually, we can model the task of estimating a conditional probability by means of the Markov Chain Monte Carlo (MCMC) approach, which models each step towards the approximation of the real data as a Markov chain. A Markov chain is basically a directed, weighted graph that obeys some properties (Ergodic Theorem).

[Figure: example Markov chain drawn as a directed graph, with states such as A, C and D and transition probabilities 0.1, 0.3, 0.7 and 0.9 on its edges.]

One of the best-known approaches for sampling in Markov chains is the so-called Gibbs sampling, which approaches the likelihood solution as $k \to \infty$, where $k$ is the number of iterations.

[Figure: Gibbs chain that alternates between sampling B from $P(B \mid A)$ and A from $P(A \mid B)$.]

How can we use Gibbs sampling for RBMs? Let's say we have a Markov chain $C = \{v, \tilde{v}^{1}, \tilde{v}^{2}, \ldots, \tilde{v}^{k}\}$ composed of the input data $v$ (initial state) and its reconstruction at time step $t$, given by $\tilde{v}^{t}$.

[Figure: Gibbs chain that starts from random data and alternates between $P(h \mid v)$ and $P(v \mid h)$ for $k$ steps, ending at the model approximation.]

Problem? High computational burden, since we need $k \to \infty$.

Hinton (2002) [a] proposed Contrastive Divergence (CD), which alleviates the problem of Gibbs sampling.

[Figure: the same alternating chain of $P(h \mid v)$ and $P(v \mid h)$, now started from the training data and run for only $k \ll \infty$ steps to obtain the model approximation.]

Usually, $k = 1$. Problem? Estimated models tend to stay close to the training samples.

[a] Hinton, G. E. "Training products of experts by minimizing contrastive divergence", Neural Computation, 14(8), 1771-1800, 2002.
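A CD-k weight update can be sketched as follows. This is an illustrative sketch, not the author's implementation: all function names and default values are hypothetical, only $W$ is updated (the biases are left fixed), and the negative statistics use the widely used form $\tilde{v}^{T} P(h \mid \tilde{v})$, which is an assumption rather than something stated on the slide.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_h_given_v(v, W, b):
    return [sigmoid(sum(W[i][j] * v[i] for i in range(len(v))) + b[j])
            for j in range(len(b))]


def p_v_given_h(h, W, a):
    return [sigmoid(sum(W[i][j] * h[j] for j in range(len(h))) + a[i])
            for i in range(len(a))]


def sample(probs, rng):
    """Draw one binary state per unit from its Bernoulli probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd_k_update(v0, W, a, b, eta=0.1, k=1, rng=None):
    """Return (new_W, reconstruction) after k Gibbs steps started at v0."""
    if rng is None:
        rng = random.Random(0)
    ph_data = p_h_given_v(v0, W, b)       # positive phase, Equation 9
    v_model = list(v0)                    # the chain starts at the training sample
    for _ in range(k):
        h = sample(p_h_given_v(v_model, W, b), rng)
        v_model = sample(p_v_given_h(h, W, a), rng)
    ph_model = p_h_given_v(v_model, W, b)  # negative phase (assumed form)
    new_W = [[W[i][j] + eta * (v0[i] * ph_data[j] - v_model[i] * ph_model[j])
              for j in range(len(b))]
             for i in range(len(a))]
    return new_W, v_model


# Hypothetical toy parameters (3 visible units, 2 hidden units).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
a = [0.0, 0.1, -0.1]
b = [0.2, -0.3]
new_W, reconstruction = cd_k_update([1, 0, 1], W, a, b, eta=0.1, k=1)
print(reconstruction)
```

Starting the chain at the training sample instead of at random data is what lets a small $k$ work in practice, and it is also why the reconstructions tend to stay close to the training samples.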

After that, we have two main variations of CD:

- Persistent Contrastive Divergence (PCD) [a]
- Fast Persistent Contrastive Divergence (FPCD) [b]

[a] Tieleman, T. "Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient", Proceedings of the 25th Annual International Conference on Machine Learning, 1064-1071, 2008.
[b] Tieleman, T., Hinton, G. E. "Using Fast Weights to Improve Persistent Contrastive Divergence", Proceedings of the 26th Annual International Conference on Machine Learning, 1033-1040, 2009.

Deep Belief Networks: Main concepts

RBMs stacked on top of each other (greedy training).

[Figure: stack of layers $v, h^{1}, h^{2}, \ldots, h^{L}$ connected by the weight matrices $W^{1}, W^{2}, \ldots, W^{L}$.]

Deep Boltzmann Machines: Main concepts

Inference for an intermediate layer depends on both the layer below and the layer above it. DBMs usually work better than DBNs.

$$P(h^{1}_{j} = 1 \mid v, h^{2}) = \phi\left(\sum_{i=1}^{m_1} w^{1}_{ij} v_i + \sum_{z=1}^{n_2} w^{2}_{jz} h^{2}_{z}\right). \tag{15}$$

Here $\phi$ denotes the logistic function (written $\sigma$ in Equation 11), not the log-likelihood of Equation 5.
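Equation 15 can be sketched in a few lines for a two-hidden-layer DBM. The names below are hypothetical, bias terms are omitted because Equation 15 does not show them, and the activation is assumed to be the logistic function.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_h1_given_v_h2(j, v, h2, W1, W2):
    """Equation 15: P(h1_j = 1 | v, h2).

    W1 has shape (m1, n1) and links v to h1;
    W2 has shape (n1, n2) and links h1 to h2.
    """
    bottom_up = sum(W1[i][j] * v[i] for i in range(len(v)))
    top_down = sum(W2[j][z] * h2[z] for z in range(len(h2)))
    return sigmoid(bottom_up + top_down)


# Hypothetical toy model: 3 visible units, 2 units in h1, 1 unit in h2.
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
W2 = [[0.3], [-0.6]]
print(p_h1_given_v_h2(0, [1, 0, 1], [1], W1, W2))
```

With `h2` set to all zeros the top-down term vanishes and the expression reduces to the bias-free RBM conditional, which shows how the upper layer modulates inference in the lower one.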

Harmony Search: Main concepts

Harmony Search (HS) is a meta-heuristic algorithm inspired by the improvisation process of music players. Each possible solution is modelled as a harmony, and each musician corresponds to one decision variable.

Let $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_N)$ be a set of harmonies that compose the so-called "Harmony Memory", such that $\varphi_i \in \Re^M$. After each iteration, the HS algorithm generates a new harmony vector $\hat{\varphi}$ based on memory considerations, pitch adjustments, and randomization (music improvisation).

The new harmony vector $\hat{\varphi}$ is then evaluated in order to be accepted into the harmony memory: if $\hat{\varphi}$ is better than the worst harmony, the latter is replaced by the new harmony.
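The loop described above can be sketched as follows. This is a generic Harmony Search for minimization, not the variant used in the talk: the parameter names `hmcr` (harmony memory considering rate), `par` (pitch adjusting rate) and `bandwidth`, their default values, and the sphere objective are all assumptions made for illustration.

```python
import random


def harmony_search(objective, bounds, memory_size=10, iterations=500,
                   hmcr=0.9, par=0.3, bandwidth=0.1, seed=0):
    """Minimize `objective` over the box `bounds` = [(low, high), ...]."""
    rng = random.Random(seed)
    # Harmony memory: N random harmonies, each with M decision variables.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    scores = [objective(harmony) for harmony in memory]

    for _ in range(iterations):
        new_harmony = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = rng.choice(memory)[d]        # memory consideration
                if rng.random() < par:               # pitch adjustment
                    value += rng.uniform(-bandwidth, bandwidth)
            else:
                value = rng.uniform(lo, hi)          # randomization
            new_harmony.append(min(max(value, lo), hi))

        new_score = objective(new_harmony)
        worst = max(range(memory_size), key=scores.__getitem__)
        if new_score < scores[worst]:                # replace the worst harmony
            memory[worst], scores[worst] = new_harmony, new_score

    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(x_d * x_d for x_d in x)


best_harmony, best_score = harmony_search(sphere, [(-5.0, 5.0)] * 3)
print(best_harmony, best_score)
```

Because only the worst harmony is ever replaced, and only by a better one, the best score in the memory never gets worse from one iteration to the next.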
