New Modification of Restricted Boltzmann Machine that Considers the Stochasticity of Real Neural Network
Guillermo Barrios Morales, Ruoyu Zhao
Restricted Boltzmann Machines (RBM)
Each state of the RBM has an energy:

$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j$$

Probability of every state:

$$P(v, h) = \frac{e^{-E(v,h)}}{Z}, \qquad Z = \sum_{v,h} e^{-E(v,h)}$$

Probability of every visible pattern:

$$P(v) = \frac{1}{Z} \sum_h e^{-E(v,h)}$$

The goal of the training process is to make the states we want the machine to learn be those with the largest probability. It can be proved that the RBM evolves towards states that are local minima of the energy.
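For a machine small enough to enumerate all states, these probabilities can be computed exactly. A minimal NumPy sketch (our illustration, not the authors' code; sizes and weights are arbitrary):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_v, n_h = 3, 2                      # tiny sizes so exact enumeration is feasible
W = rng.normal(0, 0.1, (n_v, n_h))   # weights w_ij
a = np.zeros(n_v)                    # visible biases a_i
b = np.zeros(n_h)                    # hidden biases b_j

def energy(v, h):
    # E(v,h) = -a.v - b.h - v.W.h
    return -a @ v - b @ h - v @ W @ h

# Partition function Z: sum over all (v, h) configurations.
Z = sum(np.exp(-energy(np.array(v), np.array(h)))
        for v in product([0, 1], repeat=n_v)
        for h in product([0, 1], repeat=n_h))

def p_visible(v):
    # P(v) = (1/Z) * sum_h exp(-E(v,h))
    return sum(np.exp(-energy(np.array(v), np.array(h)))
               for h in product([0, 1], repeat=n_h)) / Z

print(p_visible((1, 0, 1)))
```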
1-step Contrastive Divergence (CD-1) Algorithm
Use Gibbs sampling to update each neuron: update the neurons with a probability

$$P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j h_j w_{ij}\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$

Update the parameters after 1 step of Gibbs sampling, following the approximate partial derivatives of the log-likelihood:

$$\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \right)$$
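A minimal CD-1 update sketch in NumPy (our illustration of the standard algorithm, not the authors' code; `lr` is the learning rate ε):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, a, b, v0, lr, rng):
    """One CD-1 parameter update from a batch of visible vectors v0."""
    # Positive phase: sample hidden units given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of Gibbs sampling (reconstruction).
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Gradient approximation: <v h>_data - <v h>_recon.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```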
[Figure: probability of each visible state, trained on 100 samples of RMNIST/10]
Reduced MNIST/10 (RMNIST/10)
- MNIST is a dataset of handwritten digit images with 60,000 training samples and 10,000 testing samples.
- RMNIST/10 is a dataset that takes 100 random training samples from MNIST (10 samples for each digit) and the whole 10,000 testing samples.
- State-of-the-art learning machines can reach a classification accuracy of more than 99 % when trained on the full training set.
Test accuracy (%)    Reference
99.65                Ciresan et al., IJCAI 2011
99.73                Ciresan et al., ICDAR 2011
99.77                Ciresan et al., CVPR 2012
Why use RMNIST/10?
Children do not learn from 60,000 samples of the digits 0 to 9, yet they can still recognize many different versions of the same number with high accuracy.
On the biological basis of the GBM
➢ What we know …
○ Synaptic plasticity plays a key role in memory and learning processes. Long-term potentiation (LTP) and long-term depression (LTD) are mechanisms that strengthen or weaken synapses, increasing or decreasing the probability of releasing neurotransmitters.
○ Synaptic transmission can be modeled as a stochastic process (W. Maass, A. M. Zador, 1999).
○ A positive bias lowers the firing threshold for an action potential, whereas a negative bias raises it, thereby indirectly increasing or decreasing the probability of releasing neurotransmitters.
➢ What we infer …
○ LTP (LTD) can be realized in our model as an increase (decrease) in the bias of those neurons that have stronger (weaker) connections.
○ The bias of a particular neuron i follows a Gaussian distribution whose mean is given by the average strength of the synapses connecting neuron i to the rest of the network.
GBM: Gaussian Boltzmann Machine
[Figure: Gaussian distribution p_n(n) of the bias noise n; 28x28 sample images]
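The bias rule described above can be sketched as follows (our reading of the slide: each bias is redrawn from a Gaussian centered on the unit's average synaptic strength; `sigma` is the tunable spread, a name we introduce):

```python
import numpy as np

def sample_gbm_biases(W, sigma, rng):
    """Draw each hidden bias from a Gaussian centered on the mean
    synaptic strength of that unit's connections (assumed reading of
    the GBM rule; sigma is the standard deviation of the Gaussian)."""
    mean_strength = W.mean(axis=0)   # average weight into each hidden unit
    return rng.normal(mean_strength, sigma)
```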
Training and testing process of the GBM
[Diagram: training and testing pipeline]
- Training set: 3,000 digits (10 classes, 28x28 px).
- Testing set: 10,000 digits (10 classes, 28x28 px).
- Each sample enters the machine as a 1x794 vector.
- After training, the learned weights W_ij of the RBM and the GBM are tested: did the machine recognize the number?
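A sketch of how each 1x794 input vector could be assembled, assuming the 794 components are the 784 flattened pixels plus a 10-unit one-hot label block (784 + 10 = 794; this layout is our inference from the diagram):

```python
import numpy as np

def make_input_vector(image_28x28, label):
    """Build the 1x794 input: 28*28 = 784 binarized pixels plus a
    10-unit one-hot label block (assumed layout: 784 + 10 = 794)."""
    pixels = (np.asarray(image_28x28).reshape(784) > 0.5).astype(float)
    one_hot = np.zeros(10)
    one_hot[label] = 1.0
    return np.concatenate([pixels, one_hot])  # shape (794,)
```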
Classification Task
RBM: 76.7 %
GBM: 78.4 %
Reconstruction Task (Pattern Completion)
Covered    GBM       RBM
40 %       61.6 %    31.4 %
60 %       26.0 %    1.2 %
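Pattern completion can be sketched as clamping the known visible pixels and Gibbs-sampling the covered ones (a generic RBM completion scheme, assumed rather than taken from the slides):

```python
import numpy as np

def complete_pattern(W, a, b, v_partial, known_mask, n_steps, rng):
    """Fill in covered pixels: re-clamp the known units after every
    Gibbs step and let the machine settle on the missing ones."""
    v = v_partial.copy()
    for _ in range(n_steps):
        ph = 1 / (1 + np.exp(-(v @ W + b)))      # P(h=1|v)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = 1 / (1 + np.exp(-(h @ W.T + a)))    # P(v=1|h)
        v = (rng.random(pv.shape) < pv).astype(float)
        v[known_mask] = v_partial[known_mask]    # clamp the uncovered pixels
    return v
```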
Reconstruction Task (Noisy input)
Noise    GBM       RBM
25 %     51.5 %    70.0 %
50 %     26.0 %    42.7 %
Influence of Machine Parameters on Classification Task with RMNIST/10
Exploring how the stochasticity of the GBM influences its performance
Source of randomness                RBM                            GBM
Training samples (considered)       mean 0.5974, var 4.8836e-04    mean 0.6461, var 4.9067e-05
Testing process (ignored)           mean 0.6344, var 6.5270e-06    mean 0.5919, var 6.25e-06

The variance within each testing process is orders of magnitude smaller than the variance due to the randomness of the training samples, so only the latter is considered.
Considerations about Repeatability
➢ The number of epochs was set to 500.
➢ The number of hidden units was set to 2/3 of the number of visible units.
➢ We assess the variation in performance by changing …
○ the learning rate, for both RBM and GBM;
○ the variance of the Gaussian in the GBM, after setting the learning rate to its optimal value.
➢ For each parameter value, we repeat the training process 5 times with different sets of 100 random training vectors, and then test the resulting machine, as sketched below.
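The protocol could look like this in code (our reconstruction; `train_machine` and `test_accuracy` are placeholder names, not the authors' code):

```python
import numpy as np

def evaluate(param_value, train_machine, test_accuracy,
             train_images, train_labels, test_set, rng):
    """Mean and variance of test accuracy over 5 repetitions,
    each with a fresh random RMNIST/10-style training set."""
    scores = []
    for _ in range(5):
        # Draw 10 random samples per digit -> 100 training vectors.
        idx = np.concatenate([
            rng.choice(np.where(train_labels == d)[0], size=10, replace=False)
            for d in range(10)
        ])
        machine = train_machine(train_images[idx], param_value, n_epochs=500)
        scores.append(test_accuracy(machine, test_set))
    return np.mean(scores), np.var(scores)
```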
Setting of parameters
Influence of Learning Rate in RBM
Influence of Learning Rate in GBM
- Evidently, if we extend the training time, the optimal learning rate becomes smaller.
- These results explain why, when we train both machines for a longer time with the same learning rate, the GBM shows better performance.
Influence of Gaussian Variance in GBM
- If we compute the biases deterministically from the weight matrix, the performance is low.
- By adding some stochasticity to the bias-update process, we reach a higher accuracy.
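The two bias rules side by side (a sketch; the matrix size and the value of `sigma` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (784, 523))   # example weight matrix (sizes illustrative)
sigma = 0.05                          # Gaussian spread (hypothetical value)

b_deterministic = W.mean(axis=0)                   # deterministic rule: low accuracy
b_stochastic = rng.normal(W.mean(axis=0), sigma)   # Gaussian rule: higher accuracy
```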
Conclusions
➢ RBM and GBM show similar performance in classification tasks. On the other hand, while the RBM reconstructs noisy inputs better, the GBM shows greater accuracy in pattern-completion tasks: GBMs "guess" better than they analyze.
➢ For small training sets, the GBM shows less accuracy in classification tasks, but presents less variance as we increase the learning rate. GBMs can achieve stability with faster learning processes.
➢ We therefore expect GBMs to outperform RBMs in tasks involving very big training sets that need higher learning rates to be computationally feasible, or when faster learning is needed.
One step further …
➢ Compare the performance of both machines on the complete MNIST dataset.
➢ Study the response of the GBM to random pruning of connections.