Neural Networks
Hopfield Nets and Boltzmann Machines Spring 2020
Recap: Hopfield network
– A symmetric, loopy network: each neuron is a perceptron with a +1/−1 output
– At each time, each neuron receives a "field" $\sum_{j \ne i} w_{ji} y_j$ from the other neurons (not assuming a node bias)
[Figure: energy (PE) as a function of network state]
– If the sign of the field opposes a neuron's own sign, the neuron flips to match the field
– In doing so it may change the fields at other neurons
– Which may flip in turn, and so on
– The "dipoles" (neurons) stop flipping when any further flip would result in an increase of energy
– The network settles at a configuration where the energy is a local minimum
– I.e. the system remembers its stable state and returns to it when perturbed
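As a concrete illustration, here is a minimal sketch of this recall loop in NumPy (the names and conventions are ours, not from the slides; no bias, ±1 states):

```python
import numpy as np

def recall(W, y, max_sweeps=100):
    """Asynchronous Hopfield recall: flip any neuron whose sign
    disagrees with its local field, until no neuron flips."""
    y = y.copy()
    rng = np.random.default_rng(0)
    for _ in range(max_sweeps):
        flipped = False
        for i in rng.permutation(len(y)):
            field = W[i] @ y                 # local field at neuron i (no bias)
            s = 1.0 if field >= 0 else -1.0
            if s != y[i]:
                y[i] = s                     # flip to match the field's sign
                flipped = True
        if not flipped:                      # stable: a local energy minimum
            break
    return y
```

Because each flip can only lower the energy $-\tfrac{1}{2}\mathbf{y}^\top\mathbf{W}\mathbf{y}$ (for symmetric $\mathbf{W}$ with zero diagonal), the loop must terminate.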
[Figure: example stored patterns; "Number of patterns"]
Energy landscape
– Up to an additive constant, the energy is the quadratic $E(\mathbf{y}) = -\tfrac{1}{2}\,\mathbf{y}^\top \mathbf{W}\,\mathbf{y}$
– Weight matrices that differ by a multiple of the identity give energies that differ only by an additive constant on the $\pm 1$ hypercube: the gradients and the locations of the minima are the same
– NOTE: the Hebbian matrix $\sum_p \mathbf{y}_p \mathbf{y}_p^\top$ is a positive semidefinite matrix, and both versions (with and without the constant diagonal) have the same eigenvectors
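A quick numerical check of the additive-constant claim (an illustrative sketch; variable names are ours): on the $\pm 1$ hypercube, adding $c\,\mathbf{I}$ to $\mathbf{W}$ shifts every state's energy by the same constant $-cN/2$.

```python
import numpy as np

def energy(W, y):
    return -0.5 * y @ W @ y

rng = np.random.default_rng(0)
N = 8
W = rng.standard_normal((N, N))
W = (W + W.T) / 2                        # symmetric weights
c = 3.0
for _ in range(5):
    y = rng.choice([-1.0, 1.0], size=N)
    # y @ (c*I) @ y == c*N for every +/-1 state, so the shift is constant:
    assert np.isclose(energy(W + c * np.eye(N), y),
                      energy(W, y) - 0.5 * c * N)
```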
– Shown from above (assuming 0 bias)
– But the components of $\mathbf{y}$ can only take the values $\pm 1$
– I.e. $\mathbf{y}$ lies on the corners of the unit hypercube
Stored patterns
– If a pattern is stored, its "ghost" (its negation) is stored as well
– Intuitively, stored patterns must ideally be maximally far apart
[Figure: stored patterns and their ghosts (negations) as hypercube corners]
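The "ghost" property follows because the energy is an even function of the state; a tiny illustrative check:

```python
import numpy as np
rng = np.random.default_rng(1)
N = 8
W = rng.standard_normal((N, N)); W = (W + W.T) / 2
y = rng.choice([-1.0, 1.0], size=N)
E = lambda v: -0.5 * v @ W @ v
assert np.isclose(E(y), E(-y))   # a pattern and its ghost share the same energy
```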
$\mathrm{sign}(\cdot)$ is a projection
– It projects onto the nearest corner of the hypercube
– It "quantizes" the space into orthants
– Each step rotates the vector (by $\mathbf{W}$) and then projects it onto the nearest corner
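One synchronous step, written as "transform, then snap to the nearest corner" (a sketch; the tie-breaking convention is ours):

```python
import numpy as np

def step(W, y):
    z = W @ y                    # rotate/stretch the state vector
    y_next = np.sign(z)          # project onto the nearest hypercube corner
    y_next[y_next == 0] = 1.0    # break zero-field ties toward +1
    return y_next
```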
[Figures: 2D example and 3D example of hypercube corners and orthants]
– Let $\mathbf{W} = \sum_p \mathbf{y}_p \mathbf{y}_p^\top$; the eigenvectors with zero eigenvalue are orthogonal to the stored patterns
– Components along those directions are annihilated
– Input vectors get projected onto the subspace spanned by the stored patterns
– Different patterns may be presented different numbers of times, which is equivalent to having unequal eigenvalues
– Hint: for real-valued vectors, use Lanczos iterations
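An illustrative check of the subspace picture, using mutually orthogonal ±1 patterns built from a Hadamard matrix (requires SciPy; the construction is our choice, not from the slides):

```python
import numpy as np
from scipy.linalg import hadamard

N = 8
P = hadamard(N)[:4].astype(float)   # 4 mutually orthogonal +/-1 patterns
W = P.T @ P                         # Hebbian sum of outer products
y = np.random.default_rng(2).standard_normal(N)
# Exact orthogonal projection of y onto the row space of P:
proj = P.T @ np.linalg.solve(P @ P.T, P @ y)
# W y equals N times that projection, because P P^T = N I for
# orthogonal patterns; orthogonal components are annihilated.
assert np.allclose(W @ y / N, proj)
```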
How many patterns can an N-neuron network store?
– Up to $2^N$ patterns can simultaneously be stationary (McEliece and Posner, 1984)
– E.g. with the Hebbian net built from N orthogonal base patterns, all $2^N$ patterns are stationary
– They can even all be made stable (Abu-Mostafa and St. Jacques, 1985; McEliece et al., 1987)
– But this may come with many "parasitic" memories
Can we do something about this? How do we find this network?
– Only about $0.14N$ patterns can be stored through Hebbian learning with 0.996 probability of recall
– The recalled patterns are the eigenvectors of the weight matrix with the highest eigenvalues
– For orthogonal patterns, the patterns are the eigenvectors of the constructed weight matrix, and all eigenvalues are identical
– The number of stationary patterns can be exponential in N, and the number of deliberately stored stable patterns can be as large as N
– But this comes with many parasitic memories
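A small experiment in this spirit (illustrative only; exact numbers vary by seed): store K random patterns with the Hebbian rule and check what fraction remain stable as K passes roughly 0.14N.

```python
import numpy as np
rng = np.random.default_rng(0)

def hebbian(P):
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)
    return W

def is_stable(W, y):
    # Stable iff no single neuron wants to flip.
    return np.array_equal(np.where(W @ y >= 0, 1.0, -1.0), y)

N = 200                                  # 0.14 * N = 28
for K in (10, 20, 28, 40, 60):
    P = rng.choice([-1.0, 1.0], size=(K, N))
    W = hebbian(P)
    frac = np.mean([is_stable(W, p) for p in P])
    print(f"K={K:3d}: fraction of stored patterns stable = {frac:.2f}")
```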
– A bias can be treated as the weight of a connection to another, fixed-value component: a unit clamped at +1
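The standard construction, sketched (names ours): clamp one extra unit at +1 and let its weights carry the biases.

```python
import numpy as np
rng = np.random.default_rng(3)
N = 5
W = rng.standard_normal((N, N)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
b = rng.standard_normal(N)

W_aug = np.zeros((N + 1, N + 1))
W_aug[:N, :N] = W
W_aug[:N, N] = b                         # weights from the clamped +1 unit
W_aug[N, :N] = b                         # keep the matrix symmetric

y = rng.choice([-1.0, 1.0], size=N)
y_aug = np.append(y, 1.0)                # final component fixed at +1
assert np.allclose((W_aug @ y_aug)[:N], W @ y + b)   # identical local fields
```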
Storing patterns as energy minima
– The energy "bowls" will all actually be quadratic
– Make the target patterns local minima
– Emphasize more "important" memories by repeating them more frequently
[Figure: energy landscape with the target patterns at its minima]
– If you raise every spurious valley, eventually they'll all move up above the target patterns, and many will even vanish
[Figures: energy vs. state, before and after raising the spurious valleys]
– Let the network evolve: it will settle in a valley. If this is not the target pattern, raise that valley
– Issue: Hebbian learning assumes all patterns to be stored are equally important, and it comes with many parasitic memories
– Fix: learn the weights by minimizing the energy of the target patterns while increasing the energy of the neighboring (spurious) patterns
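A sketch of that training loop (our rendering of the rule described above, not code from the slides): for each target, let the network settle, then lower the energy at the target and raise it at the valley the network actually found.

```python
import numpy as np
rng = np.random.default_rng(4)

def settle(W, y, sweeps=50):
    y = y.copy()
    for _ in range(sweeps):
        for i in rng.permutation(len(y)):
            y[i] = 1.0 if W[i] @ y >= 0 else -1.0
    return y

N, lr = 32, 0.01
targets = rng.choice([-1.0, 1.0], size=(4, N))
W = np.zeros((N, N))
for epoch in range(100):
    for p in targets:
        valley = settle(W, p)            # where the net settles from the target
        # Lower the target's energy, raise the (possibly spurious) valley's.
        # The update vanishes once the target is itself the valley.
        W += lr * (np.outer(p, p) - np.outer(valley, valley))
        np.fill_diagonal(W, 0.0)
```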
– The neurons that store the actual patterns of interest: visible neurons
– The neurons that only serve to increase the capacity, but whose actual values are not important: hidden neurons
– The hidden neurons can be set to anything in order to store a visible pattern
– Add K hidden neurons beyond the width N of the patterns: the new width of the patterns is N+K, and now we can store N+K patterns!
[Figure: extended patterns, split into visible bits and hidden bits]
How do we choose the hidden bits?
– Simple option: randomly
– We could even compose multiple extended patterns for one base pattern, to increase the probability that it will be recalled properly
– A standard optimization method should work
Recalling the patterns:
– Making errors in the hidden "don't care" bits doesn't matter
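A sketch of the extension (layout and names ours): pad each N-bit visible pattern with K hidden bits, chosen randomly, and judge recall on the visible bits only.

```python
import numpy as np
rng = np.random.default_rng(5)
N, K = 16, 16                                    # visible and hidden widths
visible = rng.choice([-1.0, 1.0], size=(8, N))
hidden = rng.choice([-1.0, 1.0], size=(8, K))    # simple option: random
extended = np.hstack([visible, hidden])          # stored width is N + K

W = extended.T @ extended / (N + K)              # Hebbian weights
np.fill_diagonal(W, 0.0)

def visible_match(y, target):
    # Errors in the hidden "don't care" bits do not count against recall.
    return np.array_equal(y[:N], target[:N])
```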
– The derivation of this probability is in fact quite trivial
– More importance is given to the more attractive spurious memories than to the presented memories
[Figure: the Boltzmann distribution over states at T=1]
The stochastic network models a probability distribution over states
– This is the probability of the different states that the network will wander over at equilibrium
– A state is a binary string; specifically, the network models a Boltzmann distribution $P(S) = \frac{1}{Z} e^{-E(S)/T}$
– The parameters of the model are the weights of the network
– It is a generative model: it generates states according to this distribution
– Let S and S′ be two otherwise identical states that differ only in the i-th bit
– S has i-th bit = 1 and S′ has i-th bit = 0
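Sketching the derivation (with the 0/1 convention introduced below): the energy terms not involving bit $i$ cancel, so $E(S') - E(S) = z_i$, the local field at bit $i$, and

$$\frac{P(S)}{P(S')} = \frac{e^{-E(S)/T}}{e^{-E(S')/T}} = e^{z_i/T}
\quad\Longrightarrow\quad
P(s_i = 1 \mid s_{j \neq i}) = \frac{P(S)}{P(S) + P(S')} = \frac{1}{1 + e^{-z_i/T}}.$$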
– Each neuron can take the value 0 or 1, with a probability that depends on its local field $z_i = \sum_j w_{ij} s_j + b_i$
– Note the slight change from Hopfield nets (0/1 rather than ±1); this is not actually necessary, only a matter of convenience
– The network is run by repeatedly sampling each neuron's value with the probability given above
– Gibbs sampling: fix N−1 variables and sample the remaining variable
– As opposed to the energy-based update (the mean-field approximation), which runs the deterministic test $z_i > 0$?
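A sketch of running the network by Gibbs sampling (illustrative; 0/1 states, temperature T):

```python
import numpy as np
rng = np.random.default_rng(6)

def gibbs_sweep(W, b, s, T=1.0):
    """One sweep: for each bit in turn, fix the other N-1 bits and
    resample this one from its conditional probability."""
    for i in rng.permutation(len(s)):
        z = W[i] @ s + b[i]                      # local field at bit i
        p = 1.0 / (1.0 + np.exp(-z / T))         # P(s_i = 1 | rest)
        s[i] = 1.0 if rng.random() < p else 0.0
    return s

N = 10
W = rng.standard_normal((N, N)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
b = np.zeros(N)
s = rng.integers(0, 2, size=N).astype(float)
for _ in range(1000):                            # wander over states at equilibrium
    s = gibbs_sweep(W, b, s)
```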