Neural Networks
Hopfield Nets and Boltzmann Machines Fall 2017
Recap: Hopfield network

A symmetric, loopy network of threshold units with $\pm 1$ outputs:

$y_i = \Theta\left(\sum_{j\ne i} w_{ji} y_j + b_i\right), \qquad \Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

The energy of the network:

$E = -\sum_{i,\, j<i} w_{ij} y_i y_j - \sum_i b_i y_i$

Not assuming node bias:

$E = -\sum_{i,\, j<i} w_{ij} y_i y_j, \qquad y_i = \Theta\left(\sum_{j\ne i} w_{ji} y_j\right)$
[Figure: potential energy (PE) of the network vs. state]

- If a neuron's sign opposes the sign of its local field, it flips. In doing so it may flip other neurons, which may flip yet others.
The field at a dipole and its response:

$f(p_i) = \sum_{j\ne i} J_{ij} x_j + b_i, \qquad x_i = \begin{cases} x_i & \text{if } \mathrm{sign}\left(x_i f(p_i)\right) = 1 \\ -x_i & \text{otherwise} \end{cases}$

The total potential energy (PE) of the system:

$E(t) = D - \frac{1}{2}\sum_i x_i f(p_i, t) = D - \sum_i \sum_{j>i} J_{ij} x_i x_j - \frac{1}{2}\sum_i b_i x_i$

- Dipoles stop flipping if any flip would increase the PE
- The system settles at a configuration where the PE is a local minimum
- I.e. the system remembers its stable state and returns to it
Hopfield network computation: initialize the network with the input pattern and iterate until the energy $E = -\sum_i \sum_{j>i} w_{ij} y_i y_j$ does not change significantly any more.

$y_i(0) = x_i, \qquad 0 \le i \le N-1$

$y_i(t+1) = \Theta\left(\sum_{j\ne i} w_{ji} y_j(t)\right), \qquad 0 \le i \le N-1$
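A minimal NumPy sketch of this update loop; the helper names (hopfield_evolve, energy) are illustrative rather than from the slides, and it assumes a symmetric weight matrix with zero diagonal and $\pm 1$ states:

```python
import numpy as np

def hopfield_evolve(W, x, max_sweeps=100, rng=None):
    """Iterate y_i <- Theta(sum_{j!=i} w_ji y_j) asynchronously until no neuron flips.
    W: symmetric (N, N) with zero diagonal; x: (N,) vector of +/-1 entries."""
    rng = np.random.default_rng() if rng is None else rng
    y = x.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(y)):        # visit neurons in random order
            new_yi = 1 if W[i] @ y > 0 else -1   # threshold of the local field
            if new_yi != y[i]:
                y[i], changed = new_yi, True
        if not changed:                          # no flips: a local energy minimum
            break
    return y

def energy(W, y):
    # E = -sum_{i, j<i} w_ij y_i y_j = -0.5 * y^T W y for zero-diagonal symmetric W
    return -0.5 * y @ W @ y
```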
Hebbian learning to store a set of patterns $\{\mathbf{y}_p\}$:

$\mathbf{W} = \sum_{p\in\{P\}} \mathbf{y}_p \mathbf{y}_p^T$
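A small sketch of this construction, assuming the $\pm 1$ patterns are stacked as rows of an array; zeroing the diagonal keeps the no-self-loop convention used above (hebbian_store is an illustrative name):

```python
import numpy as np

def hebbian_store(patterns):
    """W = sum_p y_p y_p^T with the diagonal zeroed (no self-connections).
    patterns: (P, N) array with entries in {-1, +1}."""
    W = patterns.T @ patterns      # sum of outer products over all stored patterns
    np.fill_diagonal(W, 0)
    return W
```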
Consider the field at bit $i$ when the network is set to a stored pattern $\mathbf{y}_{p'}$:

$f_i^{p'} = \sum_p \sum_j y_j^p y_i^p y_j^{p'} = \sum_j y_j^{p'} y_i^{p'} y_j^{p'} + \sum_{p\ne p'} \sum_j y_j^p y_i^p y_j^{p'} = (N-1)\, y_i^{p'} + \sum_{p\ne p'} \sum_j y_j^p y_i^p y_j^{p'}$

The bit will flip only if the crosstalk term (the second sum) overwhelms $(N-1)\, y_i^{p'}$ and reverses the sign of the field.
With the Hebbian weights $\mathbf{W} = \sum_{p\in\{P\}} \mathbf{y}_p \mathbf{y}_p^T$:

- If $y_i^{p'} \sum_{p\ne p'} \sum_j y_j^p y_i^p y_j^{p'}$ is positive, then $f_i^{p'}$ has the same sign as $y_i^{p'}$, and the bit will not flip
- Will $y_i^{p'} \sum_{p\ne p'} \sum_j y_j^p y_i^p y_j^{p'}$ be positive for every bit, for all $P$ stored patterns?
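A quick numerical way to probe this question, under the same assumptions as the sketches above (the function name unstable_bits is illustrative): a bit is stable exactly when the sign of its field matches the stored bit.

```python
import numpy as np

def unstable_bits(W, y):
    """Boolean mask of bits whose field sign disagrees with the pattern,
    i.e. bits that would flip if the network were set to y."""
    field = W @ y
    return np.where(field > 0, 1, -1) != y

# Example: store K random +/-1 patterns in an N-bit net and count unstable bits.
# N, K = 100, 10
# patterns = np.where(np.random.rand(K, N) > 0.5, 1, -1)
# W = patterns.T @ patterns
# np.fill_diagonal(W, 0)
# print([int(unstable_bits(W, p).sum()) for p in patterns])
```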
Experiments with Hebbian storage, $\mathbf{W} = \sum_{p\in\{P\}} \mathbf{y}_p \mathbf{y}_p^T$:

- Where does the network settle?
- Also note the "shadow" pattern (the negation of a stored pattern)
- Because any pattern $\mathbf{y}$ and its negation $-\mathbf{y}$ are equivalent for our purposes (they have the same energy)
- Because $\mathbf{y}_1^T \mathbf{y}_2 = 0$ (the two stored patterns are orthogonal)
- Others may be almost orthogonal
- No other local minima exist; actual wells form for the stored patterns
- Note K > 0.14 N (more patterns than the nominal capacity)
- Note that some "ghosts" ended up in the "well" of other patterns
- But every stored pattern has a "bowl"; fewer spurious minima than in the orthogonal two-pattern case
- Most fake-looking memories are in fact ghosts (negations of stored patterns)
- I.e. obtain a weight matrix W such that K > 0.14N patterns are stationary
- It is possible to make more than 0.14N patterns at least 1-bit stable
- I.e. patterns that are closer are easier to remember than patterns that are farther apart!
Energy landscape: it differs only by an additive constant; the gradients and the locations of the minima remain the same.

- Both have the same eigenvectors
- NOTE: this is a positive semidefinite matrix
[Figure: energy landscape with the stored patterns and their ghosts (negations) at the minima]

- $\mathrm{sign}\left(\mathbf{W}\mathbf{y}_p\right) = \mathbf{y}_p$ for all target patterns
- Projects $\mathbf{y}$ onto the nearest corner of the hypercube; it "quantizes" the space into orthants
- Let $\mathbf{U} = \left[\mathbf{y}_1\ \mathbf{y}_2\ \cdots\ \mathbf{y}_K\ \mathbf{r}_{K+1}\ \mathbf{r}_{K+2}\ \cdots\ \mathbf{r}_N\right]$ and $\mathbf{W} = \mathbf{U}\Lambda\mathbf{U}^T$
- $\mathbf{r}_{K+1}, \mathbf{r}_{K+2}, \ldots, \mathbf{r}_N$ are orthogonal to $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_K$
- $\lambda_1 = \lambda_2 = \cdots = \lambda_K = 1$, and $\lambda_{K+1}, \ldots, \lambda_N = 0$
- The target patterns are stable (same logic as earlier); other patterns are unstable
- They get projected onto the subspace spanned by $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_K$
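A sketch of this construction, assuming linearly independent $\pm 1$ target patterns stacked as rows; the resulting $\mathbf{W}$ is the orthogonal projector onto their span, i.e. $\mathbf{U}\Lambda\mathbf{U}^T$ with eigenvalue 1 on the span and 0 on its complement (design_projector_weights is an illustrative name):

```python
import numpy as np

def design_projector_weights(patterns):
    """W with eigenvalue 1 on span{y_1..y_K} and 0 elsewhere.
    patterns: (K, N) array of linearly independent target patterns (rows)."""
    Q, _ = np.linalg.qr(patterns.T)   # orthonormal basis for the span (N x K)
    return Q @ Q.T                    # orthogonal projector onto the span

# With this W, W @ y_p == y_p for every target pattern, so sign(W @ y_p) == y_p,
# while any component orthogonal to the stored patterns is projected away.
```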
With Hebbian learning, $\mathbf{W} = \sum_{p\in\{P\}} \mathbf{y}_p \mathbf{y}_p^T$:

- Different patterns presented different numbers of times
- Equivalent to having unequal eigenvalues
- Hint: Lanczos iterations
- McEliece and Posner, 1984; Abu-Mostafa and St. Jacques, 1985; McEliece et al., 1987
- E.g. when we had the Hebbian net with N orthogonal base patterns, all patterns are stable
- But this may come with many "parasitic" memories

How do we find this network? Can we do something about this?
Optimization approach: find $\mathbf{W}$ that minimizes the energy of the target patterns $\mathbf{y}\in\mathbf{Y}_P$ while raising the energy of non-target patterns:

$\hat{\mathbf{W}} = \underset{\mathbf{W}}{\arg\min}\ \sum_{\mathbf{y}\in\mathbf{Y}_P} E(\mathbf{y}) - \sum_{\mathbf{y}\notin\mathbf{Y}_P} E(\mathbf{y})$

- The bias can be captured by another fixed-value component
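Since $E(\mathbf{y}) = -\frac{1}{2}\mathbf{y}^T\mathbf{W}\mathbf{y}$ is linear in the weights, the gradient of each term of this objective is an outer product; a short sanity check of the resulting per-pattern gradient-descent step (the factor $\frac{1}{2}$ is absorbed into the step size $\eta$, and $\mathbf{y}_n$ denotes a non-target pattern):

$\frac{\partial E(\mathbf{y})}{\partial \mathbf{W}} = -\frac{1}{2}\,\mathbf{y}\mathbf{y}^T \quad\Longrightarrow\quad \mathbf{W} \leftarrow \mathbf{W} + \eta\left(\mathbf{y}_p\mathbf{y}_p^T - \mathbf{y}_n\mathbf{y}_n^T\right)$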
- The energy "bowls" will all actually be quadratic
- Make the target patterns local minima
- Emphasize more "important" memories by repeating them more frequently
[Figure: energy vs. state, with the target patterns marked]

- If you raise every valley, eventually they'll all move up above the target patterns, and many will even vanish
Rather than raising every non-target pattern, raise only the valleys the network actually settles into:

$\hat{\mathbf{W}} = \underset{\mathbf{W}}{\arg\min}\ \sum_{\mathbf{y}\in\mathbf{Y}_P} E(\mathbf{y}) - \sum_{\mathbf{y}\notin\mathbf{Y}_P\ \&\ \mathbf{y}=\mathrm{valley}} E(\mathbf{y})$
The corresponding update, for a target pattern $\mathbf{y}_p$ and a valley pattern $\mathbf{y}_v$:

$\mathbf{W} \leftarrow \mathbf{W} + \eta\left(\mathbf{y}_p\mathbf{y}_p^T - \mathbf{y}_v\mathbf{y}_v^T\right)$
To find a valley: initialize the network and let it evolve. It will settle in a valley. If this is not the target pattern, raise it.
A better choice of valley: initialize the network at the target pattern itself and let it evolve for a few steps; call the pattern it arrives at $\mathbf{y}_d$ and raise that:

$\mathbf{W} \leftarrow \mathbf{W} + \eta\left(\mathbf{y}_p\mathbf{y}_p^T - \mathbf{y}_d\mathbf{y}_d^T\right)$
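A sketch of this training loop under the same assumptions as the earlier sketches ($\pm 1$ patterns as rows; train_hopfield, eta, relax_steps are illustrative names): each step lowers the energy of a target pattern and raises the energy of the state the network drifts to when started from that pattern.

```python
import numpy as np

def train_hopfield(patterns, eta=0.01, epochs=200, relax_steps=4, rng=None):
    """For each target y_p: relax for a few asynchronous steps starting from y_p
    to find a nearby valley y_d, then apply W += eta * (y_p y_p^T - y_d y_d^T)."""
    rng = np.random.default_rng() if rng is None else rng
    P, N = patterns.shape
    W = np.zeros((N, N))
    for _ in range(epochs):
        for y_p in patterns:
            y_d = y_p.copy()
            for _ in range(relax_steps):            # brief evolution from the target
                i = rng.integers(N)
                y_d[i] = 1 if W[i] @ y_d > 0 else -1
            W += eta * (np.outer(y_p, y_p) - np.outer(y_d, y_d))
            np.fill_diagonal(W, 0)                  # keep the no-self-loop convention
    return W
```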
- Minimizing energy maximizes log likelihood
- The derivation of this probability is in fact quite trivial
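Presumably the probability referred to here is the Boltzmann (Gibbs) distribution over network states, which is where the Boltzmann machine of this lecture's title comes in; a one-line version of the statement, with $Z$ the normalizer over all states:

$P(\mathbf{y}) = \frac{\exp\left(-E(\mathbf{y})\right)}{Z}, \qquad Z = \sum_{\mathbf{y}'} \exp\left(-E(\mathbf{y}')\right) \quad\Longrightarrow\quad \log P(\mathbf{y}) = -E(\mathbf{y}) - \log Z$

so lowering the energy of a pattern raises its log likelihood.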