Simulated Annealing (PowerPoint PPT Presentation)

SLIDE 1

Simulated Annealing

input: (x1, t1), . . . , (xN, tN) ∈ Rd × {−1, +1}; Tstart, Tstop ∈ R
output: w

begin
  Randomly initialize w
  T ← Tstart
  repeat
    w′ ← N(w)   // neighbor of w, e.g. by adding Gaussian noise (N(0, σ))
    if E(w′) < E(w) then
      w ← w′
    else if exp(−(E(w′) − E(w)) / T) > rand[0, 1) then
      w ← w′
    decrease(T)
  until T < Tstop
  return w
end

– p. 156
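The pseudocode above can be sketched in Python. The energy function, the geometric cooling schedule (factor 0.95), and the noise scale σ below are illustrative choices, not prescribed by the slide:

```python
import numpy as np

def simulated_annealing(E, d, T_start=10.0, T_stop=0.01,
                        cooling=0.95, sigma=0.1, rng=None):
    """Minimize an energy function E over w in R^d by simulated annealing."""
    rng = np.random.default_rng(rng)
    w = rng.standard_normal(d)                   # randomly initialize w
    T = T_start
    while T >= T_stop:
        w_new = w + rng.normal(0.0, sigma, d)    # neighbor: add Gaussian noise N(0, sigma)
        dE = E(w_new) - E(w)
        # accept downhill moves always, uphill moves with probability exp(-dE/T)
        if dE < 0 or np.exp(-dE / T) > rng.uniform(0.0, 1.0):
            w = w_new
        T *= cooling                             # decrease(T): geometric schedule
    return w

# usage: minimize E(w) = ||w||^2, whose global minimum is at w = 0
w_opt = simulated_annealing(lambda w: np.dot(w, w), d=3, rng=0)
```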

SLIDE 2

Continuous Hopfield Network

Let us consider our previously defined Hopfield network (identical architecture and learning rule), but with the following activity rule:

Si = tanh( (1/T) Σj wij Sj )

Start with a large (temperature) value of T and decrease it by some amount whenever a unit is updated (deterministic simulated annealing). This type of Hopfield network can approximate the probability distribution

P(x|W) = (1/Z(W)) exp[−E(x)] = (1/Z(W)) exp[(1/2) xᵀWx]

– p. 157
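A minimal sketch of this annealed activity rule, assuming asynchronous single-unit updates, a geometric cooling schedule, and Hebbian outer-product weights for the demo pattern (all of these are illustrative assumptions):

```python
import numpy as np

def anneal_hopfield(W, s0, T_start=5.0, T_stop=0.05, cooling=0.9, rng=None):
    """Continuous Hopfield network with activity rule
    S_i = tanh((1/T) * sum_j w_ij S_j), with T lowered after every unit update."""
    rng = np.random.default_rng(rng)
    s = s0.astype(float).copy()
    T = T_start
    n = len(s)
    while T >= T_stop:
        i = rng.integers(n)              # pick one unit to update
        s[i] = np.tanh(W[i] @ s / T)     # activity rule at temperature T
        T *= cooling                     # decrease T whenever a unit is updated
    return s

# usage: weights storing the pattern (+1, -1) via the outer-product rule
x = np.array([1.0, -1.0])
W = np.outer(x, x) - np.eye(2)           # Hebbian weights, zero diagonal
s = anneal_hopfield(W, s0=np.array([0.2, 0.1]), rng=0)
```

At low final T the state settles near one of the stored attractors (±x), with the two units taking opposite signs.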
SLIDE 3

Continuous Hopfield Network

Z(W) = Σx′ exp(−E(x′))

(sum over all possible states) is the partition function; it ensures that P(x|W) is a probability distribution. Idea: construct a stochastic Hopfield network that implements the probability distribution P(x|W).

  • Learn a model that is capable of generating patterns from that unknown distribution.
  • Quantify (classify) seen and unseen patterns by means of probabilities.
  • If needed, we can generate more patterns (generative model).

– p. 158
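For small networks the partition function can be computed by brute-force enumeration of all 2^n states. This sketch assumes E(x) = −(1/2) xᵀWx as on the slides; the example weight matrix is arbitrary:

```python
import numpy as np
from itertools import product

def boltzmann_distribution(W):
    """Enumerate all 2^n states x in {-1,+1}^n and return (states, P(x|W)),
    with E(x) = -1/2 x^T W x and Z(W) = sum_x exp(-E(x))."""
    n = W.shape[0]
    states = np.array(list(product([-1, 1], repeat=n)), dtype=float)
    energies = -0.5 * np.einsum('si,ij,sj->s', states, W, states)
    weights = np.exp(-energies)
    Z = weights.sum()                    # partition function
    return states, weights / Z           # probabilities sum to 1 by construction

# usage: two units with a positive coupling favour aligned states
W = np.array([[0.0, 1.0], [1.0, 0.0]])
states, probs = boltzmann_distribution(W)
```

Enumeration is exponential in n, which is exactly why larger Boltzmann machines resort to sampling.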

SLIDE 4

Boltzmann Machines

Given patterns {x(n)}, n = 1, . . . , N, we want to learn the weights such that the generative model

P(x|W) = (1/Z(W)) exp[(1/2) xᵀWx]

is well matched to those patterns. The states are updated according to the stochastic rule:

  • set xi = +1 with probability 1 / (1 + exp(−2 Σj wij xj)),
  • else set xi = −1.

Posterior probability of the weights given the data (Bayes’ theorem):

P(W|{x(n)}) = [ ∏n P(x(n)|W) ] P(W) / P({x(n)})

– p. 159
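The stochastic update rule can be sketched as follows; the sweep order and the example weight matrix are illustrative:

```python
import numpy as np

def gibbs_update(x, W, i, rng):
    """Stochastic update of unit i: set x_i = +1 with probability
    1 / (1 + exp(-2 * sum_j w_ij x_j)), else set x_i = -1."""
    a = W[i] @ x                              # activation of unit i
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * a))
    x[i] = 1.0 if rng.uniform() < p_plus else -1.0
    return x

# usage: one sweep over all units of a 3-unit machine
rng = np.random.default_rng(0)
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
x = np.array([1.0, -1.0, 1.0])
for i in range(3):
    x = gibbs_update(x, W, i, rng)
```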

SLIDE 5

Boltzmann Machines

Apply the maximum likelihood method to the first term in the numerator:

ln ∏n P(x(n)|W) = Σn [ (1/2) x(n)ᵀWx(n) − ln Z(W) ]

Take the derivative of the log likelihood; note that W is symmetric (wij = wji), so that

∂/∂wij [ (1/2) x(n)ᵀWx(n) ] = xi(n) xj(n)

and

∂/∂wij ln Z(W) = (1/Z(W)) Σx ∂/∂wij exp[(1/2) xᵀWx]
               = (1/Z(W)) Σx exp[(1/2) xᵀWx] xi xj
               = Σx xi xj P(x|W) ≡ ⟨xi xj⟩P(x|W)

– p. 160
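The identity ∂ ln Z/∂wij = ⟨xi xj⟩P(x|W) can be checked numerically on a two-unit machine by enumeration (the coupling value 0.3 is an arbitrary example). Because W is symmetric, the finite difference perturbs wij and wji together:

```python
import numpy as np
from itertools import product

def log_Z(W):
    """ln Z(W) with exponent (1/2) x^T W x, by enumerating all states."""
    states = np.array(list(product([-1, 1], repeat=W.shape[0])), dtype=float)
    logits = 0.5 * np.einsum('si,ij,sj->s', states, W, states)
    return np.log(np.exp(logits).sum()), states, logits

W = np.array([[0.0, 0.3], [0.3, 0.0]])
lZ, states, logits = log_Z(W)
p = np.exp(logits - lZ)                          # P(x|W)
corr = np.sum(states[:, 0] * states[:, 1] * p)   # <x0 x1> under the model

# finite-difference check on w01 (= w10, since W is symmetric)
eps = 1e-6
dW = np.zeros_like(W); dW[0, 1] = dW[1, 0] = eps
num = (log_Z(W + dW)[0] - log_Z(W - dW)[0]) / (2 * eps)
```

For this two-unit machine the model correlation works out to tanh(w01), and the numerical derivative agrees with it.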

SLIDE 6

Boltzmann Machines (cont.)

∂/∂wij ln P({x(n)}|W) = Σn [ xi(n) xj(n) − ⟨xi xj⟩P(x|W) ]
                      = N [ ⟨xi xj⟩Data − ⟨xi xj⟩P(x|W) ]

Empirical correlation between xi and xj:

⟨xi xj⟩Data ≡ (1/N) Σn xi(n) xj(n)

Correlation between xi and xj under the current model:

⟨xi xj⟩P(x|W) ≡ Σx xi xj P(x|W)

– p. 161
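Putting the two correlation terms together gives the likelihood gradient. This sketch evaluates the model correlation exactly by enumeration (feasible only for a handful of units) rather than by sampling; the patterns, step size, and network size are illustrative:

```python
import numpy as np
from itertools import product

def boltzmann_gradient(X, W):
    """Gradient of the log likelihood, N * (<xi xj>_Data - <xi xj>_P(x|W)),
    with the model correlation computed by exact enumeration."""
    N, n = X.shape
    corr_data = (X.T @ X) / N                      # <xi xj>_Data
    states = np.array(list(product([-1, 1], repeat=n)), dtype=float)
    logits = 0.5 * np.einsum('si,ij,sj->s', states, W, states)
    p = np.exp(logits - logits.max())
    p /= p.sum()                                   # P(x|W)
    corr_model = np.einsum('s,si,sj->ij', p, states, states)
    return N * (corr_data - corr_model)

# one gradient-ascent step on two anti-correlated patterns
X = np.array([[1.0, -1.0], [-1.0, 1.0]])
W = np.zeros((2, 2))
grad = boltzmann_gradient(X, W)
W = W + 0.1 * grad
np.fill_diagonal(W, 0.0)                           # keep self-connections at zero
```

For anti-correlated patterns the gradient drives the coupling w01 negative, as expected.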

SLIDE 7

Interpretation of Boltzmann Machines Learning

Illustrative description (MacKay’s book, p. 523):

  • Awake state: measure the correlation between xi and xj in the real world, and increase the weights in proportion to the measured correlations.
  • Sleep state: dream about the world using the generative model P(x|W), measure the correlation between xi and xj in the model world, and use these correlations to determine a proportional decrease in the weights.

If the correlations in the dream world and the real world match, the two terms balance and the weights do not change.

– p. 162

SLIDE 8

Boltzmann Machines with Hidden Units

To model higher-order correlations, hidden units are required.

  • x: states of visible units,
  • h: states of hidden units,
  • generic state of a unit (either visible or hidden) is denoted yi, with y ≡ (x, h),
  • state of the network when the visible units are clamped in state x(n) is y(n) ≡ (x(n), h).

Probability of a single pattern x(n) given W is

P(x(n)|W) = Σh P(x(n), h|W) = Σh (1/Z(W)) exp[(1/2) y(n)ᵀWy(n)]

where

Z(W) = Σx,h exp[(1/2) yᵀWy]

– p. 163
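This marginalization over hidden states can be sketched by enumeration; the network size and the zero weight matrix in the usage example are just for illustration:

```python
import numpy as np
from itertools import product

def marginal_likelihood(x, W, n_hidden):
    """P(x|W) = sum_h P(x, h|W) for a Boltzmann machine with hidden units,
    where y = (x, h), by exact enumeration of hidden (and all) states."""
    n_visible = len(x)
    hiddens = np.array(list(product([-1, 1], repeat=n_hidden)), dtype=float)
    # numerator: sum over h of exp(1/2 y^T W y), visible units clamped to x
    ys = np.hstack([np.tile(x, (len(hiddens), 1)), hiddens])
    num = np.sum(np.exp(0.5 * np.einsum('si,ij,sj->s', ys, W, ys)))
    # Z(W): sum over all visible and hidden configurations
    alls = np.array(list(product([-1, 1], repeat=n_visible + n_hidden)),
                    dtype=float)
    Z = np.sum(np.exp(0.5 * np.einsum('si,ij,sj->s', alls, W, alls)))
    return num / Z

# 2 visible + 1 hidden unit; with W = 0 all visible patterns are equally likely
W = np.zeros((3, 3))
p = marginal_likelihood(np.array([1.0, -1.0]), W, n_hidden=1)
```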
SLIDE 9

Boltzmann Machines with Hidden Units (cont.)

Applying the maximum likelihood method as before, one obtains

∂/∂wij ln P({x(n)}|W) = Σn [ ⟨yi yj⟩P(h|x(n),W) − ⟨yi yj⟩P(x,h|W) ]
                             (clamped to x(n))     (free)

The term ⟨yi yj⟩P(h|x(n),W) is the correlation between yi and yj when the Boltzmann machine is simulated with the visible variables clamped to x(n) and the hidden variables sampling freely from their conditional distribution. The term ⟨yi yj⟩P(x,h|W) is the correlation between yi and yj when the Boltzmann machine generates samples from its model distribution.

– p. 164
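The clamped and free correlation terms can be illustrated by exact enumeration on a tiny machine. The weight values below are arbitrary; real Boltzmann machines estimate these correlations by sampling:

```python
import numpy as np
from itertools import product

def correlations(W, n_visible, x_clamped=None):
    """<y_i y_j> under the Boltzmann distribution over y = (x, h).
    If x_clamped is given, visible units are fixed and only h varies
    (clamped phase); otherwise all units are free (free phase)."""
    n = W.shape[0]
    if x_clamped is not None:
        hs = np.array(list(product([-1, 1], repeat=n - n_visible)), dtype=float)
        ys = np.hstack([np.tile(x_clamped, (len(hs), 1)), hs])
    else:
        ys = np.array(list(product([-1, 1], repeat=n)), dtype=float)
    logits = 0.5 * np.einsum('si,ij,sj->s', ys, W, ys)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return np.einsum('s,si,sj->ij', p, ys, ys)

# 2 visible units, 1 hidden unit, positive visible-hidden couplings
W = np.array([[0.0, 0.0, 0.4],
              [0.0, 0.0, 0.4],
              [0.4, 0.4, 0.0]])
clamped = correlations(W, 2, x_clamped=np.array([1.0, 1.0]))
free = correlations(W, 2)
```

With both visible units clamped to +1, their mutual correlation is exactly 1, while the hidden unit correlates positively with them through the couplings.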

SLIDE 10

Boltzmann Machines with Input-Hidden-Output

The Boltzmann machine considered so far is a powerful stochastic Hopfield network, but it has no ability to perform classification. Let us introduce visible input and output units:

x ≡ (xi, xo)

Note that pattern x(n) consists of an input and an output part, that is, x(n) ≡ (xi(n), xo(n)). The gradient then becomes

Σn [ ⟨yi yj⟩clamped to (xi(n), xo(n)) − ⟨yi yj⟩clamped to xi(n) ]

– p. 165

SLIDE 11

Boltzmann Machines Updates Weights

Combine gradient descent and simulated annealing to update the weights:

∆wij = (η/T) [ ⟨yi yj⟩clamped to (xi(n), xo(n)) − ⟨yi yj⟩clamped to xi(n) ]

High computational complexity:

  • present each pattern several times
  • anneal several times

Mean-field version of Boltzmann learning:

  • calculate approximations of the correlations ([yi yj]) entering the gradient

– p. 166

SLIDE 12

Deterministic Boltzmann Learning

input: {x(n)}, n = 1, . . . , N; η, Tstart, Tstop ∈ R
output: W

begin
  T ← Tstart
  repeat
    randomly select a pattern from the sample {x(n)}
    randomize states
    anneal network with input and output clamped
    at final, low T, calculate [yi yj]xi,xo clamped
    randomize states
    anneal network with input clamped but output free
    at final, low T, calculate [yi yj]xi clamped
    wij ← wij + (η/T) ([yi yj]xi,xo clamped − [yi yj]xi clamped)
  until T < Tstop
  return W
end

– p. 167
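The algorithm above can be sketched end to end in Python. The annealing schedule, learning rate, epoch count, and the tiny identity-mapping task are all illustrative assumptions, and the correlations [yi yj] are approximated by the mean-field products mi mj:

```python
import numpy as np

def mf_states(W, clamp, rng, T_start=5.0, T_stop=0.5, cooling=0.9, sweeps=3):
    """Anneal mean-field states m_i = tanh((1/T) sum_j w_ij m_j);
    units listed in clamp (a dict {index: value}) are held fixed."""
    n = W.shape[0]
    m = rng.uniform(-0.1, 0.1, size=n)            # randomize states
    for i, v in clamp.items():
        m[i] = v
    T = T_start
    while T >= T_stop:
        for _ in range(sweeps):
            for i in range(n):
                if i not in clamp:
                    m[i] = np.tanh(W[i] @ m / T)
        T *= cooling
    return m                                       # states at final, low T

def deterministic_boltzmann_learning(X_in, X_out, n_hidden,
                                     eta=0.1, T=0.5, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    n_in, n_out = X_in.shape[1], X_out.shape[1]
    n = n_in + n_out + n_hidden
    W = 0.01 * rng.standard_normal((n, n))
    W = (W + W.T) / 2.0                            # symmetric weights
    np.fill_diagonal(W, 0.0)
    for _ in range(epochs):
        k = rng.integers(len(X_in))                # randomly select a pattern
        # phase 1: anneal with input and output clamped
        clamp = {i: X_in[k, i] for i in range(n_in)}
        clamp.update({n_in + j: X_out[k, j] for j in range(n_out)})
        m = mf_states(W, clamp, rng, T_stop=T)
        corr_clamped = np.outer(m, m)              # [yi yj], xi and xo clamped
        # phase 2: anneal with input clamped but output free
        clamp = {i: X_in[k, i] for i in range(n_in)}
        m = mf_states(W, clamp, rng, T_stop=T)
        corr_free = np.outer(m, m)                 # [yi yj], xi clamped
        W += (eta / T) * (corr_clamped - corr_free)
        np.fill_diagonal(W, 0.0)
    return W

# learn the identity mapping with 1 input, 1 output, and 1 hidden unit
X_in = np.array([[1.0], [-1.0]])
X_out = np.array([[1.0], [-1.0]])
W = deterministic_boltzmann_learning(X_in, X_out, n_hidden=1)
```

After training, the input-output coupling has grown positive, so the free-running output tracks the clamped input.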