Stochastic Simulation: Simulated annealing
Bo Friis Nielsen
Institute of Mathematical Modelling Technical University of Denmark 2800 Kgs. Lyngby – Denmark Email: bfni@dtu.dk
02443 – lecture 9, DTU

A general optimisation problem

We want to minimise f over a finite state space S, i.e. find f⋆ = min_{x∈S} f(x). Let M denote the set of minimising points, M = {x ∈ S : f(x) = f⋆}. The cardinality of M (the number of elements in M) is finite, as is that of S: |S| < ∞.
We introduce a probability distribution over S:

P_T(x) = e^(−f(x)/T) / Σ_{y∈S} e^(−f(y)/T)
       = e^(−f(x)/T) / ( |M| e^(−f⋆/T) + Σ_{y∈S\M} e^(−f(y)/T) )
       = e^((f⋆−f(x))/T) / ( |M| + Σ_{y∈S\M} e^((f⋆−f(y))/T) )

To simulate from P_T we only need the expression e^(−f(x)/T), i.e. the target multiplied with a difficult-to-calculate (normalising) constant. As T decreases, states with low values of f(x) will be more frequent/likely. In the limit T → 0, every term e^((f⋆−f(y))/T) with y ∉ M vanishes (since f⋆ − f(y) < 0), so P_T(x) → 1/|M| for x ∈ M and P_T(x) → 0 otherwise: the distribution concentrates on the minimum-energy states.
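As a quick numerical illustration (not from the slides), the sketch below evaluates P_T on a small hypothetical state space; the objective values in `f` are made up, with minimisers 1 and 3, so M = {1, 3} and f⋆ = 0.5.

```python
import math

# Hypothetical finite state space with objective values f(x);
# the minimisers are states 1 and 3, so M = {1, 3} and f* = 0.5.
f = {0: 2.0, 1: 0.5, 2: 1.0, 3: 0.5, 4: 3.0}

def p_T(T):
    """The distribution P_T(x) proportional to exp(-f(x)/T)."""
    w = {x: math.exp(-fx / T) for x, fx in f.items()}
    Z = sum(w.values())
    return {x: wx / Z for x, wx in w.items()}

for T in (10.0, 1.0, 0.1, 0.01):
    print(T, {x: round(p, 3) for x, p in p_T(T).items()})
# As T -> 0 the mass concentrates on M, approaching 1/|M| = 0.5 per minimiser.
```

At T = 10 the distribution is nearly uniform; by T = 0.01 essentially all mass sits on the two minimising states.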
The problem min_x f(x) may have many local optima, in which simple descent methods can get stuck. Instead we sample from P_T using the Metropolis-Hastings algorithm, following the Kirkpatrick paper (Science, 1983) that introduced simulated annealing.
Steel and other materials can exist in several crystalline structures. One, the ground state, has the lowest energy. The material may be "caught" in other states which are only locally stable; this is likely to happen when welding, machining, etc. By heating the material and then slowly cooling it, we ensure that the material ends in the ground state. This process is called annealing.
Use X ∈ S to denote the state of the system (e.g., the positions of the atoms), and let U(x) denote the energy of state x ∈ S. According to statistical physics, if the temperature is T, the p.d.f. of X is

f(x, T) = c_T · exp(−U(x)/T)

Note the normalization constant c_T is unknown; it can be found by integration, but our algorithms will not require it.
[Figure: the potential U(x) plotted against the state x]

[Figure: the p.d.f. f(x, T) plotted against the state x]
Let the temperature be a decreasing function of time, or of the iteration number i. At each time step, update the state according to the random walk Metropolis-Hastings algorithm for MCMC, where the target p.d.f. is f(x, T_i). That is, perturb the state X_i randomly to generate a candidate Y_i. If the candidate has lower energy than the old state, accept it; otherwise, accept it only with probability

exp(−(U(Y_i) − U(X_i))/T_i)

for a symmetric proposal distribution (to keep the probabilistic interpretation).
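The update rule above can be sketched as follows (a minimal illustration: the function names, the 1/√(1+i) cooling schedule, and the toy potential are my choices, not prescribed by the slides):

```python
import math, random

def simulated_annealing(U, x0, propose, n_iter=10_000):
    """Minimise U by random-walk Metropolis with decreasing temperature T_i."""
    x = best = x0
    for i in range(n_iter):
        T = 1.0 / math.sqrt(1 + i)             # illustrative cooling scheme
        y = propose(x)                         # symmetric random perturbation
        dU = U(y) - U(x)
        # Accept downhill moves always, uphill moves with prob exp(-dU/T).
        if dU <= 0 or random.random() < math.exp(-dU / T):
            x = y
        if U(x) < U(best):
            best = x
    return best

# Toy example: a tilted double well with its global minimum near x = -1.
U = lambda x: (x**2 - 1.0)**2 + 0.2 * x
propose = lambda x: x + random.gauss(0.0, 0.1)
x_min = simulated_annealing(U, x0=2.0, propose=propose)
```

At high temperature the chain accepts most uphill moves and explores freely; as T_i shrinks it behaves more and more like a pure descent method.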
[Figure: traces of the state x and of the energy U(x) over 10000 iterations]
A basic problem in combinatorial optimisation: given n stations, and an n-by-n matrix A giving the cost of going from station i to station j, find a route S (a permutation of 1, . . . , n) which minimises

Σ_{i=1}^{n−1} A(S_i, S_{i+1})

(for a closed tour, add the cost A(S_n, S_1) of returning to the starting station).
Cost of going from town i (rows) to town j (columns):

from\to    1    2    3    4    5    6
   1       –    3    1    4   12    2
   2       2    –   11   13   30    3
   3       6    8    –   12    5    4
   4      33    9    5    –   17    5
   5       1   15    6   10    –    6
   6      24    6    8    9   40    –
An example route has total cost 5 + 22 + 13 + 60 + 14 + 24 = 138.
[Figure: an example route drawn through the towns with its leg costs; COST = 138]
Exercise: solve the travelling salesman problem with simulated annealing. Start from a random permutation of the stations. As proposal, permute two random stations on the route. As cooling scheme you can use e.g. T_k = 1/√(1 + k) or T_k = 1/log(k + 1); feel free to experiment with different schemes.

(a) Have the input be positions in the plane of the n stations. Let the cost of going i → j be the Euclidean distance between stations i and j. Plot the resulting route in the plane. Debug with stations on a circle.

(b) Then modify your programme to work with costs directly, and apply it to the cost matrix from the course homepage.
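A possible sketch of part (a), using the swap proposal and the circle-debug instance described above (the function names, the iteration count, and the cooling schedule details are illustrative choices):

```python
import math, random

def tour_cost(route, A):
    """Total cost of a closed route under cost matrix A."""
    n = len(route)
    return sum(A[route[i]][route[(i + 1) % n]] for i in range(n))

def anneal_tsp(A, n_iter=50_000):
    """Simulated annealing for the TSP: swap two random stations as proposal."""
    n = len(A)
    route = list(range(n))
    random.shuffle(route)                 # start from a random permutation
    cost = tour_cost(route, A)
    for k in range(1, n_iter + 1):
        T = 1.0 / math.sqrt(1 + k)        # one of the suggested schemes
        i, j = random.sample(range(n), 2)
        route[i], route[j] = route[j], route[i]
        new_cost = tour_cost(route, A)
        if new_cost <= cost or random.random() < math.exp(-(new_cost - cost) / T):
            cost = new_cost
        else:                             # reject: undo the swap
            route[i], route[j] = route[j], route[i]
    return route, cost

# Debug instance: stations on a circle; the optimal tour visits them in order.
n = 10
pts = [(math.cos(2 * math.pi * t / n), math.sin(2 * math.pi * t / n))
       for t in range(n)]
A = [[math.dist(p, q) for q in pts] for p in pts]
route, cost = anneal_tsp(A)
print(route, cost)
```

For part (b), only the construction of A changes: read the cost matrix from the course homepage instead of computing Euclidean distances.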