Stochastic Simulation Simulated annealing

Bo Friis Nielsen

Institute of Mathematical Modelling, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark. Email: bfni@dtu.dk

02443 – lecture 9, DTU
A general optimisation problem

  • f⋆ = min_{x∈S} f(x)
  • The set S can be quite general
  • x⋆ = argmin_{x∈S} f(x)
  • Note x⋆ might not be unique, so we define the set M of minimising points
  • M = {x ∈ S | f(x) = f⋆}. Assume |M| < ∞, that is, the cardinality of M (the number of elements in M) is finite
  • This will typically be the case for discrete optimisation, where also |S| < ∞.
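For a finite S these quantities can be computed by brute force; a minimal Python sketch, with a hypothetical cost function f chosen so that the argmin is not unique:

```python
# Brute-force illustration of f* = min_{x in S} f(x) and the set
# M = {x in S | f(x) = f*} of minimising points, on a small finite S.
S = range(-3, 4)                      # S = {-3, ..., 3}
f = lambda x: (x * x - 1) ** 2        # hypothetical cost; minimised at x = -1 and x = 1

f_star = min(f(x) for x in S)         # f* = min_{x in S} f(x)
M = {x for x in S if f(x) == f_star}  # here |M| = 2, so the argmin is not unique
print(f_star, sorted(M))              # 0 [-1, 1]
```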

Optimisation problem - probability distribution

We introduce a probability distribution over S:

  PT(x) = e^(−f(x)/T) / Σ_{y∈S} e^(−f(y)/T)
        = e^(−f(x)/T) / ( |M| e^(−f⋆/T) + Σ_{y∈S\M} e^(−f(y)/T) )
        = e^((f⋆−f(x))/T) / ( |M| + Σ_{y∈S\M} e^((f⋆−f(y))/T) )

  • We have a probability function with an “easy”-to-calculate expression multiplied by a difficult-to-calculate constant
  • For fixed T we can sample; states x with low “energy” (low values of f(x)) will be more frequent/likely
  • As T → 0 the distribution degenerates to the states with minimum energy
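A small numerical illustration of this degeneracy; the finite S and the energies f(x) below are hypothetical:

```python
import math

# P_T(x) = exp(-f(x)/T) / sum_y exp(-f(y)/T) on a small finite S.
# As T -> 0 the mass concentrates on the minimising set M.
S = [0, 1, 2, 3]
f = {0: 2.0, 1: 0.5, 2: 0.5, 3: 3.0}    # hypothetical energies; M = {1, 2}

def P(T):
    w = {x: math.exp(-f[x] / T) for x in S}
    Z = sum(w.values())                  # the hard-to-compute normalising constant
    return {x: w[x] / Z for x in S}

for T in (5.0, 1.0, 0.1, 0.01):
    p = P(T)
    print(T, round(p[1] + p[2], 4))      # mass on M approaches 1 as T -> 0
```

In the T → 0 limit each minimiser gets probability 1/|M|, here 1/2 each.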

Simulated annealing

  • A stochastic algorithm for optimisation
  • For large-scale (typically discrete) problems
  • Attempts to find the global optimum min_x f(x) in the presence of multiple local optima
  • One among many stochastic optimisation methods - a metaheuristic
  • Simulated annealing was one of the first, inspired by Metropolis-Hastings - the Kirkpatrick paper in Science, 1983
  • Alternatives: stochastic gradient methods and several others
Physical inspiration (with apologies)

Steel and other materials can exist in several crystalline structures. One - the ground state - has the lowest energy. The material may be “caught” in other states which are only locally stable. This is likely to happen when welding, machining, etc. By heating the material and slowly cooling it, we ensure that the material ends in the ground state. This process is called annealing.

P.d.f. of the state at fixed temperature

Use X ∈ S to denote the state of the system (e.g., the positions of atoms). Let U(x) denote the energy of state x ∈ S. According to statistical physics, if the temperature is T, the p.d.f. of X is the canonical distribution

  f(x, T) = cT · exp(−U(x)/T)

  • So states with low U are more probable, in particular at low T. Note the normalisation constant cT is unknown; it can be found by integration, but our algorithms will not require it.

Example energy potential

[Figure: the potential U(x) plotted over states x ∈ [0, 1]; U takes values in [−1, 1].]

Corresponding p.d.f., for T = 0.2, 1, 5

[Figure: the p.d.f. f(x, T) over x ∈ [0, 1] for T = 0.2, 1 and 5; lower T concentrates the density near the minima of U.]
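The slide's exact potential is not recoverable from the text, so the following sketch uses a hypothetical double-well U(x) = cos(4πx) on [0, 1]; it reproduces the qualitative behaviour of the canonical p.d.f. at T = 0.2, 1, 5:

```python
import math

# Canonical distribution f(x, T) = c_T * exp(-U(x)/T) on a grid over [0, 1],
# for a hypothetical double-well potential with values in [-1, 1].
N = 1000
xs = [i / N for i in range(N + 1)]
U = lambda x: math.cos(4 * math.pi * x)   # minima at x = 0.25 and x = 0.75

def pdf(T):
    w = [math.exp(-U(x) / T) for x in xs]
    Z = sum(w) / N                         # 1/c_T by numerical integration
    return [wi / Z for wi in w]

# Lower T concentrates the density near the minima of U.
for T in (0.2, 1.0, 5.0):
    p = pdf(T)
    print(T, round(max(p), 2))
```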

An algorithm for Simulated Annealing

Let the temperature Ti be a decreasing function of time or iteration number i. At each time step, update the state according to the random-walk Metropolis-Hastings algorithm for MCMC, where the target p.d.f. is f(x, Ti). I.e., permute the state Xi randomly to generate a candidate Yi. If the candidate has lower energy than the old state, accept it. Otherwise, accept it only with probability exp(−(U(Yi) − U(Xi))/Ti), for a symmetric proposal distribution (to keep the probabilistic interpretation).
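The update rule can be sketched as follows; the quadratic energy, the Gaussian proposal width, and the cooling schedule are illustrative assumptions, not choices prescribed by the slides:

```python
import math
import random

# Simulated annealing with the Metropolis acceptance rule described above.
def anneal(U, x0, n_iter=10000, seed=1):
    rng = random.Random(seed)
    x = x0
    for i in range(1, n_iter + 1):
        T = 1.0 / math.sqrt(1 + i)         # a decreasing temperature schedule
        y = x + rng.gauss(0.0, 0.5)        # symmetric random-walk proposal
        dU = U(y) - U(x)
        # Accept if the energy decreases; otherwise accept with
        # probability exp(-(U(y) - U(x)) / T_i).
        if dU <= 0 or rng.random() < math.exp(-dU / T):
            x = y
    return x

x_min = anneal(lambda x: (x - 2.0) ** 2, x0=10.0)
print(round(x_min, 2))   # close to the minimiser x = 2
```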

[Figure: a sample annealing path, showing the state x and the energy U(x) plotted against the iteration index (up to 10000 iterations).]

Different issues

  • Try different schemes for lowering the temperature
  • Alternative initial solutions
  • Different candidate-generation algorithms
  • Refine with local search
Travelling salesman problem (TSP)

A basic problem in combinatorial optimisation. Given n stations and an n-by-n matrix A giving the cost of going from station i to station j, find a route S (a permutation of 1, . . . , n) which

  • starts and ends at station 1, S1 = 1
  • has minimal total cost Σ_{i=1}^{n−1} A(Si, Si+1), plus the closing leg A(Sn, S1) back to the start
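The objective can be expressed as a short helper function (a sketch; 0-indexed, and including the leg back to the starting station):

```python
# Total cost of a route (a permutation of 0..n-1 with route[0] = 0),
# including the leg back to the start, for a cost matrix A.
def tour_cost(A, route):
    n = len(route)
    return sum(A[route[i]][route[(i + 1) % n]] for i in range(n))

# Tiny sanity check with a hypothetical 3-station cost matrix:
A = [[0, 1, 4],
     [2, 0, 1],
     [1, 3, 0]]
print(tour_cost(A, [0, 1, 2]))   # A[0][1] + A[1][2] + A[2][0] = 1 + 1 + 1 = 3
```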

Cost matrix - an example

Cost A(i, j) of going from town i (row) to town j (column):

        to 1   2   3   4   5   6
  from 1:  -   5   3   1   4  12
  from 2:  2   -  22  11  13  30
  from 3:  6   8   -  13  12   5
  from 4: 33   9   5   -  60  17
  from 5:  1  15   6  10   -  14
  from 6: 24   6   8   9  40   -

  • Initial solution: {1, 2, 3, 4, 5, 6, 1}; initial cost: 5 + 22 + 13 + 60 + 14 + 24 = 138
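Reading the table as a 6-by-6 matrix (rows = from, columns = to, diagonal unused and set to 0 here), the stated cost 138 of the initial tour, and the cost 120 of the improved route shown later, can both be reproduced:

```python
# The example cost matrix, 0-indexed: town k is row/column k-1.
A = [
    [0,  5,  3,  1,  4, 12],
    [2,  0, 22, 11, 13, 30],
    [6,  8,  0, 13, 12,  5],
    [33, 9,  5,  0, 60, 17],
    [1, 15,  6, 10,  0, 14],
    [24, 6,  8,  9, 40,  0],
]

def tour_cost(route):
    n = len(route)
    return sum(A[route[i]][route[(i + 1) % n]] for i in range(n))

print(tour_cost([0, 1, 2, 3, 4, 5]))   # 5+22+13+60+14+24 = 138, the initial tour
print(tour_cost([0, 2, 1, 3, 4, 5]))   # 3+8+11+60+14+24 = 120, an improved tour
```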

[Figure: the initial route {1, 2, 3, 4, 5, 6, 1} drawn with its leg costs 5, 22, 13, 60, 14, 24; COST = 138.]

[Figure: an improved route with leg costs 3, 8, 11, 60, 14, 24; COST = 120.]

Exercise 7

1. Implement simulated annealing for the travelling salesman problem. As proposal, permute two random stations on the route. As cooling scheme you can use e.g. Tk = 1/√(1 + k) or Tk = 1/log(k + 1); feel free to experiment with different choices. The route must end where it started. Initialise with a random permutation of the stations.

   (a) Have the input be the positions in the plane of the n stations. Let the cost of going i → j be the Euclidean distance between stations i and j. Plot the resulting route in the plane. Debug with stations on a circle.

   (b) Then modify your programme to work with costs directly, and apply it to the cost matrix from the course homepage.
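A minimal sketch of part (a), using the two-station swap proposal and Tk = 1/√(1 + k); the circle debug case follows the exercise text, while the iteration count and seed are illustrative:

```python
import math
import random

def tour_length(pts, route):
    # Euclidean tour length, including the leg back to the start.
    return sum(math.dist(pts[route[i]], pts[route[(i + 1) % len(route)]])
               for i in range(len(route)))

def anneal_tsp(pts, n_iter=20000, seed=0):
    rng = random.Random(seed)
    route = list(range(len(pts)))
    rng.shuffle(route)                       # random initial permutation
    cost = tour_length(pts, route)
    for k in range(1, n_iter + 1):
        T = 1.0 / math.sqrt(1 + k)           # cooling scheme T_k = 1/sqrt(1 + k)
        i, j = rng.sample(range(len(pts)), 2)
        cand = route[:]
        cand[i], cand[j] = cand[j], cand[i]  # permute two random stations
        c = tour_length(pts, cand)
        if c < cost or rng.random() < math.exp(-(c - cost) / T):
            route, cost = cand, c
    return route, cost

# Debug case: stations on a circle; a good tour follows the circle,
# with length close to the inscribed-polygon perimeter.
n, r = 12, 1.0
pts = [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
       for k in range(n)]
route, cost = anneal_tsp(pts)
print(round(cost, 2))
```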