COSC 3P71: Particle Swarm Optimization (PSO), Brock University




SLIDE 1

COSC 3P71

Particle Swarm Optimization (PSO)

Brock University

Brock University (PSO) Particle Swarm Optimization 1 / 23

SLIDE 2

Swarm Intelligence

Recall Swarm Intelligence in general, and Ant Colony Optimization in particular. What do we remember? Biologically inspired (yawn), etc.


SLIDE 3

Particle Swarm Optimization

(PSO)

Inspired by flocking birds and schooling fish
Develops multiple solutions in parallel
Provides both elements of independent exploration and social cooperation/collaboration


SLIDE 4

Particle Swarm Optimization

So... ?

More practically, it’s an optimization algorithm that seeks to find the vector of floating-point values that best solves some N-dimensional target function


SLIDE 5

The Swarm

The swarm consists of multiple particles flying independently through the search space
Each particle acts as a separate, very simple agent
We’ll better define a ‘particle’ in just a bit


SLIDE 6

particles...

find new solutions that are ‘similar’ to their current solutions at the moment
usually have a tendency to stay in motion
search for solutions with awareness of past personal success
search for solutions with consideration of progress made by the collective
etc.


SLIDE 7

so... what IS a particle?

Each particle has:
a position: a single solution; analogous to a chromosome
a velocity: tendency to change position (and thus solution) at each step/increment
a neighbourhood: the particles with which it collaborates


SLIDE 8

searching

Updating the position (and thus solution) is trivial:

x′ = x + v

The intelligence normally comes from the velocity update rule: change the manner by which the velocity is updated, and you change the means of training the position (candidate solution)!
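In code, that update really is trivial. A minimal Python sketch (the function and argument names are mine, not from the slides):

```python
def update_position(position, velocity):
    """x' = x + v, applied independently to each dimension."""
    return [x + v for x, v in zip(position, velocity)]
```

For example, `update_position([1.0, 2.0], [0.5, -1.0])` returns `[1.5, 1.0]`.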


SLIDE 9

initialization

All positions and velocities are randomized (per dimension)
Positions may be chosen anywhere within the expected bounds of the search space

◮ We’ll talk a bit more about these bounds later

Choosing velocity bounds is trickier, and mostly isn’t standardized

◮ I’m partial to half the bounds of the search space (or less)
◮ Obviously the magnitude of the velocity bounds is strictly positive, but the selectable initial velocity components should be permitted to be negative
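One way to code the initialization described above, using the “half the search-space range” velocity bound suggested here (the function name and defaults are my own):

```python
import random

def initialize_swarm(n_particles, bounds, vel_fraction=0.5):
    """Randomize one (position, velocity) pair per particle.

    `bounds` holds one (lo, hi) pair per dimension; velocity components
    are drawn from +/- vel_fraction * (hi - lo), so they may be negative.
    """
    swarm = []
    for _ in range(n_particles):
        position = [random.uniform(lo, hi) for lo, hi in bounds]
        velocity = [random.uniform(-vel_fraction * (hi - lo),
                                    vel_fraction * (hi - lo))
                    for lo, hi in bounds]
        swarm.append((position, velocity))
    return swarm
```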


SLIDE 10

Conventional Velocity Update Rule

v′ = ω v + c1 r1 ( xb − x) + c2 r2 ( xgb − x) [+ c3 r]

The + c3 r term is optional
The rs are at least generated per particle, per iteration

◮ Optionally, they may be per dimension, per particle, per iteration
◮ i.e. a vector of rs, one per dimension
◮ Values typically range from 0 to 1

SLIDE 11

Explanation of Terms

ω v: Inertia (or momentum)
c1 r1 ( xb − x): Cognitive component — tendency to drift back towards ‘past personal glory’
c2 r2 ( xgb − x): Social component — tendency to drift towards the global or neighbourhood best thus far
c3 r: Explorative/random component — to discourage stagnation
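Putting the terms together, a sketch of the velocity update (using the per-dimension rs, and omitting the optional c3 r term; names and defaults are illustrative):

```python
import random

def update_velocity(x, v, x_best, x_gbest, omega=0.7, c1=2.0, c2=2.0):
    """v' = omega*v + c1*r1*(xb - x) + c2*r2*(xgb - x),
    with fresh r1, r2 in [0, 1) drawn per dimension."""
    return [omega * vi
            + c1 * random.random() * (xbi - xi)   # cognitive pull
            + c2 * random.random() * (xgi - xi)   # social pull
            for xi, vi, xbi, xgi in zip(x, v, x_best, x_gbest)]
```

Note that when a particle sits exactly at both its personal and global best, only the inertia term survives.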


SLIDE 12

Selection of Parameters

Generally decent initial guesses:

ω — less than 1 (obviously)
c1 and c2 — varies, but 2 and 2 isn’t unheard of
Of course, all are normally determined empirically, and are normally static, but can be dynamic, or may be trained by another algorithm


SLIDE 13

Neighbourhoods

Remember that the social component dictates the likelihood that a particle will rely on the work of its peers — those other particles within its neighbourhood. The simplest neighbourhood is the entire swarm

◮ That might inhibit exploration, and cause premature convergence

Alternatively, you can choose a neighbourhood size and a mechanism for choosing neighbours

◮ The initial question is whether you should choose the neighbourhoods once, at the beginning of the algorithm, or decide each iteration

⋆ If decided per-iteration, it will typically be based on Euclidean proximity
⋆ It’s up to you whether swarm associations are symmetric or not — e.g. whether pA ∈ sB =⇒ pB ∈ sA
⋆ When you choose your neighbourhoods probably won’t really matter, because being assigned to the same neighbourhood a priori will tend to also encourage proximity
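A per-iteration, Euclidean-proximity neighbourhood could be chosen like this (a sketch with names of my own; note that, as above, the relation need not be symmetric — j being among i’s nearest neighbours doesn’t make i one of j’s):

```python
import math

def neighbourhood(positions, i, k):
    """Indices of the k particles closest (Euclidean) to particle i."""
    others = [j for j in range(len(positions)) if j != i]
    others.sort(key=lambda j: math.dist(positions[i], positions[j]))
    return others[:k]
```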


SLIDE 14

Bounds and Restrictions

Solutions often have upper/lower bounds
When this is true, it must be enforced on the positions
Particles can simply be unable to exceed the bounds, may wrap to the other side, or may bounce off
Note that, when such restrictions don’t exist, there may be some risk of the particles just flying away, never to be seen again
As mentioned earlier, velocities should also be “clamped” to some maximum magnitude.
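A sketch of the “bounce off” option plus velocity clamping, handled one component at a time (function names are mine):

```python
def bounce(x, v, lo, hi):
    """Clamp a position component to [lo, hi], reversing the
    corresponding velocity component when a wall is hit."""
    if x < lo:
        return lo, -v
    if x > hi:
        return hi, -v
    return x, v

def clamp_velocity(v, v_max):
    """Keep |v| <= v_max."""
    return max(-v_max, min(v_max, v))
```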


SLIDE 15

Limitations of Canonical PSO

First and foremost, PSO is not suitable for problems with “holes” in the search space

◮ e.g. x between 0 and 1, or between 2 and 3, but not between 1 and 2
⋆ Of course, for a problem like this, one could simply adapt the transcription
⋆ e.g. the aforementioned range could be mapped to [0..2]; continuous within the particle’s space, but discontinuous when evaluated for fitness

Not suitable for problems with highly constrained choices

◮ e.g. What’s legal in dimension 1 depends on what was chosen in dimension 0
◮ This can be particularly problematic if trying to adapt to a combinatorial problem

Not really appropriate when adjacent values in the search space aren’t “similar” in the solution space


SLIDE 16

Limitations of Canonical PSO

One final thought...

For problems/functions that can scale into larger versions, dimensionality might eventually become a limiting factor.
Remember that our fitness function will normally just give us a single value
Declaring one particle’s 300-dimensional position better than another particle’s 300-dimensional position might not ascribe much significance to each individual dimension


SLIDE 17

Benefits

PSOs are easy to code and (outside of possibly the fitness function) very fast to execute
The lack of a need for differentiability

◮ Unlike, for example, gradient descent!

PSOs are easy to code and fast to execute
There are relatively few parameters to choose

◮ Consequently, there may be less bias from the user/experimenter

PSOs are easy to code and fast to execute


SLIDE 18

Applications

The most common application is function optimization
Minimization/maximization is an example
It can also include things like training weights for ANNs

◮ As such, it can be a viable alternative to BackProp

SLIDE 19

Applications

Hmmm...

Could we apply this to a combinatorial problem without changing the continuous nature of the PSO itself? Could we apply it to something like TSP? Could we apply it to a highly-constrained problem, like, say, two-connected networks with bounded rings?


SLIDE 20

Variations

Modifications to inertia

◮ The easiest is to start momentum high, and then gradually reduce it over time

⋆ Compare to simulated annealing
⋆ It might help to avoid local minima, and then allow for refinement

◮ Personally, I’m partial to oscillation

Binary or combinatorial PSO

◮ If velocity is the propensity to change position, then one could relax the definitions to include combinatorial versions
◮ Asterisk*

I even saw one paper that used special operators like crossover and mutation...

◮ ...wait a minute...

Anyhoo, in general, outside of screwy representations and such, much as the ‘intelligence’ comes from the velocity rule, that’s also where it’s easiest to introduce novel variations
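The “start high, gradually reduce” inertia modification can be as simple as a linear schedule (the 0.9-to-0.4 range below is a common choice in the literature, not something prescribed here):

```python
def linear_inertia(t, t_max, w_start=0.9, w_end=0.4):
    """Linearly decay the inertia weight from w_start to w_end
    over t_max iterations."""
    return w_start + (w_end - w_start) * (t / t_max)
```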


SLIDE 21

Example

Minimization

Let’s take a look at an example! Suppose we want to minimize the function:

f (x, y) = cos(6.28 × (3x + 2y)) + cos(6.28 × (2x + 3y)) − 2 × sin(6.28 × (x + y))

within the range of [−0.4..0.4]. Just... because of reasons. Okay?
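Putting the whole lecture together, a minimal global-best PSO for this function might look as follows (the swarm size, iteration count, and coefficient choices are mine, not from the slides):

```python
import math
import random

def f(x, y):
    """The example function from the slide (6.28 is the slide's ~2*pi)."""
    return (math.cos(6.28 * (3 * x + 2 * y))
            + math.cos(6.28 * (2 * x + 3 * y))
            - 2 * math.sin(6.28 * (x + y)))

def pso(n=30, iters=200, lo=-0.4, hi=0.4, omega=0.7, c1=1.5, c2=1.5, seed=0):
    random.seed(seed)
    # Random positions within bounds; velocities within +/- half the range
    pos = [[random.uniform(lo, hi), random.uniform(lo, hi)] for _ in range(n)]
    vel = [[random.uniform(-0.4, 0.4), random.uniform(-0.4, 0.4)] for _ in range(n)]
    pbest = [p[:] for p in pos]                    # personal bests
    pfit = [f(*p) for p in pos]
    g = min(range(n), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]             # global best
    for _ in range(iters):
        for i in range(n):
            for d in range(2):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (omega * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Enforce the bounds by clamping the position
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit = f(*pos[i])
            if fit < pfit[i]:
                pbest[i], pfit[i] = pos[i][:], fit
                if fit < gfit:
                    gbest, gfit = pos[i][:], fit
    return gbest, gfit
```

Run as `best, best_fit = pso()`; with the positions clamped to [−0.4, 0.4], the returned best fitness should come out well below zero.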


SLIDE 22

Example

Additional samples

You can find some additional interesting test functions here:

http://www.sfu.ca/~ssurjano/optimization.html

Particularly neat ones are:

Ackley Function
Schaffer Function
Eggholder Function
Cross-in-Tray Function
Langermann Function
Rastrigin Function

Also check out the Hartmann functions listed. They go up to 6D! That’s 4 more Ds!


SLIDE 23

Questions?

Comments?

Catchy tunes?
