Differential Evolution Entirely Parallel Method K.N. Kozlov 1 , A.M. - - PowerPoint PPT Presentation
Differential Evolution Entirely Parallel Method K.N. Kozlov 1 , A.M. - - PowerPoint PPT Presentation
Differential Evolution Entirely Parallel Method K.N. Kozlov 1 , A.M. Samsonov 2 and M.G. Samsonova 1 1 St.Petersburg Polytechnic University, St.Petersburg 195251, Russia 2 A.F. Ioffe Physico-Technical Institute, St.Petersburg, 194021, Russia
Introduction
2 / 30
Motivation: The design of efficient algorithms and systems to solve the inverse problem of mathematical modeling continues to be a challenge. Applications in biology and medicine:
■
processing of large data volumes,
■
determination of experimentally immeasurable quantities,
■
design of new substances,
■
interpretation of modeling results. Universal solution doesn’t exist. The main difficulties are:
■
heterogeneity of biomedical data and parameters,
■
multiple fitness criteria,
■
high computational complexity of biomedical applications, Aims: An efficient optimization method should not only be fast and scalable, but also reliable and robust. 1. Applicability to the different types of models and the formal description of the a priori knowledge, 2. Flexibility: Wide range of strategies to overcome local extrema, 3. Usability: independent of the language of model description.
Formal Problem Statement
3 / 30
Initial value problem: Let the vector v(t, q) = (v0(t, q), ..., vK−1(t, q))T describe the state of the system at time t. The dimension of v equals the number of state variables
- K. Let the vector q = (q0, ..., qI−1)T be the vector of parameters with dimension I.
The system of ordinary differential equations of the first order in respect to the independent variable t and initial conditions describe the dynamics of the system: ∂v ∂t = f(v, q); v(0, q) = v0; (1) The problems of mathematical physics described by the differential equations of higher orders can be rewritten in normal form, like (1)(Lurie, 1993). The model parameters are to be estimated by adaptation or fitting the model
- utput to the experimental data.
Objective Function
4 / 30
Adaptation is performed by minimization of the objective function, e.g. the sum of squared differences between the data and model output,
J
- i=1
(v(ti, q) − y(ti))T (v(ti, q) − y(ti)),where J independent experimental observations are denoted y(ti) = (y0(t), ..., yK−1(t))T and i = 1, ..., J. Multi-objective case: Weighted global criterion method (weighted sum method). Let Ri(X) represent the optimization criteria and Sj(X) = 0 be the parameter constraints. F(q) =
- i
αiRi(q) +
- j
βjSj(q) → min αi > 0 βj > 0
Box Constraints
5 / 30
Constraints in the form of inequalities can be imposed for the subset of parameters: qlow
i
≤ qi ≤ qup
i
i ∈ Il. (2) Lets introduce new parameters ui: qi = αi + βi tanh (ηui), (3) where scaling coefficient η is chosen experimentally, and αi = (qup
i
+ qlow
i
)/2; βi = (qup
i
− qlow
i
)/2. Consequently, parameters qi, i ∈ Il in (1) are substituted with there transformations and adaptation procedure is applied to determine unconstrained parameters ui.
Overview
6 / 30
The constraint optimization problem can be transformed to the unconstrained problem using penalty functions. Global methods:
■
Simulated Annealing (Metropolis method),
■
Genetic algorithms, evolutionary algorithms,
■
Imitation methods: Particle Swarm Optimization, Ant Colonies, Local search:
■
Gradient descent,
■
Conjugate gradient,
■
Nelder-Mead method, For more overview see (Ashyraliyev et al., 2009; Mendes and Kell, 1998; Moles et al., 2003).
Existing Solutions
7 / 30
Methods for solution of the inverse problem of mathematical modeling are actively developed. Available tools and approaches:
■
Matlab, Mathematica, Maple, etc,
■
Octave, R, COPASI, KNIME, etc,
■
Programming «from scratch»,
■
Adaptation «by hand». Disadvantages:
■
Integrated systems: poor interoperability, oriented to simulations,
■
Limitations to certain types of models,
■
Steep learning curve,
■
Complexity of the a priori knowledge formalization.
Differential Evolution
8 / 30
Stochastic iterative optimization technique, suggested by Storn and Price (Storn and Price, 1995). Geometric interpretation in 2D case:
q1
2
q
x x x x x x x x x x x x x x
- qi
qr3
r2
qr1 v q Minimum NP parameter vectors from current generation
The set random parameter vectors qi, i = 1, ..., NP. The size of the set NP is fixed. Construction of trial vectors: v = qr1 + S(qr2 − qr3), where S is the scaling constant. «trigonometric mutation rule» (Fan and Lampinen, 2003): z = qr1 + qr2 + qr3 3 +(ϕ2 − ϕ1)(qr1 − qr2) +(ϕ3 − ϕ2)(qr2 − qr3) +(ϕ1 − ϕ3)(qr3 − qr1) where ϕi = |F(qri)|/ϕ∗, i = 1, 2, 3, ϕ∗ = |F(qr1)| + |F(qr2)| + |F(qr3)|
Crossover
9 / 30
q2 q3 q4
5
q q6 q7 q0 q1 q8 q2 q3 q4
5
q q6 q7 q0 q1 q8 q2 q3 q4
5
q q6 q7 q0 q1 q8 v w
n = 2 n = 3 n = 4
z
The offspring is defined as follows: wj =
- vj,
j = nI, n + 1I, ..., n + L − 1I zj j < nI OR j > n + L − 1I where n is the randomly chosen index, nI is the reminder of division of n by I, L is determined by Pr(L = a) = (p)a, where p is the crossover probability.
Differential Evolution Entirely Parallel Method
10 / 30
Being the evolutionary algorithm, DE can be effectively parallelized:
■
each member of the population is pushed to the asynchronous queue.
■
it is evaluated individually as soon as there is an available thread in the pool.
■
threads are synchronized on each generation. Age of the individual – the number of iterations the individual survived without changes. Several oldest members are substituted by the same number of individuals, having the best values of objective function. The parameters of the algorithm:
- Ψ is the number of seeding individuals (i.e. individuals that substitute old ones),
- Θ is the number of iterations called seeding interval.
Selection of Offsprings
11 / 30
proc select (individual) =
{ if (F < the value of the parent) then
Accept offspring else for all additional functions P do if (P < the value of the parent) then Generate the random number U. if (U < control parameter for this function) then Accept offspring end if end if end for end if }
New Selection Rule for several
- bjective functions:
1. The offspring replaces its parent if the value for the offspring set
- f parameters is less than that for
the parental one. 2. The offspring replaces its parent if the value of some objective function is better and the randomly selected value is less than the predefined parameter for this function.
Population Diversity
12 / 30
The original algorithm was highly dependent on internal parameters, see, e.g. (Gaemperle R., 2002). The variance of each parameter: varj = 1 NP
NP −1
- i=0
- qi,j −
1 NP
NP −1
- k=0
qk,j 2 (4) where j = 0, ..., I − 1. An efficient Adaptive Scheme for selection of internal parameters S and p based on the Preserving Population Diversity was proposed in (Zaharie, 2002).
Adaptive Scheme
13 / 30
Then Sj =
- NP ·(cj−1)+pj(2−pj)
2·NP ·pj
NP · (cj − 1) + pj(2 − pj) ≥ 0 Sinf NP · (cj − 1) + pj(2 − pj) < 0 (5) and pj =
- −(NP · S2
j − 1) +
- (NP · S2
j − 1)2 − NP · (1 − cj)
cj ≥ 1 pinf cj < 1 (6) New control parameter γ: cnew
j
= γ
- varj/varnew
j
- (7)
Stopping criterion
14 / 30
M
ρ
F
max
- pt
F N
Calculations are stopped in case that the objective function F decreases less than a predefined value ρ during M steps.
The integer valued parameters
15 / 30
■
The values are sorted in ascending order.
■
The index of the parameter in the floating point array becomes the value of the parameter in the integer array. a[0] = 0.3; a[1] = 0.5; a[2] = 0.1; a[3] = 0.8; After sorting in the ascending order the array should be transformed to: b[0] = a[2] = 0.1 b[2] = a[1] = 0.5 b[1] = a[0] = 0.3 b[3] = a[3] = 0.8 The indices of the sorted array define the current set of the integer valued parameters: i[0] = 2; i[1] = 0; i[2] = 1; i[3] = 3;
Approximation of the Optimum
16 / 30
1 2 3 4 5 6 4.6 4.8 5 5.2 5.4 5.6 5.8 6 Error Parameter Value Criterion 1 Criterion 2 Criterion 3 0.5 1 1.5 2 2.5 3 3.5 4 4.5 2.4 2.6 2.8 3 3.2 3.4 3.6 Error Parameter Value Criterion 1 Criterion 2 Criterion 3
x=5.0 y=3.0
1 2 3 4 5 6 1 1.05 1.1 1.15 1.2 1.25 1.3 Error Parameter Value Criterion 1 Criterion 2 Criterion 3 0.5 1 1.5 2 2.5 3 3.5 4 4.5 0.36 0.38 0.4 0.42 0.44 0.46 0.48 Error Parameter Value Criterion 1 Criterion 2 Criterion 3
z=1.12 q=0.43
The plot of the criteria in the close vicinity of the
- ptimal values of the four
parameters. F1 is the sum of squared differences between the model output and data, F2 and F3 penalize the deviation of the model from the desired behavior. While the criteria do not take a minimum value at
- ne and the same point,
nevertheless, the algorithm produces reliable approximation of the
- ptimum.
Implementation
17 / 30
Software framework for solution of inverse problem of mathematical modeling was implemented in C programming language. The package architecture: Main Characteristics and Benefits:
■
New flexible, reliable and effective methods,
■
The objective function can be formulated using different computer languages widely used in biomedical applications,
■
Applicable to finding float and integer parameters with one or several criteria. The package provides simple command line interface and can be embedded in any new software. Project site: http://sourceforge.net/projects/deepmethod and
http://urchin.spbcas.ru/trac/DEEP
Usage
18 / 30
The control parameters of the algorithm are defined in the datafile that uses an INI-format. dpdeepctl [OPTION...] or dpdeepctl_mpi [OPTION...]
- -default-name or
- -model-file
- -model-group
- -settings-file
- -settings-group
- -target-file
- -target-group
- -output-file
- -operation
- -monitor
Command line options:
■
The model,
■
The constants,
■
The objectives. dpdeepctl --default-name=model.ini
Definition of Objective Function
19 / 30
The deviation of the model solution from the desired behavior. Common types are:
■
Dynamical model, differential equations – the sum of squared differences between model output and data,
■
Regression model – properties of the observation,
■
Black box – the value of the discrepancy, the risk.
■
Constraints in the form of inequalities → equivalent equalities,
■
Penalty functions,
■
Known parameters are fixed. The programm, that: Input: Variable parameters; Output: Numerical values of all
- bjectives;
Model description
20 / 30
parts=name;number;... – Names and number of parameters. mask=0 OR 1;... – May be used in convertion algorithms, one digit for each position in parts. parms=number;... – Integer parameters. dparms=number;... – Float parameters. tweak=0 OR 1;... – Keys: 0 means that the parameter is fixed, 1 means that the parameter can be adjusted. lbound=number;... – Low limit for box constraints. hbound=number;... – Upper limit for box constraints. limited=number;... – 1 means that box constraints apply, 0 means that the parameter is unlimited. scale=number;... – The scaling for each parameter in unlimited case. command=string – Command line for objective function calculation. convert=octave OR sdf OR subset OR subsubset OR gemstat OR gcdm
Subprogram Output Parser
21 / 30
delimiters=string keys=number;... mapping=number;... ZZ = 2.8352e+04 tspen = 2980.0 vpen = 1.9583 keys[0] ="ZZ" keys[2] ="2.8352e+04" keys[3] ="tspen" keys[4] ="2980.0" keys[5] ="vpen" keys[6] ="1.9583" mapping[0]=keys[2] mapping[1]=keys[4] mapping[2]=keys[6]
Conversion
22 / 30
- ctave: < variable >=< value >;
Used in GNU Octave – http://www.gnu.org/software/octave/, R – http://www.r-project.org/ sdf: Molecule structure. Used in OpenBabel – http://openbabel.org and for QSAR – Quantitative Structure-Activity Relationship. subset and subsubset. Used for the sequence of indices.
Objective Function
23 / 30
- -target-file=octave-t.ini --target-group=target
... [default_target] mainfunc=scoredouble;0;1;1; func1=readpenalty;1;0.45;0.1; func2=readpenalty;2;0.45;0.1; ... Function name:
■
scoredouble – function of float argument,
■
readpenalty – additional criterion,
■
doubletoint – creation of integer parameters from float ones,
■
scoreint – function of integer argument. Constants:
■
The index,
■
The rank,
■
The weight in the global criterion,
Stopping Criterion
24 / 30
... proportional_stop=1e-8 stop_count=5 ... Function name:
■
proportional_stop – limits the fraction of the difference between the values on two consecutive steps and the largest value,
■
absolute_stop – limits the difference between the values on two consecutive steps,
■
absolute_score – limits the value,
■
absolute_iter – limits the number of steps,
■
absolute_time – limits the execution time. Constants:
■
Number of successful cases.
Transformation of Constraints
25 / 30
... transform=tanh gamma_init=1.0 roundoff_error=1e-12 ... Function name:
■
tanh – sigmoidal transformation function,
■
sin – periodic transformation function,
■
alg – restricted interval,
■
rand – random substitution. Constants:
■
Transformation function parameter,
■
Border offset.
Population Control
26 / 30
tau=10 population_size=169 recombination_strategy=de_3_bin_T recombination_weight=0 recombination_prob=0 recombination_gamma=0.9 es_lambda=2 noglobal_eps=0 ... Function name:
■
de_3_bin_T,
■
de_3_bin,
■
de_3_exp_T,
■
de_3_exp. Constants:
■
population size,
■
population diversity control parameter γ,
■
the neighborhood in the parameter space – ε (0 – the whole space),
■
number of seeding individuals Ψ.
Procedure Lists
27 / 30
run_before=initstop;optpost;optposteval;writelog; run=gdeep;1;edeep;1;sdeep;1;dpupdate;1;substitute;5;checkstop;1;optpost;1 run_after=optpost;optposteval;writelog; ...
■
run_before – executed once befor the main loop,
■
run – executed on each defined iteration,
■
run_after – executed once after the main loop,
■
initstop – sets the stopping criterion,
■
- ptpost, optposteval – best