[PPT] - Differential Evolution Entirely Parallel Method K.N. Kozlov 1 , A.M. PowerPoint Presentation

SLIDE 1

Differential Evolution Entirely Parallel Method

K.N. Kozlov∗1, A.M. Samsonov2 and M.G. Samsonova1

1St.Petersburg Polytechnic University, St.Petersburg 195251, Russia 2A.F. Ioffe Physico-Technical Institute, St.Petersburg, 194021, Russia ∗email:kozlov_kn@spbstu.ru

SLIDE 2

Introduction

2 / 30

Motivation: The design of efficient algorithms and systems to solve the inverse problem of mathematical modeling continues to be a challenge. Applications in biology and medicine:

■

processing of large data volumes,

■

determination of experimentally immeasurable quantities,

■

design of new substances,

■

interpretation of modeling results. Universal solution doesn’t exist. The main difficulties are:

■

heterogeneity of biomedical data and parameters,

■

multiple fitness criteria,

■

high computational complexity of biomedical applications, Aims: An efficient optimization method should not only be fast and scalable, but also reliable and robust. 1. Applicability to the different types of models and the formal description of the a priori knowledge, 2. Flexibility: Wide range of strategies to overcome local extrema, 3. Usability: independent of the language of model description.

SLIDE 3

Formal Problem Statement

3 / 30

Initial value problem: Let the vector v(t, q) = (v0(t, q), ..., vK−1(t, q))T describe the state of the system at time t. The dimension of v equals the number of state variables

K. Let the vector q = (q0, ..., qI−1)T be the vector of parameters with dimension I.

The system of ordinary differential equations of the first order in respect to the independent variable t and initial conditions describe the dynamics of the system: ∂v ∂t = f(v, q); v(0, q) = v0; (1) The problems of mathematical physics described by the differential equations of higher orders can be rewritten in normal form, like (1)(Lurie, 1993). The model parameters are to be estimated by adaptation or fitting the model

utput to the experimental data.

SLIDE 4

Objective Function

4 / 30

Adaptation is performed by minimization of the objective function, e.g. the sum of squared differences between the data and model output,

J

i=1

(v(ti, q) − y(ti))T (v(ti, q) − y(ti)),where J independent experimental observations are denoted y(ti) = (y0(t), ..., yK−1(t))T and i = 1, ..., J. Multi-objective case: Weighted global criterion method (weighted sum method). Let Ri(X) represent the optimization criteria and Sj(X) = 0 be the parameter constraints. F(q) =

i

αiRi(q) +

j

βjSj(q) → min αi > 0 βj > 0

SLIDE 5

Box Constraints

5 / 30

Constraints in the form of inequalities can be imposed for the subset of parameters: qlow

i

≤ qi ≤ qup

i

i ∈ Il. (2) Lets introduce new parameters ui: qi = αi + βi tanh (ηui), (3) where scaling coefficient η is chosen experimentally, and αi = (qup

i

+ qlow

i

)/2; βi = (qup

i

− qlow

i

)/2. Consequently, parameters qi, i ∈ Il in (1) are substituted with there transformations and adaptation procedure is applied to determine unconstrained parameters ui.

SLIDE 6

Overview

6 / 30

The constraint optimization problem can be transformed to the unconstrained problem using penalty functions. Global methods:

■

Simulated Annealing (Metropolis method),

■

Genetic algorithms, evolutionary algorithms,

■

Imitation methods: Particle Swarm Optimization, Ant Colonies, Local search:

■

Gradient descent,

■

Conjugate gradient,

■

Nelder-Mead method, For more overview see (Ashyraliyev et al., 2009; Mendes and Kell, 1998; Moles et al., 2003).

SLIDE 7

Existing Solutions

7 / 30

Methods for solution of the inverse problem of mathematical modeling are actively developed. Available tools and approaches:

■

Matlab, Mathematica, Maple, etc,

■

Octave, R, COPASI, KNIME, etc,

■

Programming «from scratch»,

■

Adaptation «by hand». Disadvantages:

■

Integrated systems: poor interoperability, oriented to simulations,

■

Limitations to certain types of models,

■

Steep learning curve,

■

Complexity of the a priori knowledge formalization.

SLIDE 8

Differential Evolution

8 / 30

Stochastic iterative optimization technique, suggested by Storn and Price (Storn and Price, 1995). Geometric interpretation in 2D case:

q1

2

q

x x x x x x x x x x x x x x

qi

qr3

r2

qr1 v q Minimum NP parameter vectors from current generation

The set random parameter vectors qi, i = 1, ..., NP. The size of the set NP is fixed. Construction of trial vectors: v = qr1 + S(qr2 − qr3), where S is the scaling constant. «trigonometric mutation rule» (Fan and Lampinen, 2003): z = qr1 + qr2 + qr3 3 +(ϕ2 − ϕ1)(qr1 − qr2) +(ϕ3 − ϕ2)(qr2 − qr3) +(ϕ1 − ϕ3)(qr3 − qr1) where ϕi = |F(qri)|/ϕ∗, i = 1, 2, 3, ϕ∗ = |F(qr1)| + |F(qr2)| + |F(qr3)|

SLIDE 9

Crossover

9 / 30

q2 q3 q4

5

q q6 q7 q0 q1 q8 q2 q3 q4

5

q q6 q7 q0 q1 q8 q2 q3 q4

5

q q6 q7 q0 q1 q8 v w

n = 2 n = 3 n = 4

z

The offspring is defined as follows: wj =

vj,

j = nI, n + 1I, ..., n + L − 1I zj j < nI OR j > n + L − 1I where n is the randomly chosen index, nI is the reminder of division of n by I, L is determined by Pr(L = a) = (p)a, where p is the crossover probability.

SLIDE 10

Differential Evolution Entirely Parallel Method

10 / 30

Being the evolutionary algorithm, DE can be effectively parallelized:

■

each member of the population is pushed to the asynchronous queue.

■

it is evaluated individually as soon as there is an available thread in the pool.

■

threads are synchronized on each generation. Age of the individual – the number of iterations the individual survived without changes. Several oldest members are substituted by the same number of individuals, having the best values of objective function. The parameters of the algorithm:

Ψ is the number of seeding individuals (i.e. individuals that substitute old ones),
Θ is the number of iterations called seeding interval.

SLIDE 11

Selection of Offsprings

11 / 30

proc select (individual) =

{ if (F < the value of the parent) then

Accept offspring else for all additional functions P do if (P < the value of the parent) then Generate the random number U. if (U < control parameter for this function) then Accept offspring end if end if end for end if }

New Selection Rule for several

bjective functions:

1. The offspring replaces its parent if the value for the offspring set

f parameters is less than that for

the parental one. 2. The offspring replaces its parent if the value of some objective function is better and the randomly selected value is less than the predefined parameter for this function.

SLIDE 12

Population Diversity

12 / 30

The original algorithm was highly dependent on internal parameters, see, e.g. (Gaemperle R., 2002). The variance of each parameter: varj = 1 NP

NP −1

i=0
qi,j −

1 NP

NP −1

k=0

qk,j 2 (4) where j = 0, ..., I − 1. An efficient Adaptive Scheme for selection of internal parameters S and p based on the Preserving Population Diversity was proposed in (Zaharie, 2002).

SLIDE 13

Adaptive Scheme

13 / 30

Then Sj =

NP ·(cj−1)+pj(2−pj)

2·NP ·pj

NP · (cj − 1) + pj(2 − pj) ≥ 0 Sinf NP · (cj − 1) + pj(2 − pj) < 0 (5) and pj =

−(NP · S2

j − 1) +

(NP · S2

j − 1)2 − NP · (1 − cj)

cj ≥ 1 pinf cj < 1 (6) New control parameter γ: cnew

j

= γ

varj/varnew

j

(7)

SLIDE 14

Stopping criterion

14 / 30

M

ρ

F

max

pt

F N

Calculations are stopped in case that the objective function F decreases less than a predefined value ρ during M steps.

SLIDE 15

The integer valued parameters

15 / 30

■

The values are sorted in ascending order.

■

The index of the parameter in the floating point array becomes the value of the parameter in the integer array. a[0] = 0.3; a[1] = 0.5; a[2] = 0.1; a[3] = 0.8; After sorting in the ascending order the array should be transformed to: b[0] = a[2] = 0.1 b[2] = a[1] = 0.5 b[1] = a[0] = 0.3 b[3] = a[3] = 0.8 The indices of the sorted array define the current set of the integer valued parameters: i[0] = 2; i[1] = 0; i[2] = 1; i[3] = 3;

SLIDE 16

Approximation of the Optimum

16 / 30

1 2 3 4 5 6 4.6 4.8 5 5.2 5.4 5.6 5.8 6 Error Parameter Value Criterion 1 Criterion 2 Criterion 3 0.5 1 1.5 2 2.5 3 3.5 4 4.5 2.4 2.6 2.8 3 3.2 3.4 3.6 Error Parameter Value Criterion 1 Criterion 2 Criterion 3

x=5.0 y=3.0

1 2 3 4 5 6 1 1.05 1.1 1.15 1.2 1.25 1.3 Error Parameter Value Criterion 1 Criterion 2 Criterion 3 0.5 1 1.5 2 2.5 3 3.5 4 4.5 0.36 0.38 0.4 0.42 0.44 0.46 0.48 Error Parameter Value Criterion 1 Criterion 2 Criterion 3

z=1.12 q=0.43

The plot of the criteria in the close vicinity of the

ptimal values of the four

parameters. F1 is the sum of squared differences between the model output and data, F2 and F3 penalize the deviation of the model from the desired behavior. While the criteria do not take a minimum value at

ne and the same point,

nevertheless, the algorithm produces reliable approximation of the

ptimum.

SLIDE 17

Implementation

17 / 30

Software framework for solution of inverse problem of mathematical modeling was implemented in C programming language. The package architecture: Main Characteristics and Benefits:

■

New flexible, reliable and effective methods,

■

The objective function can be formulated using different computer languages widely used in biomedical applications,

■

Applicable to finding float and integer parameters with one or several criteria. The package provides simple command line interface and can be embedded in any new software. Project site: http://sourceforge.net/projects/deepmethod and

http://urchin.spbcas.ru/trac/DEEP

SLIDE 18

Usage

18 / 30

The control parameters of the algorithm are defined in the datafile that uses an INI-format. dpdeepctl [OPTION...] or dpdeepctl_mpi [OPTION...]

-default-name or
-model-file
-model-group
-settings-file
-settings-group
-target-file
-target-group
-output-file
-operation
-monitor

Command line options:

■

The model,

■

The constants,

■

The objectives. dpdeepctl --default-name=model.ini

SLIDE 19

Definition of Objective Function

19 / 30

The deviation of the model solution from the desired behavior. Common types are:

■

Dynamical model, differential equations – the sum of squared differences between model output and data,

■

Regression model – properties of the observation,

■

Black box – the value of the discrepancy, the risk.

■

Constraints in the form of inequalities → equivalent equalities,

■

Penalty functions,

■

Known parameters are fixed. The programm, that: Input: Variable parameters; Output: Numerical values of all

bjectives;

SLIDE 20

Model description

20 / 30

parts=name;number;... – Names and number of parameters. mask=0 OR 1;... – May be used in convertion algorithms, one digit for each position in parts. parms=number;... – Integer parameters. dparms=number;... – Float parameters. tweak=0 OR 1;... – Keys: 0 means that the parameter is fixed, 1 means that the parameter can be adjusted. lbound=number;... – Low limit for box constraints. hbound=number;... – Upper limit for box constraints. limited=number;... – 1 means that box constraints apply, 0 means that the parameter is unlimited. scale=number;... – The scaling for each parameter in unlimited case. command=string – Command line for objective function calculation. convert=octave OR sdf OR subset OR subsubset OR gemstat OR gcdm

SLIDE 21

Subprogram Output Parser

21 / 30

delimiters=string keys=number;... mapping=number;... ZZ = 2.8352e+04 tspen = 2980.0 vpen = 1.9583 keys[0] ="ZZ" keys[2] ="2.8352e+04" keys[3] ="tspen" keys[4] ="2980.0" keys[5] ="vpen" keys[6] ="1.9583" mapping[0]=keys[2] mapping[1]=keys[4] mapping[2]=keys[6]

SLIDE 22

Conversion

22 / 30

ctave: < variable >=< value >;

Used in GNU Octave – http://www.gnu.org/software/octave/, R – http://www.r-project.org/ sdf: Molecule structure. Used in OpenBabel – http://openbabel.org and for QSAR – Quantitative Structure-Activity Relationship. subset and subsubset. Used for the sequence of indices.

SLIDE 23

Objective Function

23 / 30

-target-file=octave-t.ini --target-group=target

... [default_target] mainfunc=scoredouble;0;1;1; func1=readpenalty;1;0.45;0.1; func2=readpenalty;2;0.45;0.1; ... Function name:

■

scoredouble – function of float argument,

■

readpenalty – additional criterion,

■

doubletoint – creation of integer parameters from float ones,

■

scoreint – function of integer argument. Constants:

■

The index,

■

The rank,

■

The weight in the global criterion,

SLIDE 24

Stopping Criterion

24 / 30

... proportional_stop=1e-8 stop_count=5 ... Function name:

■

proportional_stop – limits the fraction of the difference between the values on two consecutive steps and the largest value,

■

absolute_stop – limits the difference between the values on two consecutive steps,

■

absolute_score – limits the value,

■

absolute_iter – limits the number of steps,

■

absolute_time – limits the execution time. Constants:

■

Number of successful cases.

SLIDE 25

Transformation of Constraints

25 / 30

... transform=tanh gamma_init=1.0 roundoff_error=1e-12 ... Function name:

■

tanh – sigmoidal transformation function,

■

sin – periodic transformation function,

■

alg – restricted interval,

■

rand – random substitution. Constants:

■

Transformation function parameter,

■

Border offset.

SLIDE 26

Population Control

26 / 30

tau=10 population_size=169 recombination_strategy=de_3_bin_T recombination_weight=0 recombination_prob=0 recombination_gamma=0.9 es_lambda=2 noglobal_eps=0 ... Function name:

■

de_3_bin_T,

■

de_3_bin,

■

de_3_exp_T,

■

de_3_exp. Constants:

■

population size,

■

population diversity control parameter γ,

■

the neighborhood in the parameter space – ε (0 – the whole space),

■

number of seeding individuals Ψ.

SLIDE 27

Procedure Lists

27 / 30

run_before=initstop;optpost;optposteval;writelog; run=gdeep;1;edeep;1;sdeep;1;dpupdate;1;substitute;5;checkstop;1;optpost;1 run_after=optpost;optposteval;writelog; ...

■

run_before – executed once befor the main loop,

■

run – executed on each defined iteration,

■

run_after – executed once after the main loop,

■

initstop – sets the stopping criterion,

■

ptpost, optposteval – best

member generation,

■

writelog – loging,

■

checkstop – stopping criterion check,

■

substitute – oldest members substitution,

■

gdeep, edeep, sdeep – differential evolution steps.

SLIDE 28

Worker Threads

28 / 30

max_threads=12 seed=1525 ... Constants:

■

Maximal number of working threads usually correlate with number of cores,

■

The seed for random number generator is to be changed for each run.

SLIDE 29

Acknowledgments

29 / 30

Funding: RNF Grant 14-14-00302 and by “5-100-2020” Program of the Russian Ministry of Education and Science. Previous funding: SYSPATHO EU FP7 №260429, RFBR grant №11-01-00573, State Contract №11.519.11.6041. State Polytechnical University (St.Petersburg): Maria Samsonova, Andrey Pisarev, Svetlana Surkova. A.M.Ioffe Physico-Technical Institute (St.Petersburg): Alexander M.Samsonov, Sergey Rukolayne, Vitaly Gursky, Igor Gula.

SLIDE 30

Bibliography

30 / 30

1. K.N. Kozlov, V.V. Gursky, I.V. Kulakovskiy, V.V. Muzhichenko, and M.G. Samsonova (2014). Sequence-based model of gap gene regulatory network. In Proceedings of BGRS-SB’2014, Novisibirsk 2. Kozlov, K., Ivanisenko, N., Ivanisenko, V., Kolchanov, N., Samsonova, M., and Samsonov, A. M. (2013). Enhanced Differential Evolution Entirely Parallel Method for Biomedical Applications. LNCS 7979, V. Malyshkin (Ed.): PaCT 2013:409–416 3. Kozlov, K., Surkova, S., Myasnikova, E., Reinitz, J., and Samsonova, M. (2012). Modeling of gap gene expression in Drosophila Kruppel mutants. PLoS Comput. Biol., 8(8):e1002635 4. Kozlov, K. and Samsonov, A. (2011). Deep – differential evolution entirely parallel method for gene regulatory networks. Journal of Supercomputing, 57:172–178