Some Comments on GD and IGD and Relations to the Hausdorff Distance
- O. Schütze, X. Esquivel, A. Lara, C. Coello
CINVESTAV-IPN (Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional), Mexico City, Mexico
Introduction and Background
Investigation of the Indicators
A ‘New’ Indicator
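Multi-Objective Optimization Problem (MOP):

$$\min_{x \in Q} F(x), \qquad F(x) = (f_1(x), \ldots, f_k(x)), \qquad f_i: Q \subset \mathbb{R}^n \to \mathbb{R}, \quad i = 1, \ldots, k$$

$P_Q$ = set of optimal solutions (Pareto set)
$F(P_Q)$ = the image of $P_Q$ (Pareto front)

[Figure: Pareto set in decision space (x) and Pareto front in objective space (f1, f2)]

First we consider discrete (or discretized) models, i.e., $|Q| < \infty$.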
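Example: Consider the MOP $F: [0,1]^n \to \mathbb{R}^k$, $F(x) = (x_1, g(x))$, where $g: [0,1]^n \to \mathbb{R}^{k-1}$ (Okabe, ZDT).
Assume a point $x = (\varepsilon, z)$, $z \in [0,1]^{n-1}$, is a member of the archive/population. Further, assume that new candidate solutions are chosen uniformly at random from the domain.
Then the probability of finding a point that dominates x is less than $\varepsilon$ (by objective 1: a dominating point must have first coordinate at most $\varepsilon$). At the same time, the distance of x to $P_Q$ can be 'large'.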
Extreme situations:
P: hypothetical Pareto front
X1: perfect approximation of P, except for one outlier
X2: none of the elements is 'near' P
Question: Which approximation is 'better'?
Trade-off for the indicator D when measuring results of MOEAs (the design of MOEAs is influenced by D):

Use of a metric
+ greedy search = shortest path to the set of interest (triangle inequality)

Averaging the results
+ single outliers do not have a major influence on the result
- greedy search is not necessarily the shortest path to the set of interest
Definition: Suppose X is a set and $d: X \times X \to \mathbb{R}$ is a function. Then d is called a metric on X if, and only if, for each $a, b, c \in X$:

(a) $d(a,b) \ge 0$ and $d(a,b) = 0 \Leftrightarrow a = b$ (Positive Property)
(b) $d(a,b) = d(b,a)$ (Symmetric Property)
(c) $d(a,c) \le d(a,b) + d(b,c)$ (Triangle Inequality)

Variants: relaxed triangle inequality:

$$d(a,c) \le \sigma \, (d(a,b) + d(b,c)), \qquad \sigma \ge 1$$
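Definition: Let $u, v \in \mathbb{R}^n$, $A, B \subset \mathbb{R}^n$, and $\|\cdot\|$ be a vector norm. The Hausdorff distance $d_H$ is defined as follows:

$$\mathrm{dist}(u, A) := \inf_{v \in A} \|u - v\|$$
$$\mathrm{dist}(B, A) := \sup_{u \in B} \mathrm{dist}(u, A)$$
$$d_H(A, B) := \max\big(\mathrm{dist}(A, B), \mathrm{dist}(B, A)\big)$$

Remarks:
(i) dist(A,B) is not symmetric: if B is a proper subset of A, then dist(B,A) = 0 and dist(A,B) > 0.
(ii) $d_H$ is a metric on the set of discrete sets. It can also be used for continuous spaces. In that case, $d_H(A,B) = 0 \Leftrightarrow \mathrm{clos}(A) = \mathrm{clos}(B)$.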
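The definition and Remark (i) can be checked on a small example. The following is a minimal sketch for finite sets, assuming the Euclidean norm; the function names and the sample sets are illustrative choices, not taken from the slides.

```python
import math

def dist_point_set(u, A):
    """dist(u, A): smallest Euclidean distance from point u to the finite set A."""
    return min(math.dist(u, v) for v in A)

def dist_set_set(B, A):
    """dist(B, A): largest dist(u, A) over all u in B (not symmetric)."""
    return max(dist_point_set(u, A) for u in B)

def hausdorff(A, B):
    """d_H(A, B) = max(dist(A, B), dist(B, A))."""
    return max(dist_set_set(A, B), dist_set_set(B, A))

# Illustrative sets: B is a proper subset of A.
A = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
B = [(0.0, 0.0), (1.0, 0.0)]

print(dist_set_set(B, A))  # 0.0, since every point of B is also in A
print(dist_set_set(A, B))  # 2.0, caused by the point (0, 2)
print(hausdorff(A, B))     # 2.0
```

The asymmetry of dist disappears once the maximum of both directions is taken, which is what makes d_H symmetric.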
GD as proposed by Van Veldhuizen, applied to general finite sets $X, Y \subset \mathbb{R}^k$ using dist:

$$GD(X, Y) = \frac{1}{|X|} \left( \sum_{i=1}^{|X|} \mathrm{dist}(x_i, Y)^p \right)^{1/p}$$

Metric properties:
$GD(X,Y) = 0 \Leftrightarrow X \subset Y$ (X can be a proper subset of Y (*))
(*): in that case GD(X,Y) = 0 but GD(Y,X) > 0
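1.) Normalization strategy of GD: Let $A_1 = \{a\}$ with $\mathrm{dist}(F(a), F(P_Q)) = 1$, i.e., $GD(F(A_1), F(P_Q)) = 1$. Now let $A_n$ be the multiset consisting of n copies of a, $A_n = \{a, \ldots, a\}$. Then

$$GD(F(A_n), F(P_Q)) = \frac{\|(1, \ldots, 1)^T\|_p}{n} = \frac{\sqrt[p]{n}}{n} \to 0 \quad \text{for } n \to \infty \ (p > 1)$$

2.) Investigate the (relaxed) triangle inequality: let $X, Z \subset \mathbb{R}^k$ s.t. $GD(X,Z) > 0$. Let $\mathrm{rhs}(Y) := GD(X,Y) + GD(Y,Z)$ and define $Y_n := X \cup \{y_1, y_2, \ldots, y_n\}$ such that $\sum_i \mathrm{dist}(y_i, Z) < \infty$.
Then $GD(X, Y_n) = 0$ and $GD(Y_n, Z) \to 0$ for $n \to \infty$.
Hence GD does not satisfy any relaxed triangle inequality, since $\mathrm{rhs}(Y_n) \to 0$ while $GD(X,Z) > 0$.
Note: for p > 1, any set $\{y_1, \ldots, y_n\} \subset F(Q)$ can be taken (if F(Q) is compact)!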
Nearby modification: take the power mean of the distances:

$$GD_p(X, Y) = \left( \frac{1}{|X|} \sum_{i=1}^{|X|} \mathrm{dist}(x_i, Y)^p \right)^{1/p} = \frac{1}{|X|^{1/p}} \left( \sum_{i=1}^{|X|} \mathrm{dist}(x_i, Y)^p \right)^{1/p}$$
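The effect of moving the factor 1/|X| inside the root can be seen on the multiset of n copies of a single point at distance 1. The sketch below assumes the Euclidean norm and uses a one-point stand-in for the Pareto front; `gd`, `gd_p`, `front`, and `a` are illustrative names.

```python
import math

def dist_point_set(u, A):
    """Smallest Euclidean distance from point u to the finite set A."""
    return min(math.dist(u, v) for v in A)

def gd(X, Y, p):
    """Original GD: (1/|X|) * (sum_i dist(x_i, Y)^p)^(1/p)."""
    s = sum(dist_point_set(x, Y) ** p for x in X)
    return s ** (1.0 / p) / len(X)

def gd_p(X, Y, p):
    """Modified GD_p: ((1/|X|) * sum_i dist(x_i, Y)^p)^(1/p), a power mean."""
    s = sum(dist_point_set(x, Y) ** p for x in X)
    return (s / len(X)) ** (1.0 / p)

front = [(0.0, 0.0)]  # stand-in for F(P_Q), an assumption for illustration
a = (1.0, 0.0)        # image point at distance 1 from the front

for n in (1, 10, 100):
    A_n = [a] * n     # multiset of n copies of a
    print(n, gd(A_n, front, p=2), gd_p(A_n, front, p=2))
```

For p = 2 the original GD shrinks like 1/sqrt(n) although no copy is any closer to the front, whereas the power mean stays at 1.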
IGD as proposed by Coello & Cruz, applied to general finite sets $X, Y \subset \mathbb{R}^k$ using dist:

$$IGD(X, Y) = \frac{1}{|Y|} \left( \sum_{i=1}^{|Y|} \mathrm{dist}(y_i, X)^p \right)^{1/p}$$

Analogous modification using the power mean:

$$IGD_p(X, Y) = \left( \frac{1}{|Y|} \sum_{i=1}^{|Y|} \mathrm{dist}(y_i, X)^p \right)^{1/p} = \frac{1}{|Y|^{1/p}} \left( \sum_{i=1}^{|Y|} \mathrm{dist}(y_i, X)^p \right)^{1/p}$$
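Observation: GD(X,Y) is an 'averaged version' of dist(X,Y), and the same holds for IGD. Hence, combine GD and IGD as for $d_H$:

$$\Delta_p(X, Y) := \max\big(GD_p(X, Y), IGD_p(X, Y)\big)$$

Proposition 1: $\Delta_p$ is a semi-metric for $1 \le p < \infty$ and a metric for $p = \infty$.
Remark: for $p = \infty$ the indicator $\Delta_p$ coincides with $d_H$.
Proposition 2: let $|X|, |Y|, |Z| \le N$, then

$$\Delta_p(X, Z) \le \sqrt[p]{N} \, \big(\Delta_p(X, Y) + \Delta_p(Y, Z)\big)$$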
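A minimal sketch of the combined indicator for finite sets, assuming the Euclidean norm. The helper `power_mean` and the sample sets are illustrative; p = ∞ is handled as the maximum, which is the limit of the power mean.

```python
import math

def dist_point_set(u, A):
    """Smallest Euclidean distance from point u to the finite set A."""
    return min(math.dist(u, v) for v in A)

def power_mean(values, p):
    """Power mean of nonnegative values; the maximum for p = infinity."""
    values = list(values)
    if math.isinf(p):
        return max(values)
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def gd_p(X, Y, p):
    """Power mean over X of the distances to Y."""
    return power_mean((dist_point_set(x, Y) for x in X), p)

def igd_p(X, Y, p):
    """Power mean over Y of the distances to X."""
    return power_mean((dist_point_set(y, X) for y in Y), p)

def delta_p(X, Y, p):
    """Averaged Hausdorff distance: max(GD_p, IGD_p)."""
    return max(gd_p(X, Y, p), igd_p(X, Y, p))

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
Y = [(0.0, 0.0), (1.0, 0.0)]

print(delta_p(X, Y, 1))         # distances 0, 0, 2 averaged over X: 2/3
print(delta_p(X, Y, math.inf))  # 2.0, the Hausdorff distance of X and Y
```

Since IGD_p(X,Y) equals GD_p(Y,X), the maximum of the two is symmetric in its arguments.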
         p=1     p=2     p=5     p=10    p=∞
N=2      0.541   0.15    0.026   0.008
N=4      0.249   0.06    0.019   0.009
N=6      0.105   0.033   0.008   0.003
N=10     0.02    0.004   0.002   0.001
N=100

Table: Percentage of triangle violations (σ=1) for different values of p. Here, 100,000 different triples of sets A, B, C with |A| = |B| = |C| = N and k = 2 were taken, each entry chosen randomly within [0,1].
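An experiment of this kind can be sketched as follows. This is an assumed setup, not the authors' code: it draws the entries uniformly from [0,1], uses the Euclidean norm, counts a violation when Δp(A,C) exceeds Δp(A,B) + Δp(B,C) by more than a small floating-point tolerance, and uses far fewer trials than the 100,000 reported, so the printed rates will only roughly resemble the table.

```python
import math
import random

def dist_point_set(u, A):
    return min(math.dist(u, v) for v in A)

def power_mean(values, p):
    values = list(values)
    if math.isinf(p):
        return max(values)
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def delta_p(X, Y, p):
    """max(GD_p, IGD_p) for finite sets X and Y."""
    gd = power_mean((dist_point_set(x, Y) for x in X), p)
    igd = power_mean((dist_point_set(y, X) for y in Y), p)
    return max(gd, igd)

def violation_rate(p, N, trials=2000, k=2, seed=0, tol=1e-12):
    """Percentage of random triples (A, B, C) violating the triangle inequality."""
    rng = random.Random(seed)

    def rand_set():
        return [tuple(rng.random() for _ in range(k)) for _ in range(N)]

    violations = 0
    for _ in range(trials):
        A, B, C = rand_set(), rand_set(), rand_set()
        if delta_p(A, C, p) > delta_p(A, B, p) + delta_p(B, C, p) + tol:
            violations += 1
    return 100.0 * violations / trials

for p in (1, 2, 5, 10, math.inf):
    print(p, violation_rate(p, N=2))
```

For p = ∞ the rate is zero by Proposition 1, because Δ∞ is the Hausdorff distance and therefore a metric.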
P: hypothetical Pareto front
X1: perfect approximation of P, except for one outlier
X2: none of the elements is 'near' P
Question: Which approximation is 'better'?

             p=1     p=2     p=5     p=10    p=∞
∆p(P,X1)     0.8182  2.714   4.047   5.571   9
∆p(P,X2)     2.828   2.828   2.828   2.828   2.828
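The X1 row can be related to the power mean with a short calculation. The sizes of the sets are not given on the slide; the sketch below assumes that X1 has 11 points, ten of them on P and one outlier at distance 9, so that the term averaging over X1 determines Δp. The function name and its parameters are illustrative.

```python
def outlier_delta(p, n_points=11, outlier_dist=9.0):
    """Power mean of n_points distances, all zero except one of size outlier_dist.

    ((outlier_dist^p) / n_points)^(1/p) = outlier_dist / n_points^(1/p)
    """
    return outlier_dist / n_points ** (1.0 / p)

for p in (1, 2, 3, 5, 10):
    print(p, round(outlier_delta(p), 4))
```

Under this assumption the values for p = 1 and p = 2 agree with the table (0.8182 and 2.714), and the limit for p → ∞ is 9. The values 4.047 and 5.571 are obtained for p = 3 and p = 5, so the column labels of the middle entries may be worth checking against the original slides.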
Now consider continuous models.
In general: k objectives, hence $P_Q$ is (k-1)-dimensional.
$GD_p$: A finite, $P_Q$ compact: GD turns into a continuous SOP.
$IGD_p$: $P_Q$ continuous: the power mean of $IGD_p$ turns into an integral.

Example: k = 2, $F(P_Q)$ connected and parametrized by a curve $\gamma: [m_1, M_1] \to \mathbb{R}^2$, then

$$IGD_p(F(A), F(P_Q)) = \left( \frac{1}{M_1 - m_1} \int_{m_1}^{M_1} \mathrm{dist}(\gamma(t), F(A))^p \, dt \right)^{1/p}$$

[Figure: Pareto front in objective space (f1, f2) with the bounds m1, M1 on f1 and m2, M2 on f2]
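For a given parametrization γ, the integral form can be approximated numerically. The following sketch uses the midpoint rule and the Euclidean norm; the linear front γ(t) = (t, 1 - t) and the single-point archive are hypothetical and chosen only because the result can be checked by hand.

```python
import math

def igd_p_continuous(gamma, m1, M1, A, p, n=1000):
    """Midpoint-rule approximation of
    ((1 / (M1 - m1)) * integral over [m1, M1] of dist(gamma(t), A)^p dt)^(1/p).
    """
    h = (M1 - m1) / n
    total = 0.0
    for j in range(n):
        t = m1 + (j + 0.5) * h                      # midpoint of the j-th subinterval
        d = min(math.dist(gamma(t), a) for a in A)  # dist(gamma(t), A)
        total += d ** p * h
    return (total / (M1 - m1)) ** (1.0 / p)

def gamma(t):
    """Hypothetical linear Pareto front, parametrized over f1 = t in [0, 1]."""
    return (t, 1.0 - t)

A = [(0.5, 0.5)]  # hypothetical archive image with a single point on the front

print(igd_p_continuous(gamma, 0.0, 1.0, A, p=1))  # analytically sqrt(2)/4
print(igd_p_continuous(gamma, 0.0, 1.0, A, p=2))  # analytically sqrt(1/6)
```

Here dist(γ(t), A) = sqrt(2) |t - 0.5|, which gives the two closed-form values noted in the comments.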
Task: $P_Q$ is given analytically; compute an approximation Y of $F(P_Q)$ with $d_H(Y, F(P_Q)) < \delta$ (an approximation quality defined a priori).
For k = 2: use continuation-like methods: select the step size t such that $\|F(x + tv) - F(x)\|_\infty \approx \Theta\delta$, where $\Theta < 1$ is a safety factor (the selection of t is based on Lipschitz estimations).
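The step-size condition can be illustrated with a simple search. The sketch below replaces the Lipschitz estimation mentioned on the slide with plain bisection, assuming that the image distance grows monotonically in t on the search interval; the bi-objective function `F`, the point `x`, and the direction `v` are hypothetical.

```python
def sup_norm_diff(F, x, y):
    """||F(y) - F(x)|| in the maximum norm."""
    return max(abs(a - b) for a, b in zip(F(x), F(y)))

def step_size(F, x, v, delta, theta=0.9, t_max=1.0, iters=60):
    """Find t in (0, t_max] with ||F(x + t v) - F(x)||_inf close to theta * delta.

    Bisection; assumes the image distance is increasing in t on [0, t_max].
    """
    target = theta * delta

    def move(t):
        return [xi + t * vi for xi, vi in zip(x, v)]

    if sup_norm_diff(F, x, move(t_max)) <= target:
        return t_max  # even the largest allowed step stays within the target
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sup_norm_diff(F, x, move(mid)) < target:
            lo = mid
        else:
            hi = mid
    return lo

def F(x):
    """Hypothetical bi-objective function of one variable."""
    return (x[0] ** 2, (x[0] - 2.0) ** 2)

x, v, delta = [0.5], [1.0], 0.1
t = step_size(F, x, v, delta)
print(t, sup_norm_diff(F, x, [x[0] + t * v[0]]))
```

Bisection needs many evaluations of F per step; a Lipschitz-based estimate as named on the slide would avoid them at the cost of requiring a bound on the derivative.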
[Figure: Pareto front (PF) of OKA2 in objective space (f1, f2); two panels, δ=0.01 and δ=0.4]
[Figure: NSGA-II applied to ZDT1; populations pop1 to pop7 and the Pareto front]

With $Y = F(P_Q)$ and $Y_i = F(\mathrm{pop}_i)$:
∆2(Y1,Y) = 3.03
∆2(Y2,Y) = 2.71
∆2(Y3,Y) = 1.43
∆2(Y4,Y) = 0.77
∆2(Y5,Y) = 0.31
∆2(Y6,Y) = 0.12
∆2(Y7,Y) = 0.007
Conclusions
archive sizes
Open Questions
improve ∆p? (∆p is NOT compliant with the dominance relation!)