SLIDE 1 Proximal point algorithm in Hadamard spaces
Miroslav Bacak
Télécom ParisTech
Optimisation Géométrique sur les Variétés
SLIDE 2
Contents of the talk
1. Basic facts on Hadamard spaces
2. Proximal point algorithm
3. Applications to computational phylogenetics
SLIDE 3 Proximal point algorithm in Hadamard spaces
Why? Well... it is used in:
- Phylogenetics: computing medians and means of phylogenetic trees.
- Diffusion tensor imaging: the space P(n, R) of symmetric positive definite n × n matrices with real entries is a Hadamard space when equipped with the Riemannian metric
  <X, Y>_A := Tr(A^{-1} X A^{-1} Y), X, Y ∈ T_A(P(n, R)), for every A ∈ P(n, R).
- Computational biology: shape analysis of tree-like structures.
SLIDE 4
Tree-like structures in organisms
Figure: Bronchial tubes in lungs.
Figure: Transport system in plants.
Figure: Human circulatory system.
SLIDE 5
Definition of Hadamard space
Let (H, d) be a complete metric space where:
1. any two points x_0 and x_1 are connected by a geodesic x: [0, 1] → H: t → x_t,
2. and, for every y ∈ H,
   d(y, x_t)^2 ≤ (1 − t) d(y, x_0)^2 + t d(y, x_1)^2 − t(1 − t) d(x_0, x_1)^2.
Then (H, d) is called a Hadamard space. For today: assume local compactness.
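In Euclidean space the inequality in the definition holds with equality (it is the parallelogram identity), which is why R^n is the flat borderline case. A quick numeric sanity check in the plane, added as an editor's sketch (the specific points are arbitrary):

```python
import math

def d(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def geodesic(x0, x1, t):
    """In R^2 the geodesic t -> x_t is the straight segment from x0 to x1."""
    return ((1 - t) * x0[0] + t * x1[0], (1 - t) * x0[1] + t * x1[1])

x0, x1, y, t = (0.0, 0.0), (4.0, 2.0), (1.0, 3.0), 0.3
xt = geodesic(x0, x1, t)
lhs = d(y, xt) ** 2
rhs = (1 - t) * d(y, x0) ** 2 + t * d(y, x1) ** 2 - t * (1 - t) * d(x0, x1) ** 2
# In R^2 the CAT(0) inequality holds with equality; in a general
# Hadamard space only "<=" is guaranteed.
assert abs(lhs - rhs) < 1e-9
```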
SLIDE 8
Geodesic space
Figure: two points x_1, x_2 and the point (1 − λ)x_1 + λx_2 on the geodesic between them.
SLIDE 9
Definition of nonpositive curvature
A geodesic triangle in a geodesic space:
Figure: a geodesic triangle with vertices x_1, x_2, x_3.
SLIDE 10
Terminology remark
CAT(κ) spaces, for κ ∈ R, were introduced in 1987 by Mikhail Gromov:
C = Cartan, A = Alexandrov, T = Toponogov.
We are particularly interested in CAT(0) spaces.
SLIDE 11 Examples of Hadamard spaces
1. Hilbert spaces, the Hilbert ball
2. complete simply connected Riemannian manifolds with Sec ≤ 0
3. R-trees: a metric space T is an R-tree if
   - for x, y ∈ T there is a unique geodesic [x, y]
   - if [x, y] ∩ [y, z] = {y}, then [x, z] = [x, y] ∪ [y, z]
4. Euclidean buildings
5. the BHV tree space (space of phylogenetic trees)
6. L^2(M, H), where (M, µ) is a probability space:
   d_2(u, v) := ( ∫_M d(u(x), v(x))^2 dµ(x) )^{1/2}, u, v ∈ L^2(M, H)
SLIDE 12
Convexity in Hadamard spaces
Let (H, d) be a Hadamard space. These spaces allow for a natural definition of convexity:
Definition
A set C ⊂ H is convex if, given x, y ∈ C, we have [x, y] ⊂ C.
Definition
A function f : H → (−∞, ∞] is convex if f ◦ γ is a convex function for each geodesic γ : [0, 1] → H.
SLIDE 15
Examples of convex functions
1. The indicator function of a closed convex set C ⊂ H:
   ι_C(x) := 0 if x ∈ C, and ι_C(x) := ∞ if x ∉ C.
2. The distance function to a closed convex set C ⊂ H:
   d_C(x) := inf_{c ∈ C} d(x, c), x ∈ H.
3. The displacement function of an isometry T: H → H:
   δ_T(x) := d(x, Tx), x ∈ H.
SLIDE 18 Examples of convex functions
4. Let c: [0, ∞) → H be a geodesic ray. The function b_c: H → R defined by
   b_c(x) := lim_{t→∞} [d(x, c(t)) − t], x ∈ H,
   is called the Busemann function associated to the ray c.
5. The energy of a mapping u: M → H given by
   E(u) := ∫_M ∫_M d(u(x), u(y))^2 p(x, dy) dµ(x),
   where (M, µ) is a measure space with a Markov kernel p(x, dy). E is convex and continuous on L^2(M, H).
SLIDE 20 Examples of convex functions
6. Given a_1, ..., a_N ∈ H and weights w_1, ..., w_N > 0, set
   f(x) := ∑_{n=1}^N w_n d(x, a_n)^p, x ∈ H, where p ∈ [1, ∞).
   - If p = 1, we get the Fermat-Weber problem of optimal facility location. A minimizer of f is called a median.
   - If p = 2, a minimizer of f is the barycenter of µ := ∑_{n=1}^N w_n δ_{a_n}, that is, the mean of a_1, ..., a_N.
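On the real line (itself a Hadamard space) these two cases can be checked directly: the minimizer for p = 1 is a median of the data, and for p = 2 it is the mean. A minimal numeric sketch, assuming unit weights and a brute-force grid search (an editor's illustration, not part of the talk):

```python
# Data on the real line (H = R is a Hadamard space); unit weights.
a = [0.0, 1.0, 10.0]
w = [1.0, 1.0, 1.0]

def f(x, p):
    """Objective f(x) = sum_n w_n * d(x, a_n)^p with d(x, a) = |x - a|."""
    return sum(wi * abs(x - ai) ** p for wi, ai in zip(w, a))

grid = [i / 100 for i in range(-200, 1500)]      # crude search grid
median = min(grid, key=lambda x: f(x, 1))        # p = 1: the median, here 1.0
mean = min(grid, key=lambda x: f(x, 2))          # p = 2: the mean, here 11/3
```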
SLIDE 22
Strongly convex functions
A function f: H → (−∞, ∞] is strongly convex with parameter β > 0 if
f((1 − t)x + ty) ≤ (1 − t)f(x) + tf(y) − βt(1 − t)d(x, y)^2,
for any x, y ∈ H and t ∈ [0, 1]. Every strongly convex lsc function has a unique minimizer.
Example
Given y ∈ H, the function f := d(y, ·)^2 is strongly convex (with β = 1). Indeed,
d(y, x_t)^2 ≤ (1 − t)d(y, x_0)^2 + td(y, x_1)^2 − t(1 − t)d(x_0, x_1)^2,
for each geodesic x: [0, 1] → H.
SLIDE 23
1. Basic facts on Hadamard spaces
2. Proximal point algorithm
3. Applications to computational phylogenetics
SLIDE 24 Proximal point algorithm
Let f: H → (−∞, ∞] be convex lsc. Optimization problem:
min_{x ∈ H} f(x).
Recall: there is no (sub)differential calculus and no shooting (singularities), so implicit methods are appropriate. The PPA generates a sequence
x_i := J_{λ_i}(x_{i−1}) := arg min_{y ∈ H} [ f(y) + (1/(2λ_i)) d(y, x_{i−1})^2 ],
where x_0 ∈ H is a given starting point and λ_i > 0 for each i ∈ N.
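As a sanity check in the Euclidean special case H = R, the resolvent sometimes has a closed form: for f(x) = |x| it is the soft-thresholding map, and the PPA iterates reach the minimizer in finitely many steps. A minimal sketch (the choice of f, starting point, and step sizes are illustrative assumptions, not from the talk):

```python
def prox_abs(x, lam):
    """Resolvent J_lam of f(x) = |x| on the real line (soft-thresholding):
    argmin_y |y| + (1/(2*lam)) * (y - x)**2."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

x = 5.0               # starting point x_0
for _ in range(10):   # constant step sizes lam_i = 1 (their sum diverges)
    x = prox_abs(x, 1.0)
# iterates: 5 -> 4 -> 3 -> 2 -> 1 -> 0 -> 0 -> ..., the minimizer of |x|
assert x == 0.0
```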
SLIDE 26 Convergence of proximal point algorithm
Theorem (M.B., 2011)
Let f: H → (−∞, ∞] be a convex lsc function attaining its minimum. Given x_0 ∈ H and (λ_i) such that ∑_{i=1}^∞ λ_i = ∞, the PPA sequence (x_i) converges to a minimizer of f.
(Resolvents are firmly nonexpansive; a cheap version for λ_i = λ.)
Disadvantage: the resolvents
x_i := J_{λ_i}(x_{i−1}) := arg min_{y ∈ H} [ f(y) + (1/(2λ_i)) d(y, x_{i−1})^2 ]
are often difficult to compute.
SLIDE 28 Splitting proximal point algorithm
Let f_1, ..., f_N be convex lsc and consider
f(x) := ∑_{n=1}^N f_n(x), x ∈ H.
Example (Median and mean)
f_n := d(·, a_n), resp. f_n := d(·, a_n)^2.
Key idea: apply the resolvents J^n_λ of the f_n in a cyclic or random order.
SLIDE 31 Splitting proximal point algorithm
Let x_0 ∈ H be a starting point. For each k ∈ N_0 we apply the resolvents in cyclic order:
x_{kN+1} := J^1_{λ_k}(x_{kN}),
x_{kN+2} := J^2_{λ_k}(x_{kN+1}),
...
x_{kN+N} := J^N_{λ_k}(x_{kN+N−1}).
Random order version:
x_{i+1} := J^{r_i}_{λ_i}(x_i),
where (r_i) are random variables with values in {1, ..., N}.
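A minimal Euclidean sketch of the random order version for the mean (H = R, using the closed-form resolvent of f_n = d(·, a_n)^2; the data points and step sizes are illustrative assumptions, not from the talk):

```python
import random

def prox_sq_dist(x, a, lam):
    """Resolvent of f_n = d(., a)^2 on the real line:
    argmin_z (z - a)**2 + (z - x)**2 / (2*lam) = (x + 2*lam*a) / (1 + 2*lam)."""
    return (x + 2 * lam * a) / (1 + 2 * lam)

a = [1.0, 2.0, 6.0]        # data points; their mean is 3
x = 0.0                    # starting point x_0
random.seed(0)
for i in range(1, 20001):
    lam = 1.0 / i          # step sizes with sum lam_i = infinity
    x = prox_sq_dist(x, random.choice(a), lam)
# x is now close to the mean 3 (convergence holds almost surely)
```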
SLIDE 32 Convergence of splitting proximal point algorithm
Theorem (cyclic order version + random order version)
Assume that the f_n are Lipschitz (or locally Lipschitz and the minimizing sequence is bounded). Then
1. the cyclic PPA sequence converges to a minimizer of f,
2. the random PPA sequence converges to a minimizer of f almost surely.
The assumptions are satisfied for
f(x) := ∑_{n=1}^N w_n d(x, a_n)^p, x ∈ H, where p ∈ [1, ∞).
SLIDE 34 Splitting proximal point algorithm (for mean)
Hence instead of computing (the usual PPA)
x_{i+1} := arg min_{z ∈ H} [ ∑_{n=1}^N d(z, a_n)^2 + (1/(2λ_i)) d(z, x_i)^2 ],
we minimize a single summand at a time:
x_{i+1} := arg min_{z ∈ H} [ d(z, a_n)^2 + (1/(2λ_i)) d(z, x_i)^2 ],
where the a_n are chosen in a cyclic or random order. This is a one-dimensional problem! Hence x_{i+1} is a convex combination of a_n and x_i.
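The one-dimensional claim can be made explicit: restricting z to the geodesic from x_i to a_n (where the minimizer lies) turns the objective into a quadratic in the geodesic parameter t. A short derivation, added for clarity and not on the original slide:

```latex
% Parametrize the geodesic from x_i to a_n and set D := d(x_i, a_n):
%   z_t := (1 - t)\, x_i + t\, a_n, \quad t \in [0, 1],
% so that d(z_t, a_n) = (1 - t) D and d(z_t, x_i) = t D.
\begin{align*}
g(t) &:= d(z_t, a_n)^2 + \tfrac{1}{2\lambda_i}\, d(z_t, x_i)^2
      = (1 - t)^2 D^2 + \tfrac{t^2 D^2}{2\lambda_i}, \\
g'(t) &= 0 \iff t^\ast = \frac{2\lambda_i}{1 + 2\lambda_i},
\qquad\text{hence}\qquad
x_{i+1} = \frac{1}{1 + 2\lambda_i}\, x_i + \frac{2\lambda_i}{1 + 2\lambda_i}\, a_n .
\end{align*}
```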
SLIDE 36
1. Basic facts on Hadamard spaces
2. Proximal point algorithm
3. Applications to computational phylogenetics
SLIDE 37
Left: one of many evolutionary trees. Right: a sketch of an evolutionary tree by Charles Darwin (1837).
SLIDE 38
Billera-Holmes-Vogtmann tree space Td
Figure: 5 out of 15 orthants of T_4 (leaf labels 1, 2, 3, 4).
SLIDE 39
Billera-Holmes-Vogtmann tree space Td
Figure: A finite set of trees in T4
SLIDE 40
Billera-Holmes-Vogtmann tree space Td
Figure: Consider the most frequent tree topology only
SLIDE 41
Computing the mean: Random order version
Algorithm (SPPA with f_n := d(·, T_n)^2)
Input: T_1, ..., T_N ∈ T_d
Step 1: S_1 := T_1 and i := 1
Step 2: choose r ∈ {1, ..., N} at random
Step 3: S_{i+1} := (1/(i+1)) T_r + (i/(i+1)) S_i
Step 4: i := i + 1
Step 5: go to Step 2
Geodesics can be computed in polynomial time: the Owen-Provan algorithm (2011).
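A Euclidean stand-in for the algorithm above (points in R^2 replace trees, and the geodesic convex combination in Step 3 becomes a weighted average; in T_d this step would require the Owen-Provan geodesic). An editor's sketch with illustrative data:

```python
import random

# Stand-ins for trees: points in R^2, whose mean is (1.5, 1.5).
T = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (3.0, 3.0)]

random.seed(1)
S = T[0]                                   # Step 1: S_1 := T_1
for i in range(1, 100000):
    Tr = random.choice(T)                  # Step 2: choose T_r at random
    w = 1.0 / (i + 1)                      # Step 3: S_{i+1} := (1/(i+1)) T_r + (i/(i+1)) S_i
    S = tuple(w * tr + (1 - w) * s for tr, s in zip(Tr, S))
# S now approximates the mean (1.5, 1.5) (almost-sure convergence)
```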
SLIDES 44-53
Computing the mean: Random order version
Figure: data trees T_1, ..., T_7 in tree space and the successive iterates S_1, S_2, S_3, S_4, ... approaching their mean.
SLIDE 54
Le Big Data
Space T_d: orthant dimension = d − 2, number of orthants = (2d − 3)!!.
The actual dimension of T_d is d + 1 + (d − 2)(2d − 3)!!.
Number of trees = N (e.g. coming from an MCMC simulation).
Our computation: d = 12, hence dim ≈ 10^11, and N = 10^5.
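The counts on this slide can be reproduced directly from the stated formulas (a small sketch):

```python
def double_factorial(n):
    """n!! = n * (n - 2) * (n - 4) * ... down to 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

d = 12
orthants = double_factorial(2 * d - 3)    # (2d - 3)!! orthants of T_d
dim = d + 1 + (d - 2) * orthants          # "actual dimension" formula from the slide
# For d = 12: orthants = 13_749_310_575 and dim = 137_493_105_763 ~ 1.4 * 10^11,
# consistent with dim ~ 10^11 on the slide.
```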
SLIDE 56
Results (courtesy of Philipp Benner)
Figure: four candidate tree topologies on 13 mammal taxa (Mouse, Rat, Guinea pig, Squirrel, Kangaroo rat, Chimpanzee, Human, Gorilla, Marmoset, Baboon, Orangutan, Rabbit, Pika).
SLIDE 57
Results (courtesy of Philipp Benner), continued
Figure: Approximation of the mean of the 100,000 trees (taxa: Squirrel, Guinea pig, Kangaroo rat, Mouse, Rat, Human, Chimpanzee, Gorilla, Orangutan, Marmoset, Baboon, Rabbit, Pika).
SLIDE 58
References
1. M.B.: Computing medians and means in Hadamard spaces, SIAM J. Optim. 24 (2014), no. 3, 1542-1566.
2. Benner, Bacak, Bourguignon: Point estimates in phylogenetic reconstructions, Bioinformatics 30 (2014), no. 17.
3. M.B.: Convex analysis and optimization in Hadamard spaces, De Gruyter, Berlin, 2014.