
SLIDE 1

Proximal point algorithm in Hadamard spaces

Miroslav Bačák

Télécom ParisTech

Optimisation Géométrique sur les Variétés

Paris, 21 novembre 2014
SLIDE 2

Contents of the talk

1. Basic facts on Hadamard spaces
2. Proximal point algorithm
3. Applications to computational phylogenetics

SLIDE 3

Proximal point algorithm in Hadamard spaces

Why? Well... it is used in:

  • Phylogenetics: computing medians and means of phylogenetic trees.

  • Diffusion tensor imaging: the space P(n, R) of symmetric positive definite n × n matrices with real entries is a Hadamard space when equipped with the Riemannian metric

      ⟨X, Y⟩_A := Tr(A⁻¹ X A⁻¹ Y),   X, Y ∈ T_A(P(n, R)),

    for every A ∈ P(n, R).

  • Computational biology: shape analysis of tree-like structures.
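As a concrete illustration (an addition, not from the talk): the geodesic distance induced by this affine-invariant metric has the known closed form d(A, B) = ‖log(A^{−1/2} B A^{−1/2})‖_F. A minimal numpy sketch, assuming numpy is available:

```python
import numpy as np

def spd_apply(A, func):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(func(w)) @ V.T

def spd_distance(A, B):
    """Geodesic distance in P(n, R) under the affine-invariant metric:
    d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F."""
    A_inv_sqrt = spd_apply(A, lambda w: 1.0 / np.sqrt(w))
    M = A_inv_sqrt @ B @ A_inv_sqrt
    return np.linalg.norm(spd_apply(M, np.log), "fro")

# For commuting (here: diagonal) matrices the distance reduces to the
# Euclidean distance between the logarithms of the eigenvalues.
A = np.diag([1.0, 1.0])
B = np.diag([np.e**2, 1.0])
print(spd_distance(A, B))  # → 2.0 (up to floating point)
```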

SLIDE 4

Tree-like structures in organisms

Figure: Bronchial tubes in lungs
Figure: Transport system in plants
Figure: Human circulatory system

SLIDE 5

Definition of Hadamard space

Let (H, d) be a complete metric space in which:

1. any two points x0 and x1 are connected by a geodesic x: [0, 1] → H, t ↦ x_t, and

2. d(y, x_t)² ≤ (1 − t) d(y, x0)² + t d(y, x1)² − t(1 − t) d(x0, x1)²,

for every y ∈ H and t ∈ [0, 1]. Then (H, d) is called a Hadamard space.

For today: assume local compactness.
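In a Hilbert space the comparison inequality above holds with equality (it is the parallelogram law in disguise). A quick pure-Python check of this identity in R², added here as an illustration:

```python
import random

def dist2(p, q):
    """Squared Euclidean distance in R^2."""
    return (p[0] - q[0])**2 + (p[1] - q[1])**2

def geodesic_point(x0, x1, t):
    """The point x_t = (1 - t) x0 + t x1 on the segment [x0, x1]."""
    return ((1 - t) * x0[0] + t * x1[0], (1 - t) * x0[1] + t * x1[1])

random.seed(0)
for _ in range(1000):
    x0 = (random.uniform(-5, 5), random.uniform(-5, 5))
    x1 = (random.uniform(-5, 5), random.uniform(-5, 5))
    y  = (random.uniform(-5, 5), random.uniform(-5, 5))
    t  = random.random()
    xt = geodesic_point(x0, x1, t)
    lhs = dist2(y, xt)
    rhs = (1 - t) * dist2(y, x0) + t * dist2(y, x1) - t * (1 - t) * dist2(x0, x1)
    assert abs(lhs - rhs) < 1e-9  # equality in Euclidean space
```

A general Hadamard space only satisfies "≤" here; curvature can make the left side strictly smaller.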

SLIDE 6

Geodesic space

Figure: a geodesic segment joining x1 and x2, with the point (1 − λ)x1 + λx2 marked on it
SLIDE 9

Definition of nonpositive curvature

Figure: a geodesic triangle with vertices x1, x2, x3 in a geodesic space

SLIDE 10

Terminology remark

CAT(κ) spaces, for κ ∈ R, were introduced in 1987 by Mikhail Gromov:

C = Cartan, A = Alexandrov, T = Toponogov.

We are particularly interested in CAT(0) spaces.

SLIDE 11

Examples of Hadamard spaces

1. Hilbert spaces, the Hilbert ball
2. complete simply connected Riemannian manifolds with Sec ≤ 0
3. R-trees: a metric space T is an R-tree if
   • for x, y ∈ T there is a unique geodesic [x, y]
   • if [x, y] ∩ [y, z] = {y}, then [x, z] = [x, y] ∪ [y, z]
4. Euclidean buildings
5. the BHV tree space (space of phylogenetic trees)
6. L²(M, H), where (M, µ) is a probability space, with

      d₂(u, v) := ( ∫_M d(u(x), v(x))² dµ(x) )^{1/2},   u, v ∈ L²(M, H)
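To make the R-tree example concrete, here is a small illustrative sketch (an addition, not from the talk) of the "tripod": three rays glued at a common origin. Geodesics between different legs pass through the origin, and the nonpositive-curvature inequality can be spot-checked at midpoints:

```python
# A point of the tripod is (leg, r): leg in {0, 1, 2}, r >= 0 its distance
# from the common origin. The tripod is an R-tree, hence a Hadamard space.

def dist(p, q):
    """Geodesic distance: through the origin if p, q lie on different legs."""
    (lp, rp), (lq, rq) = p, q
    return abs(rp - rq) if lp == lq else rp + rq

def midpoint(p, q):
    """Midpoint of the unique geodesic [p, q]."""
    (lp, rp), (lq, rq) = p, q
    if lp == lq:
        return (lp, (rp + rq) / 2)
    half = (rp + rq) / 2
    # Walk half the total distance from p: down p's leg, then up q's leg.
    return (lp, rp - half) if half <= rp else (lq, half - rp)

# CAT(0) inequality at t = 1/2 for sample triples (often strict in a tree):
# d(y, m)^2 <= d(y, p)^2 / 2 + d(y, q)^2 / 2 - d(p, q)^2 / 4
for p, q, y in [((0, 2), (1, 3), (2, 1)), ((0, 1), (0, 4), (1, 2)),
                ((1, 5), (2, 1), (0, 3))]:
    m = midpoint(p, q)
    assert dist(y, m)**2 <= dist(y, p)**2 / 2 + dist(y, q)**2 / 2 - dist(p, q)**2 / 4 + 1e-12
```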

SLIDE 12

Convexity in Hadamard spaces

Let (H, d) be a Hadamard space. These spaces allow for a natural definition of convexity:

Definition

A set C ⊂ H is convex if, given x, y ∈ C, we have [x, y] ⊂ C.

Definition

A function f : H → (−∞, ∞] is convex if f ◦ γ is a convex function for each geodesic γ : [0, 1] → H.

SLIDE 15

Examples of convex functions

1. The indicator function of a closed convex set C ⊂ H:

      ι_C(x) := 0 if x ∈ C, and ι_C(x) := ∞ if x ∉ C.

2. The distance function to a closed convex subset C ⊂ H:

      d_C(x) := inf_{c ∈ C} d(x, c),   x ∈ H.

3. The displacement function of an isometry T : H → H:

      δ_T(x) := d(x, Tx),   x ∈ H.
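As a small added illustration of example 2 (with C taken to be the closed unit disc in R², a hypothetical choice for this sketch), d_C(x) = max(0, ‖x‖ − 1), and its convexity along segments can be spot-checked numerically:

```python
import math
import random

def d_C(p):
    """Distance from p to the closed unit disc C in R^2: max(0, |p| - 1)."""
    return max(0.0, math.hypot(p[0], p[1]) - 1.0)

random.seed(1)
for _ in range(1000):
    x = (random.uniform(-4, 4), random.uniform(-4, 4))
    y = (random.uniform(-4, 4), random.uniform(-4, 4))
    t = random.random()
    xt = ((1 - t) * x[0] + t * y[0], (1 - t) * x[1] + t * y[1])
    # Convexity along the geodesic (here: straight segment) from x to y.
    assert d_C(xt) <= (1 - t) * d_C(x) + t * d_C(y) + 1e-12
```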

SLIDE 18

Examples of convex functions

4. Let c : [0, ∞) → H be a geodesic ray. The function b_c : H → R defined by

      b_c(x) := lim_{t→∞} [d(x, c(t)) − t],   x ∈ H,

   is called the Busemann function associated to the ray c.

5. The energy of a mapping u : M → H, given by

      E(u) := ∫_{M×M} d(u(x), u(y))² p(x, dy) dµ(x),

   where (M, µ) is a measure space with a Markov kernel p(x, dy). E is convex and continuous on L²(M, H).

SLIDE 20

Examples of convex functions

6. Given a1, . . . , aN ∈ H and w1, . . . , wN > 0, set

      f(x) := Σ_{n=1}^{N} w_n d(x, a_n)^p,   x ∈ H, where p ∈ [1, ∞).

   • If p = 1, we get the Fermat-Weber problem for optimal facility location. A minimizer of f is called a median.

   • If p = 2, then a minimizer of f is the barycenter of µ := Σ_{n=1}^{N} w_n δ_{a_n}, or the mean of a1, . . . , aN.
SLIDE 22

Strongly convex functions

A function f : H → (−∞, ∞] is strongly convex with parameter β > 0 if

   f((1 − t)x + ty) ≤ (1 − t)f(x) + tf(y) − βt(1 − t)d(x, y)²,

for any x, y ∈ H and t ∈ [0, 1]. Each strongly convex function has a unique minimizer.

Example
Given y ∈ H, the function f := d(y, ·)² is strongly convex. Indeed,

   d(y, x_t)² ≤ (1 − t)d(y, x0)² + td(y, x1)² − t(1 − t)d(x0, x1)²

for each geodesic x : [0, 1] → H.

SLIDE 23

1. Basic facts on Hadamard spaces
2. Proximal point algorithm
3. Applications to computational phylogenetics

SLIDE 24

Proximal point algorithm

Let f : H → (−∞, ∞] be convex lsc. Optimization problem:

   min_{x ∈ H} f(x).

Recall: no (sub)differential, no shooting (singularities). Implicit methods are appropriate. The PPA generates a sequence

   x_i := J_{λ_i}(x_{i−1}) := arg min_{y ∈ H} [ f(y) + (1/(2λ_i)) d(y, x_{i−1})² ],

where x0 ∈ H is a given starting point and λ_i > 0, for each i ∈ N.
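The PPA step can be sketched in the simplest Euclidean Hadamard space, R. The following illustration (added here, not code from the talk) uses f(x) = |x|, whose resolvent is the soft-thresholding map:

```python
def resolvent_abs(x, lam):
    """Resolvent J_lam of f(y) = |y| in R: arg min_y |y| + (1/(2*lam)) (y - x)^2.
    This is the soft-thresholding map."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def ppa(x0, steps, lam=1.0):
    """Proximal point algorithm with constant step size lam."""
    x = x0
    for _ in range(steps):
        x = resolvent_abs(x, lam)
    return x

print(ppa(5.0, 10))  # → 0.0, the minimizer of |x|
```

Note that the iteration never needs a derivative of f, only the minimization defining the resolvent, which is exactly what makes the method usable in Hadamard spaces.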


SLIDE 26

Convergence of proximal point algorithm

Theorem (M.B., 2011)
Let f : H → (−∞, ∞] be a convex lsc function attaining its minimum. Given x0 ∈ H and (λ_i) such that Σ_{i=1}^{∞} λ_i = ∞, the PPA sequence (x_i) converges to a minimizer of f.

(Resolvents are firmly nonexpansive - cheap version for λ_i = λ.)

Disadvantage: the resolvents

   x_i := J_{λ_i}(x_{i−1}) := arg min_{y ∈ H} [ f(y) + (1/(2λ_i)) d(y, x_{i−1})² ]

are often difficult to compute.


SLIDE 28

Splitting proximal point algorithm

Let f1, . . . , fN be convex lsc and consider

   f(x) := Σ_{n=1}^{N} f_n(x),   x ∈ H.

Example (Median and mean)
f_n := d(·, a_n),   f_n := d(·, a_n)².

Key idea: apply the resolvents J^n_λ of the f_n in a cyclic or random order.
SLIDE 31

Splitting proximal point algorithm

Let x0 ∈ H be a starting point. For each k ∈ N0 we apply the resolvents in cyclic order:

   x_{kN+1} := J¹_{λ_k}(x_{kN}),
   x_{kN+2} := J²_{λ_k}(x_{kN+1}),
   . . .
   x_{kN+N} := J^N_{λ_k}(x_{kN+N−1}),

or in random order:

   x_{i+1} := J^{r_i}_{λ_i}(x_i),

where (r_i) are random variables with values in {1, . . . , N}.
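For illustration (again in the Euclidean Hadamard space R, an addition rather than code from the talk), the cyclic variant with f_n = |· − a_n| moves toward each a_n in turn by at most λ_k, and the iterates approach a median of the points:

```python
def resolvent_dist(x, a, lam):
    """Resolvent of f(y) = |y - a|: move from x toward a by at most lam."""
    if x > a + lam:
        return x - lam
    if x < a - lam:
        return x + lam
    return a

def cyclic_sppa_median(points, x0, cycles):
    """Cyclic splitting PPA for f(x) = sum_n |x - a_n| with lam_k = 1/(k+1),
    so that the step sizes sum to infinity."""
    x = x0
    for k in range(cycles):
        lam = 1.0 / (k + 1)
        for a in points:
            x = resolvent_dist(x, a, lam)
    return x

# The median of [0, 1, 10] is 1; the iterates approach it.
print(cyclic_sppa_median([0.0, 1.0, 10.0], 7.0, 2000))  # → close to 1.0
```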

SLIDE 32

Convergence of splitting proximal point algorithm

Theorem (Cyclic order version + Random order version)
Assume that the f_n are Lipschitz (or locally Lipschitz and the minimizing sequence is bounded). Then

1. the cyclic PPA sequence converges to a minimizer of f,
2. the random PPA sequence converges to a minimizer of f almost surely.

The assumptions are satisfied for

   f(x) := Σ_{n=1}^{N} w_n d(x, a_n)^p,   x ∈ H, where p ∈ [1, ∞).


SLIDE 34

Splitting proximal point algorithm (for mean)

Hence instead of computing (the usual PPA)

   x_{i+1} := arg min_{z ∈ H} [ Σ_{n=1}^{N} d(z, a_n)² + (1/(2λ_i)) d(z, x_i)² ],

we only minimize the single-term function

   x_{i+1} := arg min_{z ∈ H} [ d(z, a_n)² + (1/(2λ_i)) d(z, x_i)² ],

where the a_n are chosen in a cyclic or random order. This is a one-dimensional problem! ⟹ x_{i+1} is a convex combination of a_n and x_i.
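In the Euclidean case this one-dimensional subproblem has the explicit solution z = (x_i + 2λ a_n)/(1 + 2λ), i.e. a convex combination with weight t = 2λ/(1 + 2λ) on a_n. A quick added check of the closed form against a brute-force scan:

```python
def prox_step(x, a, lam):
    """Minimizer of (z - a)^2 + (1/(2*lam)) (z - x)^2 in R:
    a convex combination of x and a with weight t = 2*lam / (1 + 2*lam) on a."""
    t = 2 * lam / (1 + 2 * lam)
    return (1 - t) * x + t * a

def objective(z, x, a, lam):
    return (z - a)**2 + (z - x)**2 / (2 * lam)

# Sanity check against a brute-force scan of the objective.
x, a, lam = 3.0, -1.0, 0.5
z_star = prox_step(x, a, lam)
grid = [-2 + 0.001 * i for i in range(5001)]
z_scan = min(grid, key=lambda z: objective(z, x, a, lam))
assert abs(z_star - z_scan) < 0.001
print(z_star)  # with lam = 0.5, t = 1/2, so z = (3 - 1)/2 = 1.0
```

In BHV tree space the same convex combination is taken along the geodesic between x_i and a_n, which is what makes each iteration cheap.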


SLIDE 36

1. Basic facts on Hadamard spaces
2. Proximal point algorithm
3. Applications to computational phylogenetics

SLIDE 37

Left: one of many evolutionary trees. Right: a picture of an evolutionary tree by Charles Darwin (1837).

SLIDE 38

Billera-Holmes-Vogtmann tree space Td

Figure: 5 out of 15 orthants of T4

SLIDE 39

Billera-Holmes-Vogtmann tree space Td

Figure: A finite set of trees in T4

SLIDE 40

Billera-Holmes-Vogtmann tree space Td

Figure: Consider the most frequent tree topology only

SLIDE 41

Computing the mean: Random order version

Algorithm (SPPA with f_n := d(·, T_n)²)

Input: T1, . . . , TN ∈ Td
Step 1: S1 := T1 and i := 1
Step 2: choose r ∈ {1, . . . , N} at random
Step 3: S_{i+1} := (1/(i+1)) T_r + (i/(i+1)) S_i
Step 4: i := i + 1
Step 5: go to Step 2

Geodesics can be computed in polynomial time: the Owen-Provan algorithm (2011).
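In a Euclidean space the convex combination in Step 3 is an ordinary weighted average, so the algorithm reduces to a running mean. A pure-Python sketch (an added illustration, with random points in R² standing in for trees):

```python
import random

def sppa_mean(points, iterations, seed=0):
    """SPPA for the mean in R^2: S_{i+1} = (1/(i+1)) T_r + (i/(i+1)) S_i.
    In BHV tree space the convex combination is taken along geodesics
    (Owen-Provan); here, in R^2, it is a plain weighted average."""
    rng = random.Random(seed)
    s = points[0]                      # Step 1: S1 := T1
    for i in range(1, iterations + 1):
        t = rng.choice(points)         # Step 2: pick T_r at random
        w = 1.0 / (i + 1)              # Step 3: convex combination
        s = (w * t[0] + (1 - w) * s[0], w * t[1] + (1 - w) * s[1])
    return s

pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
est = sppa_mean(pts, 100_000)
# The iterates approach the mean (1, 1) as the iteration count grows.
print(est)
```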


SLIDE 44

Computing the mean: Random order version

Figure: input trees T1, . . . , T7 and the successive iterates S1, S2, S3, S4 approaching their mean

SLIDE 54

Le Big Data

Space Td: orthant dimension = d − 2, # of orthants = (2d − 3)!!
The actual dimension of Td is d + 1 + (d − 2)(2d − 3)!!
# of trees = N (e.g. coming from an MCMC simulation)
Our computation: d = 12, hence dim ≈ 10^11, and N = 10^5
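These counts are easy to reproduce; a small added check of the double-factorial formulas (the 15 orthants of T4 shown earlier, and the ≈ 10^11 dimension for d = 12):

```python
def double_factorial(n):
    """n!! = n * (n - 2) * (n - 4) * ... (equals 1 for n <= 0)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def bhv_orthants(d):
    """Number of orthants of the BHV tree space T_d: (2d - 3)!!."""
    return double_factorial(2 * d - 3)

def bhv_dimension(d):
    """The dimension d + 1 + (d - 2)(2d - 3)!! quoted on the slide."""
    return d + 1 + (d - 2) * bhv_orthants(d)

print(bhv_orthants(4))    # → 15, matching the T4 picture
print(bhv_dimension(12))  # → 137493105763, i.e. about 10^11
```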


SLIDE 56

Results (courtesy of Philipp Benner)

Figure: four of the sampled phylogenetic trees on 13 mammal species (Mouse, Rat, Guinea pig, Squirrel, Kangaroo rat, Chimpanzee, Human, Gorilla, Marmoset, Baboon, Orangutan, Rabbit, Pika)

SLIDE 57

Results (courtesy of Philipp Benner) . . . continued.

Figure: Approximation of the mean of the 100,000 trees.

SLIDE 58

References

1. M. Bačák: Computing medians and means in Hadamard spaces, SIAM J. Optim. 24 (2014), no. 3, 1542–1566.

2. Benner, Bačák, Bourguignon: Point estimates in phylogenetic reconstructions, Bioinformatics 30 (2014), no. 17.

3. M. Bačák: Convex Analysis and Optimization in Hadamard Spaces, De Gruyter, Berlin, 2014.