SLIDE 1

SDP Rank Reduction Yinyu Ye, EURO XXII 1

A Unified Theorem on SDP Rank Reduction

Yinyu Ye Department of Management Science and Engineering and Institute of Computational and Mathematical Engineering Stanford University Stanford, CA 94305, U.S.A.

http://www.stanford.edu/~yyye

Joint work with Anthony So and Jiawei Zhang

SLIDE 2

Outline

  • Problem Statement
  • Application
  • New SDP Rank Reduction Theorem and Algorithm
  • Sketch of Proof
  • Extension and Question

SLIDE 3

Problem Statement

  • Consider the system of semidefinite programming (SDP) constraints:

      A_i • X = b_i,  i = 1, …, m,  X ⪰ 0,

    where A_1, …, A_m are given n × n symmetric positive semidefinite matrices, b_1, …, b_m ≥ 0, and A • X = Σ_{i,j} a_{ij} x_{ij} = Tr(A^T X).

  • Clearly, the feasibility of the above system can be "decided" by SDP interior-point algorithms (Alizadeh 91, Nesterov and Nemirovskii 93, etc.).

  • More precisely, one can find an ε-approximate solution in time linear in log(1/ε).
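As a quick illustration of the notation above (our own numpy sketch, not from the slides): A • X is the Frobenius/trace inner product, and it is non-negative whenever both matrices are positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric PSD data matrix A and PSD variable X ("X >= 0" in the SDP sense).
B = rng.standard_normal((4, 4))
A = B @ B.T
C = rng.standard_normal((4, 4))
X = C @ C.T

# A . X = sum_{i,j} a_ij * x_ij = Tr(A^T X): two ways to compute the same number.
dot_elementwise = float(np.sum(A * X))
dot_trace = float(np.trace(A.T @ X))
```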

SLIDE 4

Problem Statement (Cont’d)

  • However, we are interested in finding a low-rank solution to the above system.
  • The low-rank problem arises in many applications, e.g.:

    – sensor network localization (e.g., Biswas and Y 03, So and Y 04)
    – metric embedding/dimension reduction (e.g., Johnson and Lindenstrauss 84, Matousek 90)
    – approximating non-convex (complex, quaternion) quadratic optimization (e.g., Nemirovskii, Roos and Terlaky 99; Luo, Sidiropoulos, Tseng and Zhang 06; Faybusovich 07)
    – graph rigidity/distance matrix (e.g., Alfakih, Khandani and Wolkowicz 99, etc.)

SLIDE 5

Graph Realization

Given a graph G = (V, E) and sets of non-negative weights, say {d_ij : (i, j) ∈ E} and {θ_ilj : (i, l, j) ∈ Θ}, the goal is to compute a realization of G in the Euclidean space R^d for a given low dimension d, i.e.,

  • to place the vertices of G in R^d such that
  • the Euclidean distance between every pair of adjacent vertices (i, j) equals (or is bounded by) the prescribed weight d_ij, and
  • the angle between edges (i, l) and (j, l) equals (or is bounded by) the prescribed weight θ_ilj ∈ Θ.

SLIDE 6


Figure 1: 50-node 2-D Sensor Localization

SLIDE 7

Figure 2: A 3-D Tensegrity Graph Realization; provided by Anstreicher

SLIDE 8

Figure 3: Tensegrity Graph: A Needle Tower; provided by Anstreicher

SLIDE 9

Figure 4: Molecular Conformation: 1F39 (1534 atoms) with 85% of distances below 6 Å and 10% noise on upper and lower bounds

SLIDE 10

Math Programming: Rank-Constrained SDP

Given anchor points a_k ∈ R^d and weights d_ij for (i, j) ∈ N_x, d̂_kj for (k, j) ∈ N_a, and v_ilj for (i, l, j) ∈ Θ, find x_i ∈ R^d such that

    ‖x_i − x_j‖² (≤ / = / ≥) d_ij²,  ∀ (i, j) ∈ N_x, i < j,
    ‖a_k − x_j‖² (≤ / = / ≥) d̂_kj²,  ∀ (k, j) ∈ N_a,
    (x_i − x_l)^T (x_j − x_l) (≤ / = / ≥) v_ilj,  ∀ (i, l, j) ∈ Θ,

which leads to

    A_i • X = b_i,  i = 1, …, m,  X ⪰ 0,  rank(X) ≤ d;

and is relaxed to

    A_i • X = b_i,  i = 1, …, m,  X ⪰ 0.
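To see how a distance constraint becomes a linear constraint on X, write X = U^T U where the columns of U are the points; then ‖x_i − x_j‖² = A_ij • X with A_ij = (e_i − e_j)(e_i − e_j)^T. A small numpy sketch with hypothetical data (the helper name is ours):

```python
import numpy as np

rng = np.random.default_rng(1)

d, n = 2, 5
U = rng.standard_normal((d, n))   # columns are points x_1..x_n in R^d
X = U.T @ U                       # Gram matrix: PSD with rank <= d

def edge_matrix(i, j, n):
    """A_ij = (e_i - e_j)(e_i - e_j)^T, so A_ij . X = ||x_i - x_j||^2."""
    e = np.zeros(n)
    e[i], e[j] = 1.0, -1.0
    return np.outer(e, e)

i, j = 0, 3
A_ij = edge_matrix(i, j, n)
lhs = float(np.sum(A_ij * X))                    # A_ij . X
rhs = float(np.sum((U[:, i] - U[:, j]) ** 2))    # ||x_i - x_j||^2
```

Dropping the rank constraint on X is exactly the SDP relaxation above.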

SLIDE 11

Some Background

  • Barvinok 95 showed that if the system is feasible, then there exists a solution X whose rank is at most √(2m) (see also Carathéodory's theorem). Moreover, Pataki 98 showed how to construct such an X efficiently.
  • Unfortunately, for the applications mentioned above, this is not enough.

    – We want a fixed-low-rank (say d) solution!

  • However, there are some issues:

    – Such a solution may not exist!
    – Even if it does, one may not be able to find it efficiently.

  • So we consider an approximation of the problem.

SLIDE 12

Approximating the Problem

We consider the problem of finding an X̂ ⪰ 0 of rank at most d that satisfies the system approximately:

    β(m, n, d) · b_i ≤ A_i • X̂ ≤ α(m, n, d) · b_i,  ∀ i = 1, …, m.

Here the distortion factors satisfy α ≥ 1 and β ∈ (0, 1]. Clearly, the closer both are to 1, the better.

SLIDE 13–15

Our Result

Theorem 1. Suppose that the original system is feasible. Let r = max_i {rank(A_i)}. Then, for any d ≥ 1, there exists an X̂ ⪰ 0 of rank at most d such that:

    α(m, n, d) = 1 + 12 log(4mr) / d          for 1 ≤ d ≤ 12 log(4mr),
    α(m, n, d) = 1 + √( 12 log(4mr) / d )     for d > 12 log(4mr);

    β(m, n, d) = (1/(5e)) · m^{−2/d}                    for 1 ≤ d ≤ 2 log m / log log(2m),
    β(m, n, d) = (1/(4e)) · (log(2m))^{−f(m)/d}         for 2 log m / log log(2m) < d ≤ 4 log(4mr),
    β(m, n, d) = 1 − √( 4 log(4mr) / d )                for d > 4 log(4mr),

where f(m) = 3 log m / log log(2m). Moreover, such an X̂ can be found in randomized polynomial time.
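For concreteness, the distortion factors of Theorem 1 can be coded up directly. A sketch of ours, assuming natural logarithms (the slides do not specify the base) and with function names of our choosing:

```python
import math

def alpha(m, r, d):
    """Upper distortion factor of Theorem 1 (natural log assumed)."""
    t = 12.0 * math.log(4 * m * r)
    return 1.0 + t / d if d <= t else 1.0 + math.sqrt(t / d)

def beta(m, r, d):
    """Lower distortion factor of Theorem 1, with its three regions."""
    e = math.e
    small = 2.0 * math.log(m) / math.log(math.log(2 * m))
    mid = 4.0 * math.log(4 * m * r)
    if d <= small:
        return (1.0 / (5.0 * e)) * m ** (-2.0 / d)
    if d <= mid:
        f = 3.0 * math.log(m) / math.log(math.log(2 * m))
        return (1.0 / (4.0 * e)) * math.log(2 * m) ** (-f / d)
    return 1.0 - math.sqrt(mid / d)
```

As expected, both factors approach 1 as the target rank d grows, and they degrade as d shrinks.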

SLIDE 16

Some Remarks

In general, the data parameter r can be bounded by √(2m), so that

    α(m, n, d) = 1 + O(log m / d)

and

    β(m, n, d) = Ω( m^{−2/d} )                              for d = O(log m / log log m),
    β(m, n, d) = Ω( (log m)^{−3 log m / (d log log m)} )    otherwise.

SLIDE 17

Some Remarks (Cont’d)

  • In the region 1 ≤ d ≤ 2 log m / log log(2m), the lower bound β is independent of the ranks of A_1, …, A_m.
  • f(m)/d ≤ 3/2 in the region d > 2 log m / log log(2m).
  • 1 − √( 4 log(4mr) / d ) is a constant in the region d > 4 log(4mr).
  • Our result contains as special cases several well-known results in the literature.

SLIDE 18

Early Result: Metric Embedding

  • Given an n-point set V = {v_1, …, v_n} in R^l, we would like to embed it into a low-dimensional Euclidean space as faithfully as possible.
  • Specifically, a map f : V → R^d is an α-embedding (where α ≥ 1) if

      ‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ α · ‖u − v‖²  for all u, v ∈ V.

    The goal is to find an f such that α is as small as possible.
  • It is known that:

    – for any ε > 0, a (1 + ε)-embedding into R^{O(ε^{−2} log n)} exists (Johnson–Lindenstrauss);
    – for any fixed d ≥ 1, an O(n^{2/d} d^{−1/2} log^{1/2} n)-embedding into R^d exists (Matousek).
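The Johnson–Lindenstrauss construction is simply a random Gaussian projection. The following toy experiment (ours; the sizes are arbitrary) projects points down and measures the empirical distortion of all pairwise squared distances:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sizes: n points in R^l, projected down to R^d.
n, l, d = 50, 100, 40
V = rng.standard_normal((l, n))

# Gaussian projection with entries ~ N(0, 1/d), so squared distances are
# preserved in expectation: E||Pu - Pv||^2 = ||u - v||^2.
P = rng.standard_normal((d, l)) / np.sqrt(d)
W = P @ V

# Ratio of projected to original squared distance, over all pairs.
ratios = [
    float(np.sum((W[:, i] - W[:, j]) ** 2) / np.sum((V[:, i] - V[:, j]) ** 2))
    for i in range(n) for j in range(i + 1, n)
]
worst_low, worst_high = min(ratios), max(ratios)
```

With d of order log n the ratios concentrate near 1, which is the (1 + ε) regime; shrinking d spreads them out, mirroring the fixed-d trade-off above.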

SLIDE 19

Early Result: Metric Embedding (Cont’d)

We can get these results from our Theorem. We focus on the fixed-d case.

  • Let {e_i}_{i=1}^n be the standard basis vectors of R^n, and set E_ij = (e_i − e_j)(e_i − e_j)^T.
  • Let U be the l × n matrix whose i-th column is v_i. Then X = U^T U satisfies the system E_ij • X = ‖v_i − v_j‖_2² for 1 ≤ i < j ≤ n.
  • By our Theorem, we can find an X̂ ⪰ 0 of rank at most d such that:

      Ω(n^{−4/d}) · ‖v_i − v_j‖_2² ≤ E_ij • X̂ ≤ O(log n / d) · ‖v_i − v_j‖_2².

  • Upon decomposing X̂ = Û^T Û, where Û is d × n, we recover points û_1, …, û_n ∈ R^d such that:

      Ω(n^{−2/d}) · ‖v_i − v_j‖_2 ≤ ‖û_i − û_j‖_2 ≤ O(√(log n / d)) · ‖v_i − v_j‖_2.

The embedding results imply only a weaker version (r = 1) of our theorem.

SLIDE 20

Early Result: Approximating QPs

  • Let A_1, …, A_m be positive semidefinite. Consider the following QP:

      v* = maximize x^T A x
           subject to x^T A_i x ≤ 1,  i = 1, …, m,

    and its natural SDP relaxation:

      v*_sdp = maximize A • X
               subject to A_i • X ≤ 1,  i = 1, …, m;  X ⪰ 0.

  • Let X* be an optimal solution to the SDP.
  • Nemirovskii et al. showed that one can randomly extract a rank-1 matrix X̂ from X* such that it is feasible for the SDP and

      E[A • X̂] ≥ Ω(log^{−1} m) · v*.

SLIDE 21

Early Result: Approximating QPs (Cont’d)

We can obtain a similar result from our Theorem.

  • The matrix X* satisfies the system:

      A_i • X* = b_i ≤ 1,  i = 1, …, m.

  • Our proof of the Theorem shows that one can find a rank-1 matrix X̂ ⪰ 0 such that:

      E[A • X̂] = v*_sdp,  and  A_i • X̂ ≤ O(log m) · b_i,  i = 1, …, m.

  • By scaling X̂ down by a factor of O(log m), we obtain a feasible rank-1 matrix X̂′ that satisfies E[A • X̂′] ≥ Ω(log^{−1} m) · v*.

SLIDE 22

Early Result: Approximating QPs (Cont’d)

  • Luo et al. considered the following real (complex) QP:

      minimize x^T A x
      subject to x^T A_i x ≥ 1,  i = 1, …, m,

    and its natural SDP relaxation:

      minimize A • X
      subject to A_i • X ≥ 1,  i = 1, …, m;  X ⪰ 0.

  • They showed how to extract a solution x̂ from an optimal solution matrix of the SDP so that x̂ is feasible for the QP and is within a factor O(m^{−2}) (respectively, O(m^{−1})) of the optimum.
  • Again, we can obtain the same results from our Theorem in both the real (d = 1) and complex (d = 2) cases.

SLIDE 23

How Sharp are the Bounds?

  • For metric embedding, it is known that:

    – for any d ≥ 1, there exists an n-point set V ⊂ R^{d+1} such that any embedding of V into R^d requires distortion Ω(n^{1/⌊(d+1)/2⌋}) (Matousek);
    – there exists an n-point set V ⊂ R^l for some l such that, for any ε ∈ (n^{−1/2}, 1/2), say, a (1 + ε)-embedding of V into R^d requires d = Ω((ε² log(1/ε))^{−1} log n) (Alon 03).

Thus, from the metric embedding perspective, the ratio of our upper and lower bounds is almost tight for d ≥ 3.

SLIDE 24

How Sharp are the Bounds? (Cont’d)

  • For the QP:

      v* = maximize x^T A x
           subject to x^T A_i x ≤ 1,  i = 1, …, m,

    and its natural SDP relaxation:

      v*_sdp = maximize A • X
               subject to A_i • X ≤ 1,  i = 1, …, m;  X ⪰ 0,

    Nemirovskii et al. showed that the ratio between v* and v*_sdp can be as large as Ω(log m).

  • For the minimization version, Luo et al. showed that the ratio can be as small as Ω(m^{−2}). Thus, from the QP perspective, the ratio of our upper and lower bounds is almost tight for d = 1.

SLIDE 25

Sketch of Proof of the Theorem

  • Without loss of generality, we may assume that X = I is feasible for the original system; that is, our system becomes

      A_i • X = Tr(A_i),  i = 1, …, m;  X ⪰ 0.

  • Thus, the Theorem becomes:

Theorem 2. Let A_1, …, A_m be n × n positive semidefinite matrices. Then, for any d ≥ 1, there exists an X̂ ⪰ 0 with rank at most d such that:

    β(m, n, d) · Tr(A_i) ≤ A_i • X̂ ≤ α(m, n, d) · Tr(A_i),  ∀ i = 1, …, m.

SLIDE 26

Sketch of Proof of the Theorem (Cont’d)

The algorithm for generating X̂ is simple:

  • generate i.i.d. Gaussian random variables ξ_i^j with mean 0 and variance 1/d, for 1 ≤ i ≤ n and 1 ≤ j ≤ d, and define the column vectors ξ^j = (ξ_1^j; …; ξ_n^j);
  • return X̂ = Σ_{j=1}^d ξ^j (ξ^j)^T.
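The two steps above are a few lines of numpy. A minimal sketch of ours (not the authors' code), with a Monte Carlo check that E[X̂] = I, and hence E[A_i • X̂] = Tr(A_i):

```python
import numpy as np

rng = np.random.default_rng(3)

n, d = 30, 3

def round_to_rank_d(n, d, rng):
    """X_hat = sum_{j=1}^d xi^j (xi^j)^T, with i.i.d. entries xi_i^j ~ N(0, 1/d)."""
    X_hat = np.zeros((n, n))
    for _ in range(d):
        xi = rng.normal(0.0, np.sqrt(1.0 / d), size=n)
        X_hat += np.outer(xi, xi)
    return X_hat

X_hat = round_to_rank_d(n, d, rng)   # PSD with rank at most d by construction

# Averaging many independent roundings recovers the identity matrix.
avg = np.mean([round_to_rank_d(n, d, rng) for _ in range(2000)], axis=0)
```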

SLIDE 27

Sketch of Proof of the Theorem (Cont’d)

The analysis makes use of the following Markov-type inequality:

Lemma 1. Let ξ_1, …, ξ_n be i.i.d. standard Gaussian random variables. Let α ∈ (1, ∞) and β ∈ (0, 1) be constants, and let U_n = Σ_{i=1}^n ξ_i² (a chi-square random variable with n degrees of freedom). Then the following hold:

    Pr(U_n ≥ αn) ≤ exp( (n/2)(1 − α + log α) ),
    Pr(U_n ≤ βn) ≤ exp( (n/2)(1 − β + log β) ).
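Lemma 1's tail bounds can be sanity-checked by simulation (our sketch; the seed, sample sizes, and α, β values are arbitrary choices):

```python
import math
import numpy as np

rng = np.random.default_rng(4)

n, trials = 20, 200_000
alpha_, beta_ = 2.0, 0.5

# U ~ chi-square with n degrees of freedom, sampled `trials` times.
U = np.sum(rng.standard_normal((trials, n)) ** 2, axis=1)

emp_upper = float(np.mean(U >= alpha_ * n))
emp_lower = float(np.mean(U <= beta_ * n))

bound_upper = math.exp(n / 2 * (1 - alpha_ + math.log(alpha_)))
bound_lower = math.exp(n / 2 * (1 - beta_ + math.log(beta_)))
```

Both empirical tail frequencies come out well below the corresponding analytic bounds.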

SLIDE 28–29

Sketch of Proof of the Theorem (Cont’d)

Lemma 2. Let H be an n × n positive semidefinite matrix and r = rank(H). Then, for any β ∈ (0, 1), we have:

    Pr( H • X̂ ≤ β Tr(H) ) ≤ r · exp( (d/2)(1 − β + log β) ).    (1)

On the other hand, if β satisfies eβ log r ≤ 1/5, then the above can be sharpened to:

    Pr( H • X̂ ≤ β Tr(H) ) ≤ (5eβ/2)^{d/2}.    (2)

Note that (2) is independent of r! For any α > 1, we also have:

    Pr( H • X̂ ≥ α Tr(H) ) ≤ r · exp( (d/2)(1 − α + log α) ).    (3)

SLIDE 30

Sketch of Proof of the Theorem (Cont’d)

  • It is easy to establish (1) and (3) using Lemma 1.
  • By applying the bounds (1) and (3) of Lemma 2 to each of A_1, …, A_m and taking the union bound, we get the upper bound in the Theorem. However, the lower bound obtained this way is weaker.
  • To obtain a better lower bound (for the region 1 ≤ d ≤ 2 log m / log log(2m)), we use bound (2) in Lemma 2.
  • To prove (2), consider the spectral decomposition H = Σ_{k=1}^r λ_k v_k v_k^T with λ_1 ≥ λ_2 ≥ … ≥ λ_r > 0.

SLIDE 31

Sketch of Proof of the Theorem (Cont’d)

  • Recall that (2) says: if β ∈ (0, 1) satisfies eβ log r ≤ 1/5, then

      Pr( H • X̂ ≤ β Tr(H) ) ≤ (5eβ/2)^{d/2}.

  • First, by the spectral decomposition, we have

      H • X̂ = ( Σ_{k=1}^r λ_k v_k v_k^T ) • ( Σ_{j=1}^d ξ^j (ξ^j)^T ) = Σ_{k=1}^r Σ_{j=1}^d λ_k (v_k^T ξ^j)².

  • Now, observe that the (v_k^T ξ^j)_{k,j} are N(0, 1/d) and mutually independent.
  • Hence, H • X̂ has the same distribution as the weighted chi-square Σ_{k=1}^r λ_k Σ_{j=1}^d ξ̃_{kj}², where the ξ̃_{kj} are i.i.d. N(0, 1/d).
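The distributional identity above is easy to check by simulation (our sketch, with arbitrary sizes): sampling H • X̂ directly, and sampling the weighted chi-square built from the eigenvalues of H, give the same mean Tr(H).

```python
import numpy as np

rng = np.random.default_rng(5)

n, d, trials = 8, 3, 4000

# A random PSD H and its spectral decomposition.
B = rng.standard_normal((n, n))
H = B @ B.T
lam, V = np.linalg.eigh(H)

def sample_H_dot_Xhat():
    """Draw X_hat = sum_j xi^j (xi^j)^T and return H . X_hat."""
    X_hat = np.zeros((n, n))
    for _ in range(d):
        xi = rng.normal(0.0, np.sqrt(1.0 / d), size=n)
        X_hat += np.outer(xi, xi)
    return float(np.sum(H * X_hat))

def sample_weighted_chi_square():
    """Draw sum_k lam_k * sum_j xi_tilde_kj^2 with xi_tilde ~ N(0, 1/d)."""
    xt = rng.normal(0.0, np.sqrt(1.0 / d), size=(n, d))
    return float(np.sum(lam * np.sum(xt ** 2, axis=1)))

a = float(np.mean([sample_H_dot_Xhat() for _ in range(trials)]))
b = float(np.mean([sample_weighted_chi_square() for _ in range(trials)]))
```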

SLIDE 32

Sketch of Proof of the Theorem (Cont’d)

  • Let λ̄_k = λ_k / Σ_{k=1}^r λ_k. It then follows that:

      Pr( H • X̂ ≤ β Tr(H) ) = Pr( Σ_{k=1}^r λ_k Σ_{j=1}^d ξ̃_{kj}² ≤ β Σ_{k=1}^r λ_k )
                            = Pr( Σ_{k=1}^r λ̄_k Σ_{j=1}^d ξ̃_{kj}² ≤ β ) ≡ p(r, λ̄, β).

  • We now bound p(r, λ̄, β). On the one hand, by replacing every λ̄_k with the smallest one, λ̄_r, and using the tail estimates of Lemma 1, we have:

      p(r, λ̄, β) ≤ Pr( λ̄_r Σ_{k=1}^r Σ_{j=1}^d ξ̃_{kj}² ≤ β ) ≤ ( eβ / (r λ̄_r) )^{rd/2}.

SLIDE 33

  • On the other hand, by dropping the smallest term λ̄_r from the summation:

      p(r, λ̄, β) ≤ Pr( Σ_{k=1}^{r−1} Σ_{j=1}^d λ̄_k ξ̃_{kj}² ≤ β )
                 = Pr( Σ_{k=1}^{r−1} Σ_{j=1}^d (λ̄_k / (1 − λ̄_r)) ξ̃_{kj}² ≤ β / (1 − λ̄_r) )
                 ≡ p( r − 1, λ̄_{1:r−1} / (1 − λ̄_r), β / (1 − λ̄_r) ).

SLIDE 34

Sketch of Proof of the Theorem (Cont’d)

  • By unrolling the recursive formula, we have:

p

  • r, ¯

λ, β

  • ≤ min

1≤k≤r

eβ k¯ λk kd/2

  • Let γ = p
  • r, ¯

λ, β 2/d. Note that γ ∈ (0, 1). From the above, we have ¯ λk ≤

  • kγ1/k−1 eβ for k = 1, . . . , r.
  • Upon summing over k and using the fact that r

k=1 ¯

λk = 1, we obtain: eβ

r

  • k=1

1 kγ1/k ≥ 1

SLIDE 35

Sketch of Proof of the Theorem (Cont’d)

  • Now,

      Σ_{k=1}^r 1/(k γ^{1/k}) ≤ 1/γ + ∫_1^r dt/(t γ^{1/t}) = 1/γ + ∫_{log(1/γ)/r}^{log(1/γ)} (e^t / t) dt.

  • Then, one can show that the above implies:

      1/(eβ) ≤ 2/γ + log r.

  • Together with the assumption that eβ log r ≤ 1/5, we conclude that:

      5eβ/2 ≥ γ = Pr( H • X̂ ≤ β Tr(H) )^{2/d},

    as desired.

SLIDE 36

SDP with an Objective Function

Our result can also be used when the SDP has an objective function:

    min C • X  subject to A_i • X = b_i,  i = 1, …, m;  X ⪰ 0.

When X is optimal, there must be a dual-feasible pair (S̄, ȳ) such that S̄ • X = 0 (under a mild condition).

One can treat S̄ • X = 0 as an additional equality constraint. The rounding method then preserves S̄ • X̂ = 0; that is, the low-rank X̂ is optimal for a "nearby" problem with the same objective as the original SDP.

SLIDE 37

Question

  • Is there a deterministic algorithm? Choose the largest d eigenvalue components of X?
  • In practical applications we observe much smaller distortion. Why?
  • Add a regularization objective to find a low-rank SDP solution?