SLIDE 1

Spectral Frank-Wolfe Algorithm: Strict Complementarity and Linear Convergence

Lijun Ding

Joint work with Yingjie Fei, Qiantong Xu, and Chengrun Yang

June 15, 2020

Lijun Ding (Cornell University) SpecFW June 15, 2020 1 / 17

SLIDE 2

Overview

1. Introduction: problem setup; past algorithms
2. SpecFW and strict complementarity: Spectral Frank-Wolfe (SpecFW); strict complementarity
3. Numerics: experimental setup; numerical results

SLIDE 3

Convex smooth minimization over a spectrahedron

Main optimization problem:

    minimize_{X ∈ S^n ⊂ R^{n×n}}   f(X) := g(AX) + tr(CX)
    subject to                     tr(X) = 1 and X ∈ S^n_+,        (M)

where
• the function g is strongly convex and smooth;
• A is a linear map and C ∈ S^n;
• tr(·) is the trace, i.e., the sum of the diagonal entries;
• S^n_+ is the set of positive semidefinite matrices, i.e., symmetric matrices with non-negative eigenvalues;
• the optimal solution X⋆ is unique.
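
To make the template concrete, here is a minimal numpy sketch of the objective in (M), assuming the common least-squares instance g(z) = ½‖z − b‖² and a linear map A acting through inner products with symmetric measurement matrices A_i; the names `As`, `b`, and `C` are illustrative stand-ins, not objects defined on the slides.

```python
import numpy as np

def make_objective(As, b, C):
    """f(X) = g(AX) + tr(CX) with the illustrative choice g(z) = 0.5*||z - b||^2.

    As: list of symmetric n x n matrices defining (AX)_i = <A_i, X>
    b:  length-m data vector
    C:  symmetric n x n cost matrix
    """
    def A(X):  # linear map A : S^n -> R^m
        return np.array([np.tensordot(Ai, X) for Ai in As])

    def f(X):
        return 0.5 * np.sum((A(X) - b) ** 2) + np.trace(C @ X)

    def grad_f(X):  # grad f(X) = A^*(AX - b) + C, with adjoint A^*(r) = sum_i r_i A_i
        r = A(X) - b
        return sum(ri * Ai for ri, Ai in zip(r, As)) + C

    return f, grad_f
```

Here g is a translated squared norm, so it is strongly convex and smooth, matching the slide's assumption on g (which is placed on g, not on f itself).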

SLIDE 4

Applications

Problem (M), i.e., minimize f(X) := g(AX) + tr(CX) subject to tr(X) = 1 and X ∈ S^n_+, covers:

• matrix sensing [RFP10]
• matrix completion [CR09, JS10]
• phase retrieval [CESV15, YUTC17]
• one-bit matrix completion [DPVDBW14]
• blind deconvolution [ARR13]

Expect rank r⋆ = rank(X⋆) ≪ n!

SLIDE 5

Projected Gradient (PG)

    minimize_{X ∈ S^n}  f(X)  subject to  tr(X) = 1, X ∈ S^n_+;  write SP^n for this feasible set.   (M)

• Orthogonal projection: P_{SP^n}(X) = arg min_{V ∈ SP^n} ‖X − V‖_F.
• PG: choose X0 ∈ SP^n and η > 0, iterate

      X_{t+1} = P_{SP^n}(X_t − η∇f(X_t)).   (PG)

• Iteration complexity O(1/ε).
• Accelerated PG: O(1/√ε).

Bottleneck: O(n³) per iteration due to a FULL EVD in P_{SP^n}!
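
A minimal numpy sketch of (PG), not taken from the slides: the projection onto SP^n diagonalizes the matrix and projects its eigenvalues onto the probability simplex (the standard sort-based rule), and the full `eigh` call is exactly the O(n³) bottleneck the slide points to. `grad_f` is assumed to come from a sketch like the one above.

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}, sort-based rule."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, v.size + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def proj_spectrahedron(X):
    """P_{SP^n}(X): full EVD, then project the eigenvalues onto the simplex."""
    w, Q = np.linalg.eigh((X + X.T) / 2)      # symmetrize for numerical safety
    return (Q * proj_simplex(w)) @ Q.T        # Q diag(w_projected) Q^T

def projected_gradient(grad_f, X0, eta, iters=500):
    X = X0
    for _ in range(iters):
        X = proj_spectrahedron(X - eta * grad_f(X))
    return X
```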

SLIDE 6

Projection free method: Frank-Wolfe (FW)

    minimize_{X ∈ S^n}  f(X)  subject to  X ∈ SP^n.   (M)

FW: choose X0 ∈ SP^n, iterate
• (LOO) Linear Optimization Oracle: V_t = arg min_{V ∈ SP^n} tr(V∇f(X_t)).
• (LS) Line Search: X_{t+1} solves min_{X = ηX_t + (1−η)V_t, η ∈ [0,1]} f(X).

Low per-iteration complexity: the LOO only needs to compute one eigenvector of ∇f(X_t)!
Bottleneck: slow convergence, O(1/ε) iteration complexity in both theory and practice!
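
A hedged sketch of the FW loop above, reusing `f`/`grad_f` from the earlier sketch and assuming dense matrices: the LOO minimizer over SP^n is the rank-one matrix vv⊤ built from an eigenvector for the smallest eigenvalue of the gradient, and the line search is a bounded one-dimensional minimization.

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from scipy.optimize import minimize_scalar

def frank_wolfe(f, grad_f, X0, iters=500):
    X = X0
    for _ in range(iters):
        G = grad_f(X)
        # LOO: min_{V in SP^n} tr(V G) is attained at v v^T, where v is an
        # eigenvector of G for its smallest algebraic eigenvalue ('SA').
        _, v = eigsh(G, k=1, which='SA')
        V = v @ v.T
        # LS: exact line search on the segment between V and X.
        res = minimize_scalar(lambda e: f(e * X + (1 - e) * V),
                              bounds=(0.0, 1.0), method='bounded')
        X = res.x * X + (1 - res.x) * V
    return X
```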

SLIDE 7

FW variants

Many variants:
• randomized regularized FW [Gar16]
• in-face direction FW [FGM17]
• BlockFW [AZHHL17]
• FW for r⋆ = rank(X⋆) = 1 [Gar19]

Shortcoming: no linear convergence, or sensitivity to the input rank estimate, or the restriction to r⋆ = 1.

SLIDE 9

Spectral Frank-Wolfe (SpecFW)

Spectral Frank-Wolfe: choose X0 ∈ SP^n and a rank estimate k > 0, iterate
• kLOO: compute the bottom k eigenvectors V = [v1, . . . , vk] ∈ R^{n×k} of ∇f(X_t).
• k Spectral Search (kSS): X_{t+1} = η⋆X_t + VS⋆V⊤, in which η⋆ ∈ R, S⋆ ∈ S^k solve

      min f(ηX_t + VSV⊤)  s.t.  S ∈ S^k_+,  η + tr(S) = 1,  η ≥ 0.

Both procedures are easy to solve for small k! Moreover...
• O(1/ε) convergence for general k.
• Linear convergence if k ≥ r⋆! (also needs strict complementarity)
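
For small k, kSS is a tiny convex problem in (η, S). Below is a sketch using cvxpy, assuming the least-squares instance g(z) = ½‖z − b‖² from the earlier sketches so that the objective is an explicit convex quadratic in (η, S); the paper's own subproblem solver may well differ.

```python
import numpy as np
import cvxpy as cp

def k_spectral_search(As, b, C, Xt, V):
    """One kSS step: minimize f(eta*Xt + V S V^T) over eta >= 0, S PSD,
    eta + tr(S) = 1, for f(X) = 0.5*||A(X) - b||^2 + tr(CX)."""
    k = V.shape[1]
    S = cp.Variable((k, k), PSD=True)
    eta = cp.Variable(nonneg=True)
    # A(eta*Xt + V S V^T)_i = eta*<A_i, Xt> + <V^T A_i V, S>
    aXt = np.array([np.tensordot(Ai, Xt) for Ai in As])
    Ms = [V.T @ Ai @ V for Ai in As]
    z = eta * aXt + cp.hstack([cp.trace(M @ S) for M in Ms])
    obj = (0.5 * cp.sum_squares(z - b)
           + eta * np.trace(C @ Xt) + cp.trace(V.T @ C @ V @ S))
    cp.Problem(cp.Minimize(obj), [eta + cp.trace(S) == 1]).solve()
    return eta.value * Xt + V @ S.value @ V.T
```

The update stays feasible by construction: η⋆ ≥ 0, S⋆ ⪰ 0, and η⋆ + tr(S⋆) = 1 give tr(X_{t+1}) = 1 and X_{t+1} ⪰ 0.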

SLIDE 10

Comparison with FW

Two stronger subproblem oracles:

Table: Comparison with FW
• FW: LOO computes one eigenvector v; Line Search (LS) solves min f(ηX_t + (1−η)vv⊤) s.t. η ∈ [0, 1].
• SpecFW: kLOO computes k eigenvectors V; k Spectral Search (kSS) solves min f(ηX_t + VSV⊤) s.t. η ≥ 0, S ∈ S^k_+, tr(S) + η = 1.

In fact, when k = 1, SpecFW is FW (see the identity below)! Expect at least O(1/ε) convergence even if k ≤ r⋆. How about linear convergence when k ≥ r⋆? And what is strict complementarity?
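
The k = 1 reduction claimed above is a one-line check (not spelled out on the slides): for k = 1 the kSS feasible set collapses onto the FW line-search segment.

```latex
% For k = 1, S reduces to a scalar s with S \in \mathbb{S}^1_+ \iff s \ge 0,
% and the kSS constraint \eta + \operatorname{tr}(S) = 1 forces s = 1 - \eta, so
\[
X_{t+1} = \eta X_t + V S V^\top = \eta X_t + (1 - \eta)\, v v^\top ,
\qquad \eta \in [0, 1],
\]
% which is exactly the FW iterate produced by line search between X_t and v v^\top.
```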

SLIDE 11

Strict complementarity

• Eigenspace of ∇f(X⋆) for the smallest eigenvalue: EV(∇f(X⋆)) ⊂ R^n.
• KKT ⟹ range(X⋆) ⊂ EV(∇f(X⋆)) ⟹ dim(range(X⋆)) =: r⋆ ≤ dim(EV(∇f(X⋆))) =: k⋆.
• Note that the smallest eigenvalue has multiplicity at least r⋆: λ_{n−r⋆+1}(∇f(X⋆)) = · · · = λ_n(∇f(X⋆)), where λ_{n−i+1}(∇f(X⋆)) denotes the i-th smallest eigenvalue.
• Strict complementarity (st. comp.) is r⋆ = k⋆.
• More concretely, st. comp. is an eigengap condition on the r⋆-th and (r⋆ + 1)-th smallest eigenvalues: λ_{n−r⋆}(∇f(X⋆)) − λ_{n−r⋆+1}(∇f(X⋆)) > 0.
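
The eigengap form makes strict complementarity easy to test numerically. A small numpy check, assuming the gradient at the (approximate) solution is available as a dense symmetric matrix G:

```python
import numpy as np

def strict_complementarity_gap(G, r_star):
    """Return lambda_{n-r*}(G) - lambda_{n-r*+1}(G), i.e. the gap between the
    (r*+1)-th and r*-th smallest eigenvalues; st. comp. holds iff this is > 0."""
    w = np.linalg.eigvalsh((G + G.T) / 2)   # eigenvalues in ascending order
    return w[r_star] - w[r_star - 1]        # i-th smallest is w[i-1]
```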

SLIDE 12

Intuition of linear convergence

Under strict complementarity r⋆ = k⋆:

1. range(X⋆) = EV(∇f(X⋆)).
2. Compute V⋆ = [v1, . . . , vk⋆], the bottom eigenvectors of ∇f(X⋆).
3. X⋆ = V⋆S⋆V⋆⊤ for some S⋆ ∈ S^{r⋆}_+ with tr(S⋆) = 1.
4. Obtain S⋆ by solving

       minimize f(V⋆SV⋆⊤)  s.t.  S ∈ S^{r⋆}_+,  tr(S) = 1.   (reduced M)

5. Problem (M) is solved given ∇f(X⋆)!

SpecFW is simply an algorithmic procedure for steps 2 and 4!

SLIDE 14

Experimental setup: Quadratic sensing

Quadratic sensing [CCG15]: recover a rank-r♮ = 3 matrix U♮ ∈ R^{n×r♮} with ‖U♮‖²_F = 1 from quadratic measurements y ∈ R^m:

1. draw random standard Gaussian measurement vectors a_i;
2. y0(i) = ‖U♮⊤a_i‖²_F, i = 1, . . . , m, with m = 15nr♮;
3. y = y0 + c‖y0‖₂v, where c is the inverse signal-to-noise ratio and v is a random unit vector.

Optimization problem:

    minimize f(X) := (1/2) Σ_{i=1}^m (a_i⊤Xa_i − y_i)²
    subject to tr(X) = τ, X ⪰ 0.   (Quadratic Sensing)

Set τ = 1/2 and c = 0.5 in the numerics.
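
A numpy sketch of this data-generating recipe (the normalization and noise model follow the steps above; drawing U♮ as a normalized Gaussian factor is an assumed construction, since the slides do not specify how U♮ is generated):

```python
import numpy as np

def quadratic_sensing_data(n, r=3, c=0.5, seed=0):
    rng = np.random.default_rng(seed)
    m = 15 * n * r
    U = rng.standard_normal((n, r))
    U /= np.linalg.norm(U)                  # ||U||_F = 1
    A = rng.standard_normal((m, n))         # rows are the measurement vectors a_i
    y0 = np.sum((A @ U) ** 2, axis=1)       # y0(i) = ||U^T a_i||^2
    v = rng.standard_normal(m)
    v /= np.linalg.norm(v)                  # random unit vector
    y = y0 + c * np.linalg.norm(y0) * v     # noise at inverse SNR c
    return A, y, U
```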

SLIDE 15

Low rank solution and strict complementarity

Dimension n | Avg. gap | Avg. recovery error
100         | 288.06   | 0.0013
200         | 505.16   | 0.00064
400         | 961.09   | 0.00031
600         | 1358.62  | 0.00021

Table: Verification of low-rankness and strict complementarity. Rank r⋆ = 3 in all experiments. The recovery error is measured by ‖X⋆/τ − U♮U♮⊤‖_F / ‖U♮U♮⊤‖_F. The gap is measured by λ_{n−3}(∇f(X⋆)) − λ_n(∇f(X⋆)). All results are averaged over 20 i.i.d. trials.

SLIDE 16

Numerical results, k > r⋆

[Figure: k > r⋆. Comparison of the algorithms FW, G-blockFW [AZHHL17], and SpecFW. Left: accuracy vs time (seconds). Right: accuracy vs iteration counter.]

SLIDE 17

Numerical results, k < r⋆

[Figure: k < r⋆. Comparison of the algorithms FW, G-blockFW [AZHHL17], and SpecFW. Left: accuracy vs time (seconds). Right: accuracy vs iteration counter.]

References

• Ali Ahmed, Benjamin Recht, and Justin Romberg. Blind deconvolution using convex programming. IEEE Transactions on Information Theory, 60(3):1711–1732, 2013.
• Zeyuan Allen-Zhu, Elad Hazan, Wei Hu, and Yuanzhi Li. Linear convergence of a Frank-Wolfe type algorithm over trace-norm balls. In Advances in Neural Information Processing Systems, pages 6191–6200, 2017.
• Yuxin Chen, Yuejie Chi, and Andrea J. Goldsmith. Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Transactions on Information Theory, 61(7):4034–4059, 2015.
• Emmanuel J. Candès, Yonina C. Eldar, Thomas Strohmer, and Vladislav Voroninski. Phase retrieval via matrix completion. SIAM Review, 57(2):225–251, 2015.
• Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717, 2009.
• Mark A. Davenport, Yaniv Plan, Ewout van den Berg, and Mary Wootters. 1-bit matrix completion. Information and Inference: A Journal of the IMA, 3(3):189–223, 2014.
• Robert M. Freund, Paul Grigas, and Rahul Mazumder. An extended Frank-Wolfe method with "in-face" directions, and its application to low-rank matrix completion. SIAM Journal on Optimization, 27(1):319–346, 2017.
• Dan Garber. Faster projection-free convex optimization over the spectrahedron. In Advances in Neural Information Processing Systems, pages 874–882, 2016.
• Dan Garber. Linear convergence of Frank-Wolfe for rank-one matrix recovery without strong convexity. arXiv preprint arXiv:1912.01467, 2019.
• Martin Jaggi and Marek Sulovský. A simple algorithm for nuclear norm regularized problems. In International Conference on Machine Learning, 2010.
• Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
• Alp Yurtsever, Madeleine Udell, Joel Tropp, and Volkan Cevher. Sketchy decisions: Convex low-rank matrix optimization with optimal storage. In Artificial Intelligence and Statistics, pages 1188–1196, 2017.