SLIDE 1

Lund University, June 15, 2017

Low-Rank Inducing Norms with Optimality Interpretations

Christian Grussler, Pontus Giselsson, Anders Rantzer

Automatic Control, Lund University

SLIDE 2

Low-Rank Inducing Norms | Grussler, Giselsson, Rantzer | Problem & Motivation

Problem

minimize over X ∈ R^{m×n}:  k(‖X‖) + h(X)  subject to  rank(X) ≤ r

1. k : R≥0 → R is an increasing, convex, proper, closed function
2. ‖·‖ is a unitarily invariant norm
3. h : R^{m×n} → R is a closed, proper, convex function

Vector-valued problems:

minimize over x ∈ R^n:  k(‖diag(x)‖) + h(x)  subject to  rank(diag(x)) = card(x) ≤ r

SLIDE 3

Example: Bilinear Regression §

Given Y ∈ R^{m×n}, L ∈ R^{k×m}, R ∈ R^{n×k}, k ≤ min{m, n}:

minimize over X ∈ R^{k×k}:  ‖Y − LᵀXRᵀ‖²_{ℓ2}  subject to  rank(X) ≤ r

where, for X, Y ∈ R^{m×n}:
  • ⟨X, Y⟩ = trace(XᵀY)
  • ‖X‖_{ℓ2} = √⟨X, X⟩ = √(Σᵢ σᵢ²(X))

§ I.S. Dhillon '15

SLIDE 4

By assumption, rank(LᵀXRᵀ) = rank(X). With M := LᵀXRᵀ:

minimize over M:  ‖M‖²_{ℓ2} − 2⟨Y, M⟩ + I_{M = LᵀXRᵀ, X ∈ R^{k×k}}(M)  subject to  rank(M) ≤ r

Here k(‖M‖) = ‖M‖²_{ℓ2}, and h(M) = −2⟨Y, M⟩ + I_{M = LᵀXRᵀ, X ∈ R^{k×k}}(M).

Applications:
  • Machine Learning: Principal Component Analysis, Multivariate Linear Regression, Data Compression, ...
  • Control: Model Reduction, System Identification, ...
SLIDE 5

Explicit Solution:

argmin over rank(X) ≤ r of ‖Y − LᵀXRᵀ‖²_{ℓ2}  =  {L†YᵣR† : Yᵣ ∈ svdᵣ(Y)}

svdᵣ(Y) := { Σᵢ₌₁ʳ σᵢ(Y)uᵢvᵢᵀ : Y = Σᵢ₌₁^q σᵢ(Y)uᵢvᵢᵀ is an SVD of Y with σ₁(Y) ≥ · · · ≥ σ_q(Y) }
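A minimal numpy sketch of this closed-form solution; the matrix sizes and the identity choice of L and R in the usage note are illustrative assumptions, not from the slides:

```python
import numpy as np

def svd_truncate(Y, r):
    """Best rank-r approximation Y_r: keep the r largest singular values."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

def bilinear_lowrank(Y, L, R, r):
    """Candidate solution X = (L^T)† Y_r (R^T)† from the formula above,
    under the standing assumption rank(L^T X R^T) = rank(X)."""
    Yr = svd_truncate(Y, r)
    return np.linalg.pinv(L.T) @ Yr @ np.linalg.pinv(R.T)
```

With L and R set to identity matrices the solution reduces to the truncated SVD of Y, and the Frobenius error equals the tail of the singular-value spectrum (Eckart–Young).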
SLIDE 6

Problem: Convex structural constraints?

minimize over X:  ‖Y − LᵀXRᵀ‖²_{ℓ2} + h̃(X)  subject to  rank(X) ≤ r

Examples:
  • Nonnegative approximation: h̃(X) = I_{R^{k×k}_{≥0}}(X).
  • Hankel approximation: h̃(X) = I_Hankel(X).
  • Feasibility problems: Y = 0 and h̃(X) = I_C(X).

Generally, no closed-form solutions are known!

SLIDE 7

Nuclear Norm Regularization

Standard approach today: replace rank by the nuclear norm §

minimize over X:  k(‖X‖) + h(X)  subject to  ‖X‖_{ℓ1} ≤ λ

  • ‖X‖_{ℓ1} = Σᵢ σᵢ(X)
  • λ ≥ 0 is fixed.

§ Tibshirani, Chen, Donoho, Fazel, Boyd, ...

SLIDE 8

Pros:
  • Simple and generic heuristic ⇒ no PhD needed!
  • Probabilistic success guarantees §

minimize over X: rank(X) subject to A(X) = y   ⇒   minimize over X: ‖X‖_{ℓ1} subject to A(X) = y

§ Candès, Tao, Recht, Fazel, Parrilo, Chandrasekaran, ...

SLIDE 9

Baboon Approximation

minimize over X:  ‖Y − X‖²_{ℓ2} + I_{R^{m×n}_{≥0}}(X)  subject to  rank(X) ≤ r

[Figure: relative error ‖A − · ‖_{ℓ2}/‖A‖_{ℓ2} versus rank r = 1, ..., 80.]

SLIDE 10

minimize over X:  k(‖X‖) + h(X) + λ‖X‖_{ℓ1}   (the λ‖X‖_{ℓ1} term introduces bias)

Cons:
  • Bias ⇒ may not solve the non-convex problem, e.g., low-rank approximation
  • No a posteriori check whether the non-convex problem is solved
  • Deterministic structure?
  • Requires a sweep over a regularization parameter ⇒ cross-validation

Goal of this talk: Fix it for our problem class!

SLIDE 11

Modifications

Replace ‖·‖_{ℓ1} with ‖·‖_s §:

minimize over X:  k(‖X‖) + h(X) + λ‖X‖_s   (the λ‖X‖_s term still introduces bias)

Problem: Nothing really changed!

§ Argyriou, Bach, Chandrasekaran, Eriksson, Mairal, Obozinski, ...

SLIDE 12

Convex Envelope

min over X of f(X)  =  min over X of f∗∗(X)

where f∗∗(X) = (f∗)∗(X) is the convex envelope and f(X) ≥ f∗∗(X).

Problem:
  • (k(‖·‖) + I_{rank(·)≤r} + h)∗∗ is unknown!

SLIDE 13

Old idea §

Replace k(‖·‖) + I_{rank(·)≤r}(·) with (k(‖·‖) + I_{rank(·)≤r})∗∗.

Fact:  (k(‖·‖) + I_{rank(·)≤r})∗∗  =  k((‖·‖ + I_{rank(·)≤r})∗∗)

§ Lemaréchal 1973: min over x of Σᵢ fᵢ(xᵢ)  →  min over x of Σᵢ fᵢ∗∗(xᵢ)

SLIDE 14

Low-Rank Inducing Norms

‖X‖_g := g(σ₁(X), . . . , σ_{min{m,n}}(X))

Examples:  ‖X‖_{ℓ2} → g(x) = ‖x‖_{ℓ2};  ‖X‖_{ℓ1} → g(x) = ‖x‖_{ℓ1}
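A small numpy sketch of this definition (the helper names are mine): applying a symmetric gauge g to the singular values recovers the familiar unitarily invariant norms, which can be checked against numpy's built-in Frobenius and nuclear norms.

```python
import numpy as np

def norm_g(X, g):
    """Unitarily invariant norm ||X||_g = g(sigma_1(X), ..., sigma_min(X))."""
    return g(np.linalg.svd(X, compute_uv=False))

g_l2 = np.linalg.norm                  # g = ell_2 on the spectrum: Frobenius norm
g_l1 = lambda s: float(np.sum(s))      # g = ell_1 on the spectrum: nuclear norm
```

Unitary invariance follows because QX has the same singular values as X for any orthogonal Q.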

SLIDE 15

Dual norm

‖Y‖_{gᴰ} := sup over ‖X‖_g ≤ 1 of ⟨X, Y⟩  =  gᴰ(σ₁(Y), . . . , σ_{min{m,n}}(Y))

Examples:  ‖Y‖_{ℓ2ᴰ} = ‖Y‖_{ℓ2};  ‖Y‖_{ℓ1ᴰ} = ‖Y‖_{ℓ∞} = σ₁(Y)

SLIDE 16

Truncated dual norms

‖Y‖_{gᴰ,r} := sup over ‖X‖_g ≤ 1, rank(X) ≤ r of ⟨X, Y⟩  =  gᴰ(σ₁(Y), . . . , σᵣ(Y))
            [= gᴰ(σ₁(Y), . . . , σᵣ(Y), 0, . . . , 0)]

Examples:  ‖Y‖_{ℓ2ᴰ,r} = √(Σᵢ₌₁ʳ σᵢ²(Y));  ‖Y‖_{ℓ1ᴰ,r} = ‖Y‖_{ℓ∞}
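The two truncated dual norms above can be sketched directly from the singular values (function names are my own):

```python
import numpy as np

def trunc_dual_l2(Y, r):
    """||Y||_{l2^D, r} = sqrt(sigma_1^2 + ... + sigma_r^2)."""
    s = np.linalg.svd(Y, compute_uv=False)
    return float(np.sqrt(np.sum(s[:r] ** 2)))

def trunc_dual_l1(Y, r):
    """||Y||_{l1^D, r} = ||Y||_{l-inf} = sigma_1(Y), independent of r."""
    return float(np.linalg.svd(Y, compute_uv=False)[0])
```

For g = ℓ2 the truncated dual norm grows monotonically in r, starting at σ₁(Y) for r = 1 and reaching the full Frobenius norm at r = min{m, n}.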

SLIDE 17

Low-rank inducing norms §

‖X‖_{g,r∗} := sup over ‖Y‖_{gᴰ,r} ≤ 1 of ⟨X, Y⟩.

  • If ‖·‖_g is SDP representable ⇒ ‖·‖_{g,r∗} is SDP representable.
  • If prox_{‖·‖_g} is computable
    ⇒ prox_{‖·‖_{g,r∗}} is computable
    ⇒ prox_{I_{‖·‖_{g,r∗}≤t}}(·, t) is computable
    ⇒ k(‖·‖_{g,r∗}) = min over t of [k(t) + I_{‖·‖_{g,r∗}≤t}(·, t)]

Complexity for g = ℓ2, ℓ∞: SVD + O(n log n) (n = number of singular values)

§ Atomic norms, overlapping norms, support norms

SLIDE 18

Geometric Interpretation

B¹_{g,r∗} := {X ∈ R^{m×n} : ‖X‖_{g,r∗} ≤ 1}
E_{g,r} := {X ∈ R^{m×n} : ‖X‖_g = 1, rank(X) ≤ r}

  • B¹_{g,r∗} = conv(E_{g,r})
  • ‖X‖_g ≤ ‖X‖_{g,r∗}
  • ‖X‖_g = ‖X‖_{g,r∗} whenever rank(X) ≤ r.

SLIDE 19

minimize over X:  ‖X‖_g  subject to  A(X) = y, rank(X) ≤ r
⇔
minimize over X:  ‖X‖_{g,r∗}  subject to  A(X) = y, rank(X) ≤ r

[Figure: the affine set A(X) = y touching the unit ball B¹_{g,r∗}.]


SLIDE 21

Best Convex Relaxation

min over X ∈ R^{m×n}, rank(X) ≤ r of [k(‖X‖_g) + h(X)]  ≥  min over X ∈ R^{m×n} of [k(‖X‖_{g,r∗}) + h(X)]

Best in the sense:
  • (k(‖·‖_g) + I_{rank(·)≤r}(·) + h)∗∗ is unknown
  • Simple a posteriori test for optimality
  • Sweep over discrete r instead of λ ⇒ cross-validation ↔ zero duality gap

Cost function replaced – NO BIAS!

SLIDE 22

Nuclear Norm

Standard interpretation:  ‖·‖_{ℓ1} = (rank(·) + I_{‖·‖_{ℓ∞}≤1})∗∗

Our interpretation #1:  ‖·‖_{ℓ1} = (‖·‖_{ℓ1} + I_{rank(·)≤r})∗∗

Our interpretation #2:  ‖X‖_{ℓ1} = ‖X‖_{g,1∗} ≥ · · · ≥ ‖X‖_{g,r∗} ≥ · · · ≥ ‖X‖_{g,q∗} = ‖X‖_g

min over X ∈ R^{m×n}, rank(X) ≤ 1 of [k(‖X‖_g) + h(X)]  ≥  min over X ∈ R^{m×n} of [k(‖X‖_{ℓ1}) + h(X)]

SLIDE 23

Some good news

  • Zero duality gap for bilinear regression:

    minimize over X ∈ R^{k×k}:  ‖Y − LᵀXRᵀ‖²_{ℓ2}  subject to  rank(X) ≤ r

  • Optimality interpretations, e.g., iterative re-weighting:

    min over X ∈ R^{m×n}, rank(X) ≤ r of [k(‖WX‖_g) + h(X)]  ≥  min over X ∈ R^{m×n} of [k(‖WX‖_{g,r∗}) + h(X)]

SLIDE 24

  • Extends to atomic sets:

    min over x ∈ A of [k(G(x)) + h(x)]  ≥  min over x of [k(‖x‖_{A_G}) + h(x)]

  • G is positively homogeneous
  • ∀a ∈ A \ {0} : G(a) > 0
  • ‖x‖_{A_G} = inf{t > 0 : t⁻¹x ∈ conv(A_G)}
  • A_G = {a ∈ cone(A) : G(a) = 1}

Example:  ‖·‖_{ℓ2,r∗}  →  G = ‖·‖_{ℓ2},  A = {X : rank(X) ≤ r}

SLIDE 25

Not bad news

X⋆ ∈ argmin over X of [k(‖X‖_{g,r∗}) + h(X)],   Y⋆ ∈ argmin over Y of [k⁺(‖Y‖_{gᴰ,r}) + h∗(Y)]

  • k⁺(y) := sup over x ≥ 0 of [xy − k(x)]
  • rank(X⋆) ≤ r, plus uniqueness, if σᵣ(Y⋆) ≠ σᵣ₊₁(Y⋆) or σᵣ(Y⋆) = 0
  • rank(X⋆) ≤ r + s if σᵣ(Y⋆) = · · · = σᵣ₊ₛ(Y⋆) ≠ σᵣ₊ₛ₊₁(Y⋆)

SLIDE 26

Recovery Guarantees?

  • Work in progress
  • Why not use known tools? § They do not exploit the additional "knowledge" provided by ‖·‖_g.

§ Chandrasekaran, Recht, Parrilo, Willsky '12

SLIDE 27

Example: Matrix Completion

Given partially known entries of a low-rank Z ∈ R^{m×n}, find the unknown entries. Additional knowledge:

minimize over X:  ‖X‖_g  subject to  Xᵢⱼ = Zᵢⱼ, (i, j) ∈ I,  rank(X) ≤ r

  • Small unknown entries: choose ‖·‖_g = ‖·‖_{ℓ2}.
SLIDE 28

Example: Matrix Completion

H ∈ R^{10×10} with Hᵢⱼ = 1 if i + j ≤ 11 and Hᵢⱼ = 0 otherwise (a triangular Hankel matrix of ones, 55 nonzero entries):

1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0 0 0
1 1 1 1 1 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0

SLIDE 29

Z := Σᵢ₌₁⁵ σᵢ(H)uᵢuᵢᵀ

|Zᵢⱼ − Hᵢⱼ| ≤ σ₆(H)  ⇒  ∀ Hᵢⱼ = 0 : |Zᵢⱼ| ≤ σ₆(H)

Z ∈ R^{10×10}, with known entries ∗ and unknown entries ?:

∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗
∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ?
∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ? ?
∗ ∗ ∗ ∗ ∗ ∗ ∗ ? ? ?
∗ ∗ ∗ ∗ ∗ ∗ ? ? ? ?
∗ ∗ ∗ ∗ ∗ ? ? ? ? ?
∗ ∗ ∗ ∗ ? ? ? ? ? ?
∗ ∗ ∗ ? ? ? ? ? ? ?
∗ ∗ ? ? ? ? ? ? ? ?
∗ ? ? ? ? ? ? ? ? ?
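A numpy sketch of this construction. The pattern Hᵢⱼ = 1 for i + j ≤ 11 is inferred from the mask of known entries, and the rank-5 matrix is built by plain SVD truncation rather than the slide's σᵢuᵢuᵢᵀ sum; the entrywise bound then follows because |(Z − H)ᵢⱼ| ≤ ‖Z − H‖₂ = σ₆(H).

```python
import numpy as np

# Triangular Hankel matrix of ones inferred from the slide: H_ij = 1 iff i + j <= 11
n = 10
H = np.array([[1.0 if i + j <= n + 1 else 0.0
               for j in range(1, n + 1)] for i in range(1, n + 1)])

# Z: rank-5 truncation of H via the SVD
U, s, Vt = np.linalg.svd(H)
Z = U[:, :5] @ np.diag(s[:5]) @ Vt[:5, :]

# Entrywise error is bounded by the spectral norm of Z - H, which is sigma_6(H)
assert np.abs(Z - H).max() <= s[5] + 1e-12

# The 45 entries with i + j >= 12 are the unknown ones on the next slide
unknown = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i + j >= n + 2]
```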

SLIDE 30

Z as above, with known entries ∗ and unknown entries ?.

  • 45 unknown entries (not randomly selected!)
  • Recovery guarantees with the nuclear norm §: 3r(2n − r) + 1 = 226 random Gaussian samples

§ Chandrasekaran, Recht, Parrilo, Willsky '12

SLIDE 31

minimize over X:  ‖X‖_{ℓ2,r∗}  subject to  ∀(i, j) ∈ I : Xᵢⱼ = Zᵢⱼ.

[Figure: relative error ‖Z − · ‖_{ℓ2}/‖Z‖_{ℓ2} versus r = 1, . . . , 10.]

SLIDE 32

[Figure: rank of the solution versus r = 1, . . . , 10.]

SLIDE 33

minimize over X:  (1/2)‖X‖²_{ℓ2} + µ‖X‖_{ℓ1} §  subject to  Xᵢⱼ = Zᵢⱼ, (i, j) ∈ I

[Figure: rank of the solution versus µ.]

§ Cai, Candès '10

SLIDE 34

Conclusion

  • Simple a posteriori test for optimality
  • Prior information can/should be utilized ⇒ model the non-convex problem
  • Handles structured measurements
  • Can be used to test the performance of greedy methods

Most important: Replace – don't add!

SLIDE 35

What I did not show you

  • One can let r become real-valued by defining
    ‖X‖_{gᴰ,r} = gᴰ(σ₁(X), . . . , σ_⌊r⌋(X), (r − ⌊r⌋)σ_⌈r⌉(X)).
  • Non-convex proximal splitting: Xₖ = prox_{γf₁}(Zₖ) with f₁ = k(‖·‖_g) + I_{rank(·)≤r}
  • σᵣ(Y⋆) ≠ σᵣ₊₁(Y⋆): local convergence to global minima
  • All σᵣ(Y⋆) ≠ σᵣ₊₁(Y⋆): all stationary points correspond to global minima (⇒ Panos: global convergence)
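The real-valued-r definition above can be sketched for g = ℓ2 (the choice of g and the function name are my assumptions):

```python
import numpy as np

def l2_trunc_dual_frac(Y, r):
    """Real-valued-r truncated dual norm for g = l2, per the definition above:
    g^D(sigma_1, ..., sigma_floor(r), (r - floor(r)) * sigma_ceil(r))."""
    s = np.linalg.svd(Y, compute_uv=False)
    k = int(np.floor(r))
    val = float(np.sum(s[:k] ** 2))
    if r > k and k < len(s):
        # fractional part scales the next singular value
        val += ((r - k) * s[k]) ** 2
    return float(np.sqrt(val))
```

At integer r this reduces to the truncated dual norm of Slide 16, and for fractional r it interpolates between the two neighboring integer values.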

SLIDE 36

Future Work

  • Application to more control problems (Anders H., Mihailo)
  • Can we learn a suitable norm? (Yong Sheng?)
  • A priori deterministic and probabilistic guarantees (?)
SLIDE 37

Sources

  • Low-Rank Inducing Norms with Optimality Interpretations.
  • Low-Rank Optimization with Convex Constraints.
  • PhD thesis: Rank Reduction with Convex Constraints.
  • The Use of the r∗ Heuristic in Covariance Completion Problems.
  • Local Convergence of Proximal Splitting Methods for Rank Constrained Problems.

Questions?