SLIDE 1

Non-Convex Relaxations for Rank Regularization

Carl Olsson 2019-05-01

SLIDE 2

Structure from Motion and Factorization

[Figure: measurement matrix W ⊙ X with missing entries]

Affine camera model:

    X = [ P1 ]  [ X1  X2  ⋯ ]
        [ P2 ]
        [ ⋮  ]

with the camera matrices Pi stacked vertically and the 3D points Xj collected as columns.

SLIDE 3

General Motion/Deformation

Linear shape basis assumption: the matrix of tracked point coordinates factors through a small shape basis, e.g.

    [  0.1581   0.4714  −0.9782   2.0509   1.8610  −2.4750  ⋯ ]
    [ −0.0366  −0.0468   0.2511   0.0532   0.2687   0.5076  ⋯ ]  =  [ basis ] [ coefficients ]
    [  0.5402  −1.9804   0.4749  −0.4343   2.0293   0.3569  ⋯ ]
    [    ⋮        ⋮        ⋮        ⋮        ⋮        ⋮       ]

SLIDE 4

Rank and Factorization

    X = [ x1  x2  ⋯  xn ] = B C^T,

with B = [ b1  b2  ⋯  br ] of size m × r and C^T of size r × n (entries cij).

rank(X) = r. The factorization is not unique:

    X = B C^T = (BG)(G⁻¹ C^T) = B̃ C̃^T   for any invertible G.

DOF: (m + n)r − r² ≪ mn. Can reconstruct at most mn − ((m + n)r − r²) missing elements. Small DOF desirable! Incorporate as many constraints as possible.
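
A minimal numpy sketch of this non-uniqueness and the DOF count (the sizes are arbitrary illustration values, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 8, 2  # example sizes

B = rng.standard_normal((m, r))
C = rng.standard_normal((n, r))
X = B @ C.T                       # rank-r matrix

G = rng.standard_normal((r, r))   # (almost surely) invertible r x r matrix
B_tilde = B @ G
C_tilde = C @ np.linalg.inv(G).T  # chosen so that B_tilde @ C_tilde.T == X

print(np.linalg.matrix_rank(X))             # 2
print(np.allclose(B_tilde @ C_tilde.T, X))  # True: factorization is not unique
print((m + n) * r - r**2, "DOF vs", m * n, "entries")  # 24 vs 48
```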

SLIDE 5

Structure from Motion

[Figures: rigid reconstruction vs. non-rigid version]

SLIDE 6

Low Rank Approximation

Find the best rank-r0 approximation of X0:

    min_{rank(X)=r0} ‖X − X0‖²_F

Eckart, Young (1936): closed-form solution via the SVD. If X0 = Σ_{i=1}^n σi(X0) ui vi^T then X = Σ_{i=1}^{r0} σi(X0) ui vi^T.

Alternative formulation:

    min_X µ rank(X) + ‖X − X0‖²_F

Eckart, Young: σi(X) = σi(X0) if σi(X0) ≥ √µ, and σi(X) = 0 otherwise.
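
A short numpy sketch of both forms, truncation for the constrained problem and hard thresholding at √µ for the penalized one (the function names are illustrative, not from the talk):

```python
import numpy as np

def best_rank_r_approx(X0, r0):
    # Eckart-Young: keep the r0 largest singular values/vectors.
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return (U[:, :r0] * s[:r0]) @ Vt[:r0, :]

def rank_penalty_prox(X0, mu):
    # Penalized form: keep sigma_i if sigma_i >= sqrt(mu), zero it otherwise.
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return (U * np.where(s >= np.sqrt(mu), s, 0.0)) @ Vt
```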

SLIDE 7

Low Rank Approximation

Generalizations:

    min_X g(rank(X)) + ‖AX − b‖² + C(X)

No closed-form solution. Non-convex. Discontinuous. Even local optimization can be difficult. Goal: find "flexible, easy to optimize" relaxations.

SLIDE 8

The Nuclear Norm Approach

Recht, Fazel, Parrilo 2008. Replace rank(X) with ‖X‖∗ = Σ_{i=1}^n σi(X):

    min_X µ‖X‖∗ + ‖AX − b‖²

Convex. Can be solved optimally. Shrinking bias. Not good for SfM!

Closed-form solution to min_X µ‖X‖∗ + ‖X − X0‖²_F: if X0 = Σ_{i=1}^n σi(X0) ui vi^T then

    X = Σ_{i=1}^n max(σi(X0) − µ/2, 0) ui vi^T    (soft thresholding).
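
A sketch of this soft-thresholding operator in numpy; note that every singular value, including the large ones, is shrunk by µ/2, which is exactly the shrinking bias:

```python
import numpy as np

def nuclear_norm_prox(X0, mu):
    # Soft thresholding: shrink every singular value by mu/2.
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return (U * np.maximum(s - mu / 2.0, 0.0)) @ Vt
```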

SLIDE 9

Just a few prior works

Low rank recovery via the nuclear norm:
Fazel, Hindi, Boyd. A rank minimization heuristic with application to minimum order system approximation. 2001.
Candès, Recht. Exact matrix completion via convex optimization. 2009.
Candès, Li, Ma, Wright. Robust principal component analysis? 2011.

Non-convex approaches:
Mohan, Fazel. Iterative reweighted least squares for matrix rank minimization. 2010.
Gong, Zhang, Lu, Huang, Ye. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. 2013.

Sparse signal recovery using the ℓ1 norm:
Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. 2006.
Candès, Romberg, Tao. Stable signal recovery from incomplete and inaccurate measurements. 2006.
Candès, Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? 2006.

Non-convex approaches:
Candès, Wakin, Boyd. Enhancing sparsity by reweighted ℓ1 minimization. 2008.

SLIDE 10

Our Approach

Replace µ rank(X) with

    Rµ(σ(X)) = Σ_i (µ − max(√µ − σi(X), 0)²)

and solve

    min_X Rµ(σ(X)) + ‖AX − b‖².

Rµ is continuous, but non-convex. The global minimizer does not change if ‖A‖ < 1. Rµ(σ(X)) + ‖AX − b‖² is a lower bound on µ rank(X) + ‖AX − b‖². Moreover, f**µ(X) = Rµ(σ(X)) + ‖X − X0‖²_F is the convex envelope of fµ(X) = µ rank(X) + ‖X − X0‖²_F.

Larsson, Olsson. Convex Low Rank Regularization. IJCV 2016.
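
A numpy sketch of the relaxed penalty and of its proximal operator for a unit-weight data term (working the per-singular-value 1D problem gives hard thresholding at √µ, so large singular values pass through unbiased; the names are illustrative):

```python
import numpy as np

def R_mu(X, mu):
    # R_mu(sigma(X)) = sum_i mu - max(sqrt(mu) - sigma_i(X), 0)^2
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum(mu - np.maximum(np.sqrt(mu) - s, 0.0) ** 2)

def R_mu_prox(X0, mu):
    # Minimizer of R_mu(sigma(X)) + ||X - X0||_F^2: hard thresholding at
    # sqrt(mu); singular values >= sqrt(mu) are kept unchanged (no bias).
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return (U * np.where(s >= np.sqrt(mu), s, 0.0)) @ Vt
```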

SLIDE 11

Shrinking Bias

1D versions:

    rank(X) = Σ_i |σi(X)|⁰
    ‖X‖∗ = Σ_i σi(X)
    Rµ(σ(X)) = Σ_i (µ − [√µ − σi(X)]²₊)

Singular value thresholding: [Figure: thresholding rules of µ rank(X) + ‖X − X0‖²_F, of 2√µ‖X‖∗ + ‖X − X0‖²_F and of Rµ(σ(X)) + ‖X − X0‖²_F]

SLIDE 12

More General Framework

Computation of the convex envelopes

    f**g(X) = Rg(σ(X)) + ‖X − X0‖²_F   of   fg(X) = g(rank(X)) + ‖X − X0‖²_F,

where g(k) = Σ_{i=1}^k gi and 0 ≤ g1 ≤ g2 ≤ …. And proximal operators.

Another special case: fr0(X) = I(rank(X) ≤ r0) + ‖X − X0‖²_F.

Larsson, Olsson. Convex Low Rank Regularization. IJCV 2016.

SLIDE 13

Results

General case: if fg(X) = g(rank(X)) + ‖X − X0‖²_F then

    f**g(X) = max_{σ(Z)} Σ_{i=1}^n min(gi, σi²(Z)) − ‖σ(Z) − σ(X)‖² + ‖X − X0‖²_F.

The maximization over Z reduces to a 1D search (piecewise quadratic, concave objective function). Can be done in O(n) time (n = number of singular values).

SLIDE 14

Convexity of f**µ

[Figure: plots of µ − [√µ − σ]²₊ + σ² and of µ − [√µ − σ]²₊ + (σ − σ0)² for σ0 = 0, 1, 2, with marks at √µ and 2µ]

If σ0 = 2 the function will not try to make σ = 0!

SLIDE 15

Interpretations: f**r0

    fr0(X) = I(rank(X) ≤ r0) + ‖X − X0‖²_F

Level set surfaces {X | Rr0(X) = α} for X = diag(x1, x2, x3) with r0 = 1 (left) and r0 = 2 (middle). Note that when r0 = 1 the regularizer promotes solutions where only one of the xk is non-zero. For r0 = 2 the regularizer instead favors solutions with two non-zero xk. For comparison we also include the level set of the nuclear norm.

SLIDE 16

Hankel Matrix Estimation

    min_{H∈H} I(rank(H) ≤ r0) + ‖H − X0‖²_F

[Figures: signal, Hankel matrix, matrix + noise; mean error ‖H − H(f)‖ versus noise level σ for f**µ, f**r0 and the nuclear norm, with an SVD baseline]
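
For context, a small scipy sketch of how a signal maps to a Hankel matrix whose rank reflects the number of sinusoidal components; the signal itself is a made-up example:

```python
import numpy as np
from scipy.linalg import hankel

t = np.arange(40)
signal = np.cos(0.3 * t) + 0.5 * np.cos(1.1 * t + 0.2)  # two real sinusoids

# 20 x 21 Hankel matrix built from the signal; each real sinusoid
# contributes rank 2, so the matrix has numerical rank 4.
H = hankel(signal[:20], signal[19:])
print(np.linalg.matrix_rank(H, tol=1e-8))  # 4
```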

SLIDE 17

Smooth Linear Shape Basis

    fN(S) = ‖S − S0‖²_F + µ‖P(S)‖∗ + τ TV(S)
    fR(S) = ‖S − S0‖²_F + Rµ(P(S)) + τ TV(S)

[Figure: ‖S − SGT‖_F versus noise level σ, comparing Rµ and the nuclear norm]

SLIDE 18

RIP problems

Linear observations: b = AX0 + ε, A : R^{m×n} → R^p, X0 low rank, ε noise. Recover X0 using rank penalties/constraints:

    Rµ(σ(X)) + ‖AX − b‖²   or   Rr0(σ(X)) + ‖AX − b‖²

Restricted Isometry Property (RIP):

    (1 − δq)‖X‖²_F ≤ ‖AX‖² ≤ (1 + δq)‖X‖²_F,   rank(X) ≤ q

Olsson, Carlsson, Andersson, Larsson. Non-Convex Rank/Sparsity Regularization and Local Minima. ICCV 2017.
Olsson, Carlsson, Bylow. A Non-Convex Relaxation for Fixed-Rank Approximation. RSL-CV 2017.

SLIDE 19

Near Convex Rank/Sparsity Estimation

Intuition: if RIP holds then ‖AX‖² behaves like ‖X‖²_F.

    Rµ(σ(X)) + ‖X − X0‖²_F ≈ Rµ(σ(X)) + ‖X‖²_F − 2⟨X, X0⟩ is convex (the dropped ‖X0‖² is constant).

What about Rµ(σ(X)) + ‖AX − b‖² ≈ Rµ(σ(X)) + ‖AX‖² − 2⟨X, A∗b⟩? Near convex?

1D example: Rµ(x) + (x/√2 − b)² with µ = 1. [Figure: the objective for b = 0, 1/√2, 1, √2, 1.5]

SLIDE 20

Main Result (Rank Penalty)

Def. Fµ(X) := Rµ(σ(X)) + ‖AX − b‖², Z := (I − A∗A)Xs + A∗b.

Xs is a stationary point of Fµ(X) ⇔ Xs ∈ argmin_X Rµ(σ(X)) + ‖X − Z‖²_F.

‖X − Z‖²_F is a local approximation of ‖AX − b‖² around Xs. Xs is obtained by thresholding the SVD of Z.

Theorem. If Xs is a stationary point of Fµ, and the singular values of Z fulfill σi(Z) ∉ [(1 − δr)√µ, √µ/(1 − δr)], then for any other stationary point X′s we have rank(X′s − Xs) > r.
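
A hedged numpy sketch of checking this certificate for a candidate stationary point, under the illustrative assumptions that A is available as an explicit p × mn matrix acting on vec(X) and that a RIP constant δr is known:

```python
import numpy as np

def certificate_holds(Xs, A, b, mu, delta_r):
    # Z = (I - A*A)Xs + A*b = Xs - A*(A Xs - b), with A a p x (m*n)
    # matrix acting on vec(X).
    m, n = Xs.shape
    z = Xs.ravel() - A.T @ (A @ Xs.ravel() - b)
    s = np.linalg.svd(z.reshape(m, n), compute_uv=False)
    lo = (1 - delta_r) * np.sqrt(mu)
    hi = np.sqrt(mu) / (1 - delta_r)
    # The theorem requires that no singular value of Z lies in [lo, hi].
    return not np.any((s >= lo) & (s <= hi))
```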

SLIDE 21

Main Result (Rank Constraint)

Def. Fr0(X) := Rr0(σ(X)) + ‖AX − b‖², Z := (I − A∗A)Xs + A∗b.

Xs is a stationary point of Fr0(X) ⇔ Xs ∈ argmin_X Rr0(σ(X)) + ‖X − Z‖²_F.

‖X − Z‖²_F is a local approximation of ‖AX − b‖² around Xs. Xs is obtained by thresholding the SVD of Z.

Theorem. If Xs is a stationary point of Fr0 with rank(Xs) = r0, and the singular values of Z fulfill σ_{r0+1}(Z) < (1 − 2δ_{2r0})σ_{r0}(Z), then any other stationary point X′s has rank(X′s) > r0.

SLIDE 22

Experiments (Rank Penalty)

Low rank recovery for varying noise level (x-axis) and regularization strength (y-axis). b = AX0 + ε, εi ∈ N(0, σ), where rank(X0) = 10. Compared:

    2√µ‖X‖∗ + ‖AX − b‖²_F   vs.   Rµ(σ(X)) + ‖AX − b‖²_F

[Figure: verified optima; A of size 202 × 400 (δ = 0.2) and 300 × 400 (δ unknown)]

SLIDE 23

Experiments (Fixed Rank)

    µ‖X‖∗ + ‖AX − b‖²_F   vs.   Rr(σ(X)) + ‖AX − b‖²_F

[Figure, panels (a), (b), (c)] (a) Noise level (x-axis) vs. data fit ‖AX − b‖ (y-axis). (b) Fraction of instances verified to be globally optimal. (c) Same as (a). X ∈ R^{20×20} ⇒ vec(X) ∈ R^400; (a) and (b) use a 400 × 400 A with δ = 0.2 while (c) uses a 300 × 400 A (unknown δ).

SLIDE 24

NRSfM

Reconstruct moving and deforming object from image projections. CMU Motion capture sequences.

SLIDE 25

NRSfM (Dai et al. 2012)

Xi, Yi, Zi: x-, y- and z-coordinates in image i. Ri: 2 × 3 matrix encoding the orientation of camera i (affine model).

    R = blkdiag(R1, R2, …, RF),   X = [X1; Y1; Z1; …; XF; YF; ZF],   X# = [X1 Y1 Z1; …; XF YF ZF].

Projections: M ≈ RX. Linear shape basis assumption: rank(X#) ≤ r.

    min_{X#} I(rank(X#) ≤ r0) + ‖AX# − M‖²_F,   A : X# ↦ RX.
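
A small numpy sketch of the X ↔ X# rearrangement; for a 3F × n matrix X in C row order this is just a reshape (the function names are illustrative):

```python
import numpy as np

def to_sharp(X):
    # X is 3F x n with rows X1, Y1, Z1, ..., XF, YF, ZF; in C row order a
    # reshape concatenates each frame's three rows into one row [Xi Yi Zi].
    F = X.shape[0] // 3
    return X.reshape(F, 3 * X.shape[1])

def from_sharp(X_sharp, n):
    return X_sharp.reshape(-1, n)
```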

SLIDE 26

MOCAP Experiments

CMU Motion capture sequences: Drink Pick-up Stretch Yoga

SLIDE 27

MOCAP Experiments

    Rr0(X#) + ‖RX − M‖²_F (blue)   vs.   µ‖X#‖∗ + ‖RX − M‖²_F (orange)

[Figure: data fit ‖RX − M‖_F (y-axis) versus rank(X#) (x-axis)]

SLIDE 28

MOCAP Experiments

    Rr0(X#) + ‖RX − M‖²_F (blue)   vs.   µ‖X#‖∗ + ‖RX − M‖²_F (orange)

[Figures: data fit ‖RX − M‖_F versus rank(X#); distance to ground truth ‖X − Xgt‖_F versus rank(X#)]

SLIDE 29

MOCAP Experiments

RIP does not hold for A(X#) = RX! If RiNi = 0 with Ni ∈ R^{3×1}, then RiNiCi = 0 for all Ci ∈ R^{1×m}. Therefore A(N(C)) = 0 for any matrix of the form

    N(C) = [ n11 C1  n21 C1  n31 C1 ;
             n12 C2  n22 C2  n32 C2 ;
               ⋮        ⋮       ⋮    ;
             n1F CF  n2F CF  n3F CF ],

where the nij are the elements of Ni. N(C) does not affect the projections. If the row space of an optimal solution contains N(C) (for some C), the solution is not unique.

SLIDE 30

MOCAP Experiments

    Rr0(X#) + ‖RX − M‖²_F + ‖DX#‖²_F (blue)   vs.   µ‖X#‖∗ + ‖RX − M‖²_F + ‖DX#‖²_F (orange)

[Figures: data fit ‖RX − M‖_F versus rank(X#); distance to ground truth ‖X − Xgt‖_F versus rank(X#)]

SLIDE 31

Optimization via ADMM/Splitting

Rµ(X) is not differentiable in X. Rµ(X) + ‖AX − b‖² is (near) convex in X. The proximal operator argmin_X Rµ(X) + ρ‖X − X0‖²_F is computable.

Can apply splitting schemes:

    L(X, Y, Λ) = Rµ(X) + ρ‖X − Y + Λ‖²_F + ‖AY − b‖² − ρ‖Λ‖²_F.

Alternate:

    X_{t+1} = argmin_X Rµ(X) + ρ‖X − Yt + Λt‖²_F,
    Y_{t+1} = argmin_Y ρ‖X_{t+1} − Y + Λt‖²_F + ‖AY − b‖²,
    Λ_{t+1} = Λt + X_{t+1} − Y_{t+1}.

First order method.
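
A runnable numpy sketch of this splitting for ρ = 1, assuming A is given as an explicit p × mn matrix on vec(X); for ρ = 1 the X-update (the Rµ prox) reduces to hard thresholding at √µ, while other ρ would change the thresholding rule:

```python
import numpy as np

def admm_rank(A, b, m, n, mu, iters=200):
    # min_X R_mu(sigma(X)) + ||A vec(X) - b||^2 via ADMM with rho = 1.
    X = np.zeros((m, n)); Y = np.zeros((m, n)); Lam = np.zeros((m, n))
    M = np.eye(m * n) + A.T @ A   # Y-update normal equations matrix
    Atb = A.T @ b
    for _ in range(iters):
        # X-update: prox of R_mu at Y - Lam (hard thresholding at sqrt(mu))
        U, s, Vt = np.linalg.svd(Y - Lam, full_matrices=False)
        X = (U * np.where(s >= np.sqrt(mu), s, 0.0)) @ Vt
        # Y-update: quadratic problem, closed form
        Y = np.linalg.solve(M, (X + Lam).ravel() + Atb).reshape(m, n)
        # Dual update
        Lam = Lam + X - Y
    return X
```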

SLIDE 32

Quadratic Approximation

Reformulate into a differentiable objective. Bilinear parameterization:

    min_{B,C} R̃µ(B, C) + ‖A(BC^T) − b‖²,

with R̃µ(B, C) twice differentiable a.e. Optimize with 2nd order methods. Introduces non-optimal stationary points. Characterization of local minima under RIP and optimization with VarPro/Wiberg.

Valtonen Örnhag, Olsson, Heyden. Bilinear Parameterization for Differentiable Rank Regularization. arXiv 2018.

SLIDE 33

Bilinear Parameterization

Assumption: R(X) = Σ_{i=1}^r f(σi(X)), with f concave and non-decreasing on [0, ∞).

Theorem. R(X) = min_{BC^T = X} R̃(B, C), where

    R̃(B, C) = Σ_{i=1}^k f((‖Bi‖² + ‖Ci‖²)/2),

and Bi, Ci are the columns of B, C. R̃(B, C) is differentiable if f(σ) = h(|σ|) where h is differentiable. Examples: SCAD, Log, MCP, ETP, Geman.

SLIDE 34

Bilinear Parameterization

If X = UΣV^T, B = U√Σ and C = V√Σ, then Bi = √σi Ui and Ci = √σi Vi. Therefore

    (‖Bi‖² + ‖Ci‖²)/2 = (σi‖Ui‖² + σi‖Vi‖²)/2 = σi.
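
A quick numpy check of this balanced factorization, using f = √· as a stand-in concave penalty:

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.sqrt                        # concave, non-decreasing on [0, inf)

X = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
B = U * np.sqrt(s)                 # B_i = sqrt(sigma_i) U_i
C = Vt.T * np.sqrt(s)              # C_i = sqrt(sigma_i) V_i

col_avg = (np.sum(B**2, axis=0) + np.sum(C**2, axis=0)) / 2
print(np.allclose(col_avg, s))                        # (||B_i||^2+||C_i||^2)/2 = sigma_i
print(np.allclose(np.sum(f(col_avg)), np.sum(f(s))))  # R~(B, C) attains R(X)
```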

SLIDE 35

Uniqueness of Low-Rank-Minima

Over-parameterization with our relaxation:

Theorem. Let (B̄, C̄) ∈ R^{m×2k} × R^{n×2k} be a local minimizer of R̃µ(B, C) + ‖A(BC^T) − b‖², with rank(B̄C̄^T) < k and R̃µ(B̄, C̄) = Rµ(B̄C̄^T). If the singular values of Z = (I − A∗A)B̄C̄^T + A∗b fulfill σi(Z) ∉ [(1 − δ2k)√µ, √µ/(1 − δ2k)], then

    B̄C̄^T ∈ argmin_{rank(X)≤k} Rµ(X) + ‖AX − b‖².

SLIDE 36

VarPro/Wiberg

    min_{B,C} ‖A(BC^T) − b‖²

Least squares problem in C for fixed B ⇒ compute (in closed form)

    C∗(B) = argmin_C ‖A(BC^T) − b‖².

Linearize A(BC∗(B)^T) − b ≈ LδB + ℓ at Bk. Solve

    δBk = argmin ‖LδB + ℓ‖² + λ‖δB‖²,   B_{k+1} = Bk + δBk.

Quadratic approximation. Rapid convergence. Similar to GN/LM.
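
A minimal sketch of one damped VarPro step for the fully observed special case A = identity (so the data term is ‖BC^T − M‖²_F); the Jacobian is formed by finite differences purely for brevity, not as the method used in the talk:

```python
import numpy as np

def varpro_step(B, M, lam=1e-2, eps=1e-6):
    def residual(Bflat):
        Bm = Bflat.reshape(B.shape)
        Cm = np.linalg.lstsq(Bm, M, rcond=None)[0].T  # C*(B), closed form
        return (Bm @ Cm.T - M).ravel()

    r0 = residual(B.ravel())
    J = np.zeros((r0.size, B.size))
    for j in range(B.size):                           # finite-difference Jacobian L
        e = np.zeros(B.size); e[j] = eps
        J[:, j] = (residual(B.ravel() + e) - r0) / eps
    # LM-damped update: dB = argmin ||L dB + r0||^2 + lam ||dB||^2
    dB = np.linalg.solve(J.T @ J + lam * np.eye(B.size), -J.T @ r0)
    return B + dB.reshape(B.shape)
```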

SLIDE 37

Reweighted VarPro

    min_{B,C} Σ_i f((‖Bi‖² + ‖Ci‖²)/2) + ‖A(BC^T) − b‖²

Taylor: f(x) ≈ f(xk) + f′(xk)(x − xk) = f′(xk)x + const, giving

    min_{B,C} Σ_i w_i^k (‖Bi‖² + ‖Ci‖²)/2 + ‖A(BC^T) − b‖²,   w_i^k = f′((‖B_i^k‖² + ‖C_i^k‖²)/2).

One iteration of regular VarPro, then recompute the weights. Refactorize into B^{k+1}, C^{k+1} using the SVD.
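
A small sketch of the weight computation and the SVD refactorization between reweighted iterations; the log-penalty derivative is just an example choice:

```python
import numpy as np

def weights(B, C, f_prime):
    # w_i = f'((||B_i||^2 + ||C_i||^2)/2), e.g. f_prime = lambda x: 1/(x + eps)
    # for a log penalty f(x) = log(eps + x).
    return f_prime((np.sum(B**2, axis=0) + np.sum(C**2, axis=0)) / 2)

def refactorize(B, C):
    # Balance the factors via an SVD of BC^T so that
    # (||B_i||^2 + ||C_i||^2)/2 = sigma_i(BC^T) again.
    U, s, Vt = np.linalg.svd(B @ C.T, full_matrices=False)
    r = B.shape[1]
    return U[:, :r] * np.sqrt(s[:r]), Vt[:r, :].T * np.sqrt(s[:r])
```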

SLIDE 38

SfM-results

SLIDE 39

VarPro vs. ADMM

SLIDE 40

MOCAP Results

Drink Pickup Stretch Yoga

SLIDE 41

The End

SLIDE 42

Convex Envelopes and Conjugate Functions

[Figure: graph of a non-convex function f]
SLIDE 43

Convex Envelopes and Conjugate Functions

[Figure: f with the affine minorant x ↦ yx − f∗(y); slope y, intercept −f∗(y)]

    f∗(y) = max_x yx − f(x)

SLIDE 44

Convex Envelopes and Conjugate Functions

[Figure: f with the affine minorant x ↦ yx − f∗(y)]

    f∗(y) = max_x yx − f(x)  ⇒  f∗(y) ≥ yx − f(x)  ⇒  f(x) ≥ yx − f∗(y)

SLIDE 45

Convex Envelopes and Conjugate Functions

[Figure: f with two affine minorants, slopes y and y2, intercepts −f∗(y) and −f∗(y2)]

    f∗(y) = max_x yx − f(x)  ⇒  f∗(y) ≥ yx − f(x)  ⇒  f(x) ≥ yx − f∗(y)

SLIDE 46

Convex Envelopes and Conjugate Functions

[Figure: f together with its affine minorants]

    f∗(y) = max_x yx − f(x)  ⇒  f∗(y) ≥ yx − f(x)  ⇒  f(x) ≥ yx − f∗(y)

SLIDE 47

Convex Envelopes and Conjugate Functions

[Figure: the convex envelope of f]

    f∗∗(x) = max_y xy − f∗(y)

SLIDE 48

Computing the Conjugate

    f∗g(X) = max_Y ⟨X, Y⟩ − fg(Y)
           = max_k ( −Σ_{i=1}^k gi + max_{rank(Y)=k} ⟨X, Y⟩ − ‖Y − X0‖²_F )

The inner maximization can be solved with the SVD (Eckart, Young) and completion of squares.

SLIDE 49

The Missing Data Problem

    ‖W ⊙ (X − M)‖²_F

[Figure] Red: visible element. Blue: missing measurement. No known closed-form solution.

SLIDE 50

A Block Decomposition Approach

Solve the problem on sub-blocks with no missing data:

    f(X) = Σ_{i=1}^K µi rank(Pi(X)) + ‖Pi(X) − Pi(M)‖²_F

Convex relaxation:

    f̃(X) = Σ_{i=1}^K Rµi(σ(Pi(X))) + ‖Pi(X) − Pi(M)‖²_F

SLIDE 51

A Block Decomposition Approach

    X = [ X11  X12   ?  ;
          X21  X22  X23 ;
           ?   X32  X33 ]

Lemma. Let X1 and X2 be two given matrices with overlap matrix X22, and let r1 = rank(X1) and r2 = rank(X2). Suppose that rank(X22) = min(r1, r2); then there exists a matrix X with rank(X) = max(r1, r2). Additionally, if rank(X22) = r1 = r2 then X is unique.

SLIDE 52

Results

[Figures: Observed / Our Solution / Nuclear Norm / Ground Truth]
