Low-Rank Inducing Norms with Optimality Interpretations




  1. Low-Rank Inducing Norms with Optimality Interpretations. Christian Grussler, Pontus Giselsson, Anders Rantzer. Automatic Control, Lund University. June 15, 2017.

  2. Problem. Consider

  $$\begin{array}{ll} \underset{X \in \mathbb{R}^{m \times n}}{\text{minimize}} & k(\|X\|) + h(X) \\ \text{subject to} & \operatorname{rank}(X) \le r \end{array}$$

  where
  (1) $k \colon \mathbb{R}_{\ge 0} \to \mathbb{R}$ is an increasing, convex, proper, closed function,
  (2) $\|\cdot\|$ is a unitarily invariant norm,
  (3) $h \colon \mathbb{R}^{m \times n} \to \mathbb{R}$ is a closed, proper, convex function.

  Vector-valued problems are the special case

  $$\begin{array}{ll} \underset{x \in \mathbb{R}^n}{\text{minimize}} & k(\|\operatorname{diag}(x)\|) + h(x) \\ \text{subject to} & \underbrace{\operatorname{rank}(\operatorname{diag}(x))}_{\operatorname{card}(x)} \le r \end{array}$$
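  The rank/cardinality identity used in the vector case is easy to sanity-check numerically; a minimal NumPy sketch (not from the slides, the example vector is made up):

    import numpy as np

    # For a diagonal matrix, rank equals the cardinality (number of
    # nonzeros) of the underlying vector, as used on the slide.
    x = np.array([3.0, 0.0, -1.5, 0.0, 2.0])
    assert np.linalg.matrix_rank(np.diag(x)) == np.count_nonzero(x) == 3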

  3. Example: Bilinear Regression (I.S. Dhillon '15). Given $Y \in \mathbb{R}^{m \times n}$, $L \in \mathbb{R}^{k \times m}$, $R \in \mathbb{R}^{n \times k}$, $k \le \min\{m, n\}$:

  $$\begin{array}{ll} \underset{X \in \mathbb{R}^{k \times k}}{\text{minimize}} & \|Y - L^T X R^T\|_{\ell_2}^2 \\ \text{subject to} & \operatorname{rank}(X) \le r \end{array}$$

  where, for $X, Y \in \mathbb{R}^{m \times n}$, $\langle X, Y \rangle = \operatorname{trace}(X^T Y)$ and $\|X\|_{\ell_2} = \sqrt{\langle X, X \rangle} = \sqrt{\sum_i \sigma_i^2(X)}$.
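  The two expressions for $\|X\|_{\ell_2}$ (the Frobenius norm) agree, as a quick NumPy check illustrates; the random matrix is a hypothetical stand-in:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 6))
    sigma = np.linalg.svd(X, compute_uv=False)
    # sqrt(<X, X>) = sqrt(trace(X^T X)) = sqrt(sum_i sigma_i(X)^2)
    assert np.isclose(np.sqrt(np.trace(X.T @ X)), np.sqrt(np.sum(sigma**2)))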

  4. By assumption, $\operatorname{rank}(L^T X R^T) = \operatorname{rank}(X)$. With $M := L^T X R^T$, the problem becomes

  $$\begin{array}{ll} \underset{M}{\text{minimize}} & \underbrace{\|M\|_{\ell_2}^2}_{k(\|M\|)} + \underbrace{(-2\langle Y, M \rangle + I_{\{M = L^T X R^T \,:\, X \in \mathbb{R}^{k \times k}\}}(M))}_{h(M)} \\ \text{subject to} & \operatorname{rank}(M) \le r \end{array}$$

  Applications:
  • Machine Learning: Principal Component Analysis, Multivariate Linear Regression, Data Compression, ...
  • Control: Model Reduction, System Identification, ...
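  The rank assumption holds generically when $L$ and $R$ have full rank $k$; a hedged NumPy check with made-up dimensions:

    import numpy as np

    rng = np.random.default_rng(1)
    k, m, n, r = 4, 7, 6, 2
    L, R = rng.standard_normal((k, m)), rng.standard_normal((n, k))  # full rank k (generically)
    X = rng.standard_normal((k, r)) @ rng.standard_normal((r, k))    # a rank-r matrix
    M = L.T @ X @ R.T
    assert np.linalg.matrix_rank(M) == np.linalg.matrix_rank(X) == r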

  5. Explicit solution:

  $$\underset{\operatorname{rank}(X) \le r}{\operatorname{argmin}} \|Y - L^T X R^T\|_{\ell_2}^2 = \{L^\dagger Y_r R^\dagger : Y_r \in \operatorname{svd}_r(Y)\}$$

  where

  $$\operatorname{svd}_r(Y) := \left\{ \sum_{i=1}^r \sigma_i(Y) u_i v_i^T \,:\, Y = \sum_{i=1}^q \sigma_i(Y) u_i v_i^T \text{ is an SVD of } Y \right\}$$

  with $\sigma_1(Y) \ge \dots \ge \sigma_q(Y)$.
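  A minimal NumPy sketch of this closed form; the shapes are hypothetical, and I read $L^\dagger$, $R^\dagger$ as the pseudoinverses of $L^T$ and $R^T$ so that the dimensions match, which is my assumption, not a statement from the talk:

    import numpy as np

    def svd_r(Y, r):
        # One member of svd_r(Y): a best rank-r approximant via truncated SVD.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        return (U[:, :r] * s[:r]) @ Vt[:r, :]

    rng = np.random.default_rng(2)
    k, m, n, r = 4, 7, 6, 2
    L, R = rng.standard_normal((k, m)), rng.standard_normal((n, k))
    Y = rng.standard_normal((m, n))
    X_opt = np.linalg.pinv(L.T) @ svd_r(Y, r) @ np.linalg.pinv(R.T)  # k x k, rank <= r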

  6. Problem: What about convex structural constraints?

  $$\begin{array}{ll} \underset{X}{\text{minimize}} & \|Y - L^T X R^T\|_{\ell_2}^2 + \tilde{h}(X) \\ \text{subject to} & \operatorname{rank}(X) \le r \end{array}$$

  Examples (the indicator notation is sketched below):
  • Nonnegative approximation: $\tilde{h}(X) = I_{\mathbb{R}^{k \times k}_{\ge 0}}(X)$.
  • Hankel approximation: $\tilde{h}(X) = I_{\text{Hankel}}(X)$.
  • Feasibility problems: $Y = 0$ and $\tilde{h}(X) = I_C(X)$.

  Generally, no closed-form solutions are known!
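  As a reminder of the notation, $\tilde{h}$ above is an indicator function: $0$ on the constraint set and $+\infty$ off it. A minimal sketch for the nonnegativity case:

    import numpy as np

    def indicator_nonnegative(X):
        # I_{R >= 0}(X): 0 if X is elementwise nonnegative, +inf otherwise.
        return 0.0 if np.all(X >= 0) else np.inf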

  7. Nuclear Norm Regularization. Standard approach today: replace the rank constraint by a nuclear-norm bound (Tibshirani, Chen, Donoho, Fazel, Boyd, ...):

  $$\begin{array}{ll} \underset{X}{\text{minimize}} & k(\|X\|) + h(X) \\ \text{subject to} & \|X\|_{\ell_1} \le \lambda \end{array}$$

  where $\|X\|_{\ell_1} = \sum_i \sigma_i(X)$ and $\lambda \ge 0$ is fixed.
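  For reference, the nuclear norm is just the sum of singular values and is directly available in NumPy; a quick check on a hypothetical matrix:

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((5, 4))
    sigma = np.linalg.svd(X, compute_uv=False)
    assert np.isclose(sigma.sum(), np.linalg.norm(X, ord='nuc'))  # ||X||_{l1} = sum_i sigma_i(X)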

  8. Pros:
  • Simple and generic heuristic ⇒ no PhD needed!
  • Probabilistic success guarantees (Candès, Tao, Recht, Fazel, Parrilo, Chandrasekaran, ...):

  $$\begin{array}{ll} \underset{X}{\text{minimize}} & \operatorname{rank}(X) \\ \text{subject to} & \mathcal{A}(X) = y \end{array} \quad \Longrightarrow \quad \begin{array}{ll} \underset{X}{\text{minimize}} & \|X\|_{\ell_1} \\ \text{subject to} & \mathcal{A}(X) = y \end{array}$$

  9. Baboon Approximation.

  $$\begin{array}{ll} \underset{X}{\text{minimize}} & \|Y - X\|_{\ell_2}^2 + I_{\mathbb{R}^{m \times n}_{\ge 0}}(X) \\ \text{subject to} & \operatorname{rank}(X) \le r \end{array}$$

  [Plot: relative approximation error $\|A - (\cdot)\|_{\ell_2} / \|A\|_{\ell_2}$ (0.05 to 0.3) versus rank (1 to 80).]

  10. $$\underset{X}{\text{minimize}} \; k(\|X\|) + h(X) + \underbrace{\lambda \|X\|_{\ell_1}}_{\text{bias}}$$

  Cons:
  • Bias ⇒ may not solve the non-convex problem, e.g., low-rank approximation
  • No a posteriori check whether the non-convex problem is solved
  • Deterministic structure?
  • Requires a sweep over a regularization parameter ⇒ cross-validation

  Goal of this talk: fix all of this for our problem class!

  11. Modifications. Replace $\|\cdot\|_{\ell_1}$ with some other norm $\|\cdot\|_s$ (Argyriou, Bach, Chandrasekaran, Eriksson, Mairal, Obozinski, ...):

  $$\underset{X}{\text{minimize}} \; k(\|X\|) + h(X) + \underbrace{\lambda \|X\|_s}_{\text{bias}}$$

  Problem: Nothing really changed!

  12. Convex Envelope.

  $$\min_X f(X) = \min_X f^{**}(X), \qquad f^{**}(X) = (f^*)^*(X), \qquad f(X) \ge f^{**}(X)$$

  Problem: $\bigl(k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r} + h\bigr)^{**}$ is unknown!

  13. Old idea (Lemaréchal 1973: $\min_x \sum_i f_i(x_i) \to \min_x \sum_i f_i^{**}(x_i)$). Replace $k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r}(\cdot)$ with

  $$\bigl(k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r}\bigr)^{**}$$

  Fact:

  $$\bigl(k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r}\bigr)^{**} = k \circ \bigl(\|\cdot\| + I_{\operatorname{rank}(\cdot) \le r}\bigr)^{**}$$

  14. Low-Rank Inducing Norms. Define

  $$\|X\|_g := g(\sigma_1(X), \dots, \sigma_{\min\{m,n\}}(X))$$

  Examples: $\|X\|_{\ell_2} \longrightarrow g(x) = \|x\|_{\ell_2}$; $\quad \|X\|_{\ell_1} \longrightarrow g(x) = \|x\|_{\ell_1}$.
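  The definition translates directly to code: compute the singular values and apply $g$. A minimal sketch reproducing the two examples on a hypothetical matrix:

    import numpy as np

    def norm_g(X, g):
        # ||X||_g = g(sigma_1(X), ..., sigma_min{m,n}(X))
        return g(np.linalg.svd(X, compute_uv=False))

    rng = np.random.default_rng(4)
    X = rng.standard_normal((5, 4))
    assert np.isclose(norm_g(X, np.linalg.norm), np.linalg.norm(X))     # g = l2: Frobenius norm
    assert np.isclose(norm_g(X, np.sum), np.linalg.norm(X, ord='nuc'))  # g = l1: nuclear norm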

  15. Dual norm.

  $$\|Y\|_{g^D} := \sup_{\|X\|_g \le 1} \langle X, Y \rangle = g^D(\sigma_1(Y), \dots, \sigma_{\min\{m,n\}}(Y))$$

  Examples: $\|Y\|_{\ell_2^D} = \|Y\|_{\ell_2}$; $\quad \|Y\|_{\ell_1^D} = \|Y\|_{\ell_\infty} = \sigma_1(Y)$.
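  The pairing in the sup yields the generalized Cauchy–Schwarz inequality $\langle X, Y \rangle \le \|X\|_g \, \|Y\|_{g^D}$; a quick numerical illustration for $g = \ell_1$, whose dual is the spectral norm (random matrices are stand-ins):

    import numpy as np

    rng = np.random.default_rng(5)
    X, Y = rng.standard_normal((5, 4)), rng.standard_normal((5, 4))
    spectral = np.linalg.svd(Y, compute_uv=False)[0]   # ||Y||_{l1^D} = sigma_1(Y)
    assert np.trace(X.T @ Y) <= np.linalg.norm(X, ord='nuc') * spectral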

  16. Truncated dual norms.

  $$\|Y\|_{g^D,r} := \sup_{\substack{\|X\|_g \le 1 \\ \operatorname{rank}(X) \le r}} \langle X, Y \rangle = g^D(\sigma_1(Y), \dots, \sigma_r(Y)) = g^D(\sigma_1(Y), \dots, \sigma_r(Y), 0, \dots, 0)$$

  Examples:

  $$\|Y\|_{\ell_2^D,r} = \sqrt{\sum_{i=1}^r \sigma_i^2(Y)}, \qquad \|Y\|_{\ell_1^D,r} = \|Y\|_{\ell_\infty}$$
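  Both examples are one-liners on the singular values; a minimal sketch:

    import numpy as np

    def truncated_dual(Y, r, g_dual):
        # ||Y||_{g^D, r} = g^D applied to the top-r singular values (rest zeroed)
        s = np.linalg.svd(Y, compute_uv=False)  # sorted descending
        return g_dual(s[:r])

    rng = np.random.default_rng(6)
    Y = rng.standard_normal((5, 4))
    l2D_r = truncated_dual(Y, 2, np.linalg.norm)  # sqrt(sigma_1^2 + sigma_2^2)
    l1D_r = truncated_dual(Y, 2, np.max)          # still sigma_1(Y) = ||Y||_{l_inf}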

  17. Low-rank inducing norms (cf. atomic norms, overlapping norms, support norms):

  $$\|X\|_{g,r*} := \sup_{\|Y\|_{g^D,r} \le 1} \langle X, Y \rangle$$

  • If $\|\cdot\|_g$ is SDP representable ⇒ $\|\cdot\|_{g,r*}$ is SDP representable.
  • If $\operatorname{prox}_{\|\cdot\|_g}$ is computable ⇒ $\operatorname{prox}_{\|\cdot\|_{g,r*}}$ is computable ⇒ $\operatorname{prox}_{I_{\|\cdot\|_{g,r*} \le t}}(\cdot, t)$ is computable, which handles the epigraph reformulation $k(\|\cdot\|_{g,r*}) = \min_t \, k(t) + I_{\|\cdot\|_{g,r*} \le t}(\cdot, t)$.

  Complexity for $g = \ell_2, \ell_\infty$: SVD + $O(n \log n)$ ($n$ = number of singular values); see the sketch below.
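  For $g = \ell_2$ the low-rank inducing norm coincides with the spectral r-support norm (the support-norm connection in the slide's footnote), so it can be evaluated from one SVD plus a short scan, consistent with the quoted complexity. A hedged sketch based on the k-support-norm formula of Argyriou et al. (2012) applied to the singular values; this is my reconstruction, not code from the talk:

    import numpy as np

    def low_rank_inducing_fro(X, r):
        # ||X||_{l2, r*}: low-rank inducing Frobenius norm (spectral r-support norm).
        s = np.linalg.svd(X, compute_uv=False)                    # descending
        tail = np.concatenate([np.cumsum(s[::-1])[::-1], [0.0]])  # tail[i] = s[i] + ... + s[-1]
        for j in range(r):
            T = tail[r - j - 1]                                   # sum of the trailing values
            upper = s[r - j - 2] if r - j - 2 >= 0 else np.inf
            if upper > T / (j + 1) >= s[r - j - 1]:               # unique split index
                return np.sqrt(np.sum(s[:r - j - 1] ** 2) + T ** 2 / (j + 1))
        return np.sqrt(np.sum(s ** 2))                            # numerical-tie fallback

  Special cases recover the endpoints: r = 1 gives the nuclear norm and r = min{m, n} gives the Frobenius norm.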

  18. Geometric Interpretation. Define

  $$B^1_{g,r*} := \{X \in \mathbb{R}^{m \times n} : \|X\|_{g,r*} \le 1\}, \qquad E_{g,r} := \{X \in \mathbb{R}^{m \times n} : \|X\|_g = 1, \ \operatorname{rank}(X) \le r\}$$

  • $B^1_{g,r*} = \operatorname{conv}(E_{g,r})$
  • $\|X\|_g \le \|X\|_{g,r*}$
  • $\|X\|_g = \|X\|_{g,r*}$ whenever $\operatorname{rank}(X) \le r$ (checked numerically below).
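  The last two bullets can be sanity-checked with the sketch from slide 17 (so this assumes $g = \ell_2$ and that reconstruction):

    import numpy as np
    # Uses low_rank_inducing_fro from the sketch after slide 17.

    rng = np.random.default_rng(7)
    r = 2
    X = rng.standard_normal((6, 5))                           # generically full rank
    assert np.linalg.norm(X) <= low_rank_inducing_fro(X, r)   # ||X||_g <= ||X||_{g,r*}
    X_lr = rng.standard_normal((6, r)) @ rng.standard_normal((r, 5))  # rank <= r
    assert np.isclose(np.linalg.norm(X_lr), low_rank_inducing_fro(X_lr, r))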

  19. Consequence:

  $$\begin{array}{ll} \underset{X}{\text{minimize}} & \|X\|_g \\ \text{subject to} & \mathcal{A}(X) = y, \\ & \operatorname{rank}(X) \le r \end{array} \quad \Longleftrightarrow \quad \begin{array}{ll} \underset{X}{\text{minimize}} & \|X\|_{g,r*} \\ \text{subject to} & \mathcal{A}(X) = y, \\ & \operatorname{rank}(X) \le r \end{array}$$

  [Illustration: unit balls intersecting the affine set $\mathcal{A}(X) = y$.]

  21. Best Convex Relaxation.

  $$\min_{\substack{X \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(X) \le r}} \bigl[k(\|X\|_g) + h(X)\bigr] \;\ge\; \min_{X \in \mathbb{R}^{m \times n}} \bigl[k(\|X\|_{g,r*}) + h(X)\bigr]$$

  Best in the sense that:
  • $\bigl(k(\|\cdot\|_g) + I_{\operatorname{rank}(\cdot) \le r}(\cdot) + h\bigr)^{**}$ is unknown
  • Simple a posteriori test for optimality (sketched below)
  • Sweep over the discrete parameter $r$ instead of $\lambda$ ⇒ cross-validation ↔ zero duality gap

  The cost function is replaced, so there is NO BIAS!
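  The a posteriori test amounts to a rank check on the relaxation's optimizer: if $\operatorname{rank}(X^\star) \le r$, the two norms agree at $X^\star$, the inequality above is tight, and $X^\star$ also solves the non-convex problem. A minimal sketch (the solver producing X_star is problem-specific and omitted):

    import numpy as np

    def solves_nonconvex_problem(X_star, r, tol=1e-9):
        # A posteriori test: does the relaxation's optimizer have rank <= r?
        s = np.linalg.svd(X_star, compute_uv=False)
        return bool(np.all(s[r:] <= tol * max(s[0], 1.0)))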

  22. Nuclear Norm. Standard interpretation:

  $$\|\cdot\|_{\ell_1} = \bigl(\operatorname{rank}(\cdot) + I_{\|\cdot\|_{\ell_\infty} \le 1}\bigr)^{**}$$

  Our interpretation #1:

  $$\|\cdot\|_{\ell_1} = \bigl(\|\cdot\|_{\ell_1} + I_{\operatorname{rank}(\cdot) \le r}\bigr)^{**}$$

  Our interpretation #2:

  $$\|X\|_{\ell_1} = \|X\|_{g,1*} \ge \dots \ge \|X\|_{g,r*} \ge \dots \ge \|X\|_{g,q*} = \|X\|_g$$

  $$\min_{\substack{X \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(X) \le 1}} \bigl[k(\|X\|_g) + h(X)\bigr] \;\ge\; \min_{X \in \mathbb{R}^{m \times n}} \bigl[k(\|X\|_{\ell_1}) + h(X)\bigr]$$
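  For $g = \ell_2$ the whole chain can be traced numerically with the sketch from slide 17 (same caveats as before): r = 1 gives the nuclear norm, r = q the Frobenius norm, and the values are non-increasing in between.

    import numpy as np
    # Uses low_rank_inducing_fro from the sketch after slide 17 (g = l2).

    rng = np.random.default_rng(8)
    X = rng.standard_normal((5, 4))
    q = min(X.shape)
    vals = [low_rank_inducing_fro(X, r) for r in range(1, q + 1)]
    assert np.isclose(vals[0], np.linalg.norm(X, ord='nuc'))    # ||X||_{g,1*} = ||X||_{l1}
    assert np.isclose(vals[-1], np.linalg.norm(X))              # ||X||_{g,q*} = ||X||_{l2}
    assert all(a >= b - 1e-12 for a, b in zip(vals, vals[1:]))  # non-increasing chain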

  23. Some good news:
  • Zero duality gap for bilinear regression:

  $$\begin{array}{ll} \underset{X \in \mathbb{R}^{k \times k}}{\text{minimize}} & \|Y - L^T X R^T\|_{\ell_2}^2 \\ \text{subject to} & \operatorname{rank}(X) \le r \end{array}$$

  • Optimality interpretations, e.g., iterative re-weighting:

  $$\min_{\substack{X \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(X) \le r}} \bigl[k(\|WX\|_g) + h(X)\bigr] \;\ge\; \min_{X \in \mathbb{R}^{m \times n}} \bigl[k(\|WX\|_{g,r*}) + h(X)\bigr]$$
