Low-Rank Inducing Norms with Optimality Interpretations

Christian Grussler, Pontus Giselsson, Anders Rantzer
Automatic Control, Lund University, June 15, 2017
Problem

$$\min_{X \in \mathbb{R}^{m \times n}} \; k(\|X\|) + h(X) \quad \text{subject to} \quad \operatorname{rank}(X) \le r$$

1. $k : \mathbb{R}_{\ge 0} \to \mathbb{R}$ is an increasing, convex, proper, closed function
2. $\|\cdot\|$ is a unitarily invariant norm
3. $h : \mathbb{R}^{m \times n} \to \mathbb{R}$ is a closed, proper, convex function

Vector-valued problems:

$$\min_{x \in \mathbb{R}^n} \; k(\|\operatorname{diag}(x)\|) + h(x) \quad \text{subject to} \quad \underbrace{\operatorname{rank}(\operatorname{diag}(x))}_{=\,\operatorname{card}(x)} \le r$$
Example: Bilinear Regression §

Given $Y \in \mathbb{R}^{m \times n}$, $L \in \mathbb{R}^{k \times m}$, $R \in \mathbb{R}^{n \times k}$, $k \le \min\{m, n\}$:

$$\min_{X \in \mathbb{R}^{k \times k}} \; \|Y - L^T X R^T\|_{\ell_2}^2 \quad \text{subject to} \quad \operatorname{rank}(X) \le r$$

where
- $\langle X, Y \rangle := \operatorname{trace}(X^T Y)$ for $X, Y \in \mathbb{R}^{m \times n}$
- $\|X\|_{\ell_2} := \sqrt{\langle X, X \rangle} = \sqrt{\sum_i \sigma_i^2(X)}$ (the Frobenius norm)

§ I. S. Dhillon '15
By assumption, $\operatorname{rank}(\underbrace{L^T X R^T}_{=:M}) = \operatorname{rank}(X)$, so the problem becomes

$$\min_{M} \; \underbrace{\|M\|_{\ell_2}^2}_{k(\|M\|_{\ell_2})} \underbrace{\; - \; 2\langle Y, M \rangle + I_{\{M = L^T X R^T :\, X \in \mathbb{R}^{k \times k}\}}(M)}_{h(M)} \quad \text{subject to} \quad \operatorname{rank}(M) \le r$$

Applications:
- Machine Learning: Principal Component Analysis, Multivariate Linear Regression, Data Compression, ...
- Control: Model Reduction, System Identification, ...
Explicit Solution:

$$\operatorname*{argmin}_{\operatorname{rank}(X) \le r} \|Y - L^T X R^T\|_{\ell_2}^2 = \{L^{\dagger} Y_r R^{\dagger} : Y_r \in \operatorname{svd}_r(Y)\}$$

$$\operatorname{svd}_r(Y) := \left\{ \sum_{i=1}^{r} \sigma_i(Y)\, u_i v_i^T \;:\; Y = \sum_{i=1}^{q} \sigma_i(Y)\, u_i v_i^T \text{ is an SVD of } Y \right\}$$

with $\sigma_1(Y) \ge \cdots \ge \sigma_q(Y)$.
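A minimal numpy sketch of this closed-form solution (the function names are mine; it assumes $L$ and $R$ have full column rank $k$, so that $\operatorname{rank}(L^T X R^T) = \operatorname{rank}(X)$ as above):

```python
import numpy as np

def svd_r(Y, r):
    """One rank-r truncated SVD of Y (an element of svd_r(Y))."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r, :]

def bilinear_low_rank_fit(Y, L, R, r):
    """One element of argmin_{rank(X)<=r} ||Y - L^T X R^T||_{l2}^2,
    computed as pinv(L^T) @ svd_r(Y) @ pinv(R^T)."""
    return np.linalg.pinv(L.T) @ svd_r(Y, r) @ np.linalg.pinv(R.T)
```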
Problem: What about convex structural constraints?

$$\min_{X} \; \|Y - L^T X R^T\|_{\ell_2}^2 + \tilde{h}(X) \quad \text{subject to} \quad \operatorname{rank}(X) \le r$$

Examples:
- Nonnegative approximation: $\tilde{h}(X) = I_{\mathbb{R}^{k \times k}_{\ge 0}}(X)$
- Hankel approximation: $\tilde{h}(X) = I_{\mathrm{Hankel}}(X)$
- Feasibility problems: $Y = 0$ and $\tilde{h}(X) = I_{\mathcal{C}}(X)$

In general, no closed-form solutions are known!
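For intuition on such structural constraints, here is a small numpy sketch (my own illustration, not part of the slides) of the Euclidean projection onto the Hankel subspace, i.e. the proximal operator of $I_{\mathrm{Hankel}}$: Hankel matrices are constant along anti-diagonals, so the projection averages each anti-diagonal.

```python
import numpy as np

def project_hankel(X):
    """Euclidean projection onto the subspace of Hankel matrices
    (constant anti-diagonals): replace each anti-diagonal by its mean."""
    m, n = X.shape
    P = np.zeros_like(X)
    for k in range(m + n - 1):  # k indexes the anti-diagonals i + j = k
        idx = [(i, k - i) for i in range(max(0, k - n + 1), min(m, k + 1))]
        mean = np.mean([X[i, j] for i, j in idx])
        for i, j in idx:
            P[i, j] = mean
    return P
```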
Nuclear Norm Regularization

Standard approach today: replace the rank constraint by a nuclear-norm constraint §

$$\min_{X} \; k(\|X\|) + h(X) \quad \text{subject to} \quad \|X\|_{\ell_1} \le \lambda$$

- $\|X\|_{\ell_1} = \sum_i \sigma_i(X)$ (the nuclear norm)
- $\lambda \ge 0$ is fixed

§ Tibshirani, Chen, Donoho, Fazel, Boyd, ...
Pros:
- Simple and generic heuristic ⟹ no PhD needed!
- Probabilistic success guarantees §

$$\min_{X} \; \operatorname{rank}(X) \;\; \text{s.t.} \;\; \mathcal{A}(X) = y \qquad \Longrightarrow \qquad \min_{X} \; \|X\|_{\ell_1} \;\; \text{s.t.} \;\; \mathcal{A}(X) = y$$

§ Candès, Tao, Recht, Fazel, Parrilo, Chandrasekaran, ...
Baboon Approximation

$$\min_{X} \; \|Y - X\|_{\ell_2}^2 + I_{\mathbb{R}^{m \times n}_{\ge 0}}(X) \quad \text{subject to} \quad \operatorname{rank}(X) \le r$$

[Plot: relative approximation error $\|A - \cdot\|_{\ell_2} / \|A\|_{\ell_2}$ (roughly 0.05 to 0.3) versus rank (1 to 80).]
$$\min_{X} \; k(\|X\|) + h(X) + \underbrace{\lambda \|X\|_{\ell_1}}_{\text{bias}}$$

Cons:
- Bias ⟹ may not solve the non-convex problem, e.g., low-rank approximation
- No a posteriori check of whether the non-convex problem has been solved
- What about deterministic structure?
- Requires a sweep over a continuous regularization parameter ⟹ cross-validation

Goal of this talk: fix these issues for our problem class!
Modifications

Replace $\|\cdot\|_{\ell_1}$ with another norm $\|\cdot\|_s$ §:

$$\min_{X} \; k(\|X\|) + h(X) + \underbrace{\lambda \|X\|_s}_{\text{bias}}$$

Problem: Nothing has really changed!

§ Argyriou, Bach, Chandrasekaran, Eriksson, Mairal, Obozinski, ...
Convex Envelope

$$\min_{X} f(X) = \min_{X} f^{**}(X), \qquad f^{**}(X) = (f^*)^*(X), \qquad f(X) \ge f^{**}(X)$$

[Figure: a function $f$ and its convex envelope $f^{**}$.]

Problem: $\bigl(k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r} + h\bigr)^{**}$ is unknown!
Old idea §

Replace $k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r}(\cdot)$ with $\bigl(k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r}\bigr)^{**}$.

Fact:

$$\bigl(k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r}\bigr)^{**} = k\Bigl(\bigl(\|\cdot\| + I_{\operatorname{rank}(\cdot) \le r}\bigr)^{**}\Bigr)$$

§ Lemaréchal 1973: $\min_x \sum_i f_i(x_i) \to \min_x \sum_i f_i^{**}(x_i)$
Low-Rank Inducing Norms

Every unitarily invariant norm is a symmetric gauge function $g$ of the singular values:

$$\|X\|_g := g\bigl(\sigma_1(X), \ldots, \sigma_{\min\{m,n\}}(X)\bigr)$$

Examples: $\|X\|_{\ell_2} \to g(x) = \|x\|_{\ell_2}$; $\qquad \|X\|_{\ell_1} \to g(x) = \|x\|_{\ell_1}$
Dual norm:

$$\|Y\|_{g^D} := \sup_{\|X\|_g \le 1} \langle X, Y \rangle = g^D\bigl(\sigma_1(Y), \ldots, \sigma_{\min\{m,n\}}(Y)\bigr)$$

Examples: $\|Y\|_{\ell_2^D} = \|Y\|_{\ell_2}$, $\qquad \|Y\|_{\ell_1^D} = \|Y\|_{\ell_\infty} = \sigma_1(Y)$
Truncated dual norms:

$$\|Y\|_{g^D, r} := \sup_{\substack{\|X\|_g \le 1 \\ \operatorname{rank}(X) \le r}} \langle X, Y \rangle = \underbrace{g^D\bigl(\sigma_1(Y), \ldots, \sigma_r(Y)\bigr)}_{=\,g^D(\sigma_1(Y), \ldots, \sigma_r(Y), 0, \ldots, 0)}$$

Examples:

$$\|Y\|_{\ell_2^D, r} = \sqrt{\sum_{i=1}^{r} \sigma_i^2(Y)}, \qquad \|Y\|_{\ell_1^D, r} = \|Y\|_{\ell_\infty}$$
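These truncated dual norms are cheap to evaluate from a singular value decomposition; a minimal numpy sketch (function names are mine):

```python
import numpy as np

def truncated_dual_norm_l2(Y, r):
    """||Y||_{l2^D, r}: the l2 norm of the r largest singular values."""
    s = np.linalg.svd(Y, compute_uv=False)
    return np.sqrt(np.sum(s[:r] ** 2))

def truncated_dual_norm_l1(Y, r):
    """||Y||_{l1^D, r}: the dual of l1 is l-infinity, so truncation at
    any r >= 1 leaves just the largest singular value."""
    return np.linalg.svd(Y, compute_uv=False)[0]
```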
Low-rank inducing norms §

$$\|X\|_{g, r*} := \sup_{\|Y\|_{g^D, r} \le 1} \langle X, Y \rangle$$

- If $\|\cdot\|_g$ is SDP-representable ⟹ $\|\cdot\|_{g,r*}$ is SDP-representable
- If $\operatorname{prox}_{\|\cdot\|_g}$ is computable ⟹ $\operatorname{prox}_{\|\cdot\|_{g,r*}}$ is computable ⟹ $\operatorname{prox}_{I_{\|\cdot\|_{g,r*} \le t}}(\cdot, t)$ is computable, where $k(\|\cdot\|_{g,r*}) = \min_t \bigl[k(t) + I_{\|\cdot\|_{g,r*} \le t}(\cdot, t)\bigr]$
- Complexity for $g = \ell_2, \ell_\infty$: one SVD plus $O(n \log n)$, where $n$ is the number of singular values

§ Cf. atomic norms, overlapping norms, support norms
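To make the definition concrete, here is a hedged cvxpy sketch (my own, not the authors' $O(n \log n)$ routine) that evaluates $\|X\|_{\ell_2,r*}$ directly from its support-function definition: the constraint $\sum_{i \le r} \sigma_i^2(Y) \le 1$ is encoded with an auxiliary matrix $W \succeq Y^T Y$ (Schur complement) together with a Ky Fan bound on the top-$r$ eigenvalues of $W$.

```python
import cvxpy as cp
import numpy as np

def low_rank_inducing_frobenius(X, r):
    """||X||_{l2,r*} as a support function: maximize <X, Y> over the
    truncated dual-norm ball ||Y||_{l2^D,r} <= 1, via W >= Y^T Y and
    sum of the r largest eigenvalues of W <= 1."""
    m, n = X.shape
    Y = cp.Variable((m, n))
    W = cp.Variable((n, n), symmetric=True)
    schur = cp.bmat([[W, Y.T], [Y, np.eye(m)]])   # PSD  <=>  W >= Y^T Y
    constraints = [schur >> 0, cp.lambda_sum_largest(W, r) <= 1]
    problem = cp.Problem(cp.Maximize(cp.trace(X.T @ Y)), constraints)
    problem.solve()
    return problem.value
```

This SDP is only practical for small instances; the point of the slide is that for $g = \ell_2, \ell_\infty$ an SVD-based evaluation exists.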
Geometric Interpretation

$$B^1_{g,r*} := \{X \in \mathbb{R}^{m \times n} : \|X\|_{g,r*} \le 1\}, \qquad E_{g,r} := \{X \in \mathbb{R}^{m \times n} : \|X\|_g = 1, \; \operatorname{rank}(X) \le r\}$$

- $B^1_{g,r*} = \operatorname{conv}(E_{g,r})$
- $\|X\|_g \le \|X\|_{g,r*}$
- $\|X\|_g = \|X\|_{g,r*}$ whenever $\operatorname{rank}(X) \le r$

[Figure: the unit ball $B^1_{g,r*}$.]
$$\min_{X} \; \|X\|_g \;\; \text{s.t.} \;\; \mathcal{A}(X) = y, \; \operatorname{rank}(X) \le r \qquad \Longleftrightarrow \qquad \min_{X} \; \|X\|_{g,r*} \;\; \text{s.t.} \;\; \mathcal{A}(X) = y, \; \operatorname{rank}(X) \le r$$

[Figure: the affine set $\mathcal{A}(X) = y$ and the unit ball $B^1_{g,r*}$.]
Best Convex Relaxation

$$\min_{\substack{X \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(X) \le r}} \bigl[k(\|X\|_g) + h(X)\bigr] \;\ge\; \min_{X \in \mathbb{R}^{m \times n}} \bigl[k(\|X\|_{g,r*}) + h(X)\bigr]$$

Best in the sense that:
- $\bigl(k(\|\cdot\|_g) + I_{\operatorname{rank}(\cdot) \le r}(\cdot) + h\bigr)^{**}$ is unknown
- Simple a posteriori test for optimality (see the sketch below)
- Sweep over the discrete parameter $r$ instead of $\lambda$ ⟹ cross-validation ⟷ zero duality gap

The cost function is replaced, not augmented: NO BIAS!
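The a posteriori test is simply a rank check on the relaxed minimizer: on matrices of rank at most $r$ the two costs coincide ($\|X\|_g = \|X\|_{g,r*}$), so if the relaxed minimizer already satisfies the rank constraint it also solves the non-convex problem. A minimal numpy sketch (tolerance and names are my own):

```python
import numpy as np

def solves_nonconvex(X_star, r, tol=1e-9):
    """A posteriori optimality test: if the minimizer of the convex
    relaxation has rank <= r, it is feasible for (and hence solves)
    the rank-constrained problem, since both costs coincide there."""
    s = np.linalg.svd(X_star, compute_uv=False)
    return bool(np.all(s[r:] <= tol * max(s[0], 1.0)))
```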
Nuclear Norm

Standard interpretation: $\|\cdot\|_{\ell_1} = \bigl(\operatorname{rank}(\cdot) + I_{\|\cdot\|_{\ell_\infty} \le 1}\bigr)^{**}$

Our interpretation #1: $\|\cdot\|_{\ell_1} = \bigl(\|\cdot\|_{\ell_1} + I_{\operatorname{rank}(\cdot) \le r}\bigr)^{**}$

Our interpretation #2: $\|X\|_{\ell_1} = \|X\|_{g,1*} \ge \cdots \ge \|X\|_{g,r*} \ge \cdots \ge \|X\|_{g,q*} = \|X\|_g$

$$\min_{\substack{X \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(X) \le 1}} \bigl[k(\|X\|_g) + h(X)\bigr] \;\ge\; \min_{X \in \mathbb{R}^{m \times n}} \bigl[k(\|X\|_{\ell_1}) + h(X)\bigr]$$
Some good news

- Zero duality gap for bilinear regression:

$$\min_{X \in \mathbb{R}^{k \times k}} \; \|Y - L^T X R^T\|_{\ell_2}^2 \quad \text{subject to} \quad \operatorname{rank}(X) \le r$$

- Optimality interpretations, e.g., iterative re-weighting:

$$\min_{\substack{X \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(X) \le r}} \bigl[k(\|WX\|_g) + h(X)\bigr] \;\ge\; \min_{X \in \mathbb{R}^{m \times n}} \bigl[k(\|WX\|_{g,r*}) + h(X)\bigr]$$
- Extends to atomic sets:

$$\min_{x \in \mathcal{A}} \bigl[k(G(x)) + h(x)\bigr] \;\ge\; \min_{x} \bigl[k(\|x\|_{\mathcal{A}_G}) + h(x)\bigr]$$

where
- $G$ is positively homogeneous
- $G(a) > 0$ for all $a \in \mathcal{A} \setminus \{0\}$
- $\|x\|_{\mathcal{A}_G} = \inf\{t > 0 : t^{-1} x \in \operatorname{conv}(\mathcal{A}_G)\}$
- $\mathcal{A}_G = \{a \in \operatorname{cone}(\mathcal{A}) : G(a) = 1\}$

Example: $\|\cdot\|_{\ell_2, r*} \to G = \|\cdot\|_{\ell_2}$, $\mathcal{A} = \{X : \operatorname{rank}(X) \le r\}$
Not bad news

$$X^\star \in \operatorname*{argmin}_{X} \bigl[k(\|X\|_{g,r*}) + h(X)\bigr], \qquad Y^\star \in \operatorname*{argmin}_{Y} \bigl[k^+(\|Y\|_{g^D,r}) + h^*(Y)\bigr]$$

- $k^+(y) := \sup_{x \ge 0}\,[xy - k(x)]$
- $\operatorname{rank}(X^\star) \le r$ plus uniqueness, if $\sigma_r(Y^\star) \ne \sigma_{r+1}(Y^\star)$ or $\sigma_r(Y^\star) = 0$
- $\operatorname{rank}(X^\star) \le r + s$, if $\sigma_r(Y^\star) = \cdots = \sigma_{r+s}(Y^\star) \ne \sigma_{r+s+1}(Y^\star)$
Recovery Guarantees?

- Work in progress
- Why not use known tools? § They do not exploit the additional "knowledge" provided by $\|\cdot\|_g$

§ Chandrasekaran, Recht, Parrilo, Willsky '12
Example: Matrix Completion

Given partially known entries of a low-rank $Z \in \mathbb{R}^{m \times n}$, find the unknown entries:

$$\min_{X} \; \|X\|_g \quad \text{subject to} \quad X_{ij} = Z_{ij} \;\; \forall (i,j) \in I, \quad \operatorname{rank}(X) \le r$$

Additional knowledge:
- Small unknown entries: $k(\|\cdot\|) = \|\cdot\|_{\ell_2}$
Example: Matrix Completion

$$H \in \mathbb{R}^{10 \times 10}, \qquad H_{ij} = \begin{cases} 1, & i + j \le 11, \\ 0, & \text{otherwise}, \end{cases}$$

i.e., $H$ has ones on and above the main anti-diagonal (55 ones in total) and zeros below it.
$$Z := \sum_{i=1}^{5} \sigma_i(H)\, u_i u_i^T, \qquad |Z_{ij} - H_{ij}| \le \sigma_6(H) \;\Longrightarrow\; |Z_{ij}| \le \sigma_6(H) \text{ wherever } H_{ij} = 0$$

Treating those small entries as unknown gives the pattern ($*$ known, $?$ unknown):

    Z = [* * * * * * * * * *
         * * * * * * * * * ?
         * * * * * * * * ? ?
         * * * * * * * ? ? ?
         * * * * * * ? ? ? ?
         * * * * * ? ? ? ? ?
         * * * * ? ? ? ? ? ?
         * * * ? ? ? ? ? ? ?
         * * ? ? ? ? ? ? ? ?
         * ? ? ? ? ? ? ? ? ?] ∈ R^{10×10}
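A reproduction of this construction in numpy (a sketch; the entry-wise bound holds because the largest entry of a matrix is at most its spectral norm, and $\|Z - H\|_{\ell_\infty} = \sigma_6(H)$ by Eckart-Young):

```python
import numpy as np

n = 10
i, j = np.indices((n, n))
H = (i + j <= n - 1).astype(float)   # H_ij = 1 iff i + j <= 11 (1-indexed)

U, s, Vt = np.linalg.svd(H)
Z = U[:, :5] * s[:5] @ Vt[:5, :]     # best rank-5 approximation of H

known = H == 1                       # 55 known entries; 45 unknown
# |Z_ij| <= sigma_6(H) on the unknown (H_ij = 0) pattern
assert np.abs(Z[~known]).max() <= s[5] + 1e-12
```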
- 45 unknown entries (not randomly selected!)
- Recovery guarantees with the nuclear norm § would require $3r(2n - r) + 1 = 226$ random Gaussian samples

§ Chandrasekaran, Recht, Parrilo, Willsky '12
$$\min_{X} \; \|X\|_{\ell_2, r*} \quad \text{subject to} \quad X_{ij} = Z_{ij} \;\; \forall (i,j) \in I$$

[Plot: relative error $\|Z - \cdot\|_{\ell_2} / \|Z\|_{\ell_2}$ versus $r = 1, \ldots, 10$.]
[Plot: rank of the recovered solution versus $r = 1, \ldots, 10$.]
Comparison with nuclear-norm regularization §:

$$\min_{X} \; \tfrac{1}{2}\|X\|_{\ell_2}^2 + \mu \|X\|_{\ell_1} \quad \text{subject to} \quad X_{ij} = Z_{ij}, \; (i,j) \in I$$

[Plot: rank of the solution (between 9 and 11) versus $\mu = 2, \ldots, 10$.]

§ Cai, Candès '10
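For reference, this nuclear-norm baseline is a few lines of cvxpy (a sketch of the problem above; `M` is assumed to be the 0/1 mask of known entries, and an SDP-capable solver such as SCS is assumed to be installed):

```python
import cvxpy as cp

def nuclear_norm_completion(Z, M, mu):
    """min 0.5 ||X||_F^2 + mu ||X||_*  s.t.  X agrees with Z on mask M."""
    X = cp.Variable(Z.shape)
    objective = 0.5 * cp.sum_squares(X) + mu * cp.normNuc(X)
    constraints = [cp.multiply(M, X) == cp.multiply(M, Z)]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return X.value
```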
Conclusion

- Simple a posteriori test for optimality
- Prior information can and should be utilized ⟹ model the non-convex problem
- Handles structured measurements
- Can be used to test the performance of greedy methods

Most important: Replace, don't add!
What I did not show you

- One can let $r$ become real-valued by defining
  $$\|X\|_{g^D, r} := g^D\bigl(\sigma_1(X), \ldots, \sigma_{\lfloor r \rfloor}(X), (r - \lfloor r \rfloor)\,\sigma_{\lceil r \rceil}(X)\bigr)$$
- Non-convex proximal splitting: $X_k = \operatorname{prox}_{\gamma f_1}(Z_k)$ with $f_1 = k(\|\cdot\|_g) + I_{\operatorname{rank}(\cdot) \le r}$ (see the sketch below)
- $\sigma_r(Y^\star) \ne \sigma_{r+1}(Y^\star)$: local convergence to global minima
- $\sigma_r(Y^\star) \ne \sigma_{r+1}(Y^\star)$ for all $Y^\star$: all stationary points correspond to global minima (⟹ Panos: global convergence)
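As an illustration of the splitting step, here is a sketch of $\operatorname{prox}_{\gamma f_1}$ for the special case $k(t) = t^2$, $g = \ell_2$ (my own derivation by completing the square; the general case is in the paper): $\operatorname{prox}_{\gamma f_1}(Z) = \operatorname{svd}_r(Z) / (1 + 2\gamma)$, since minimizing $\gamma\|X\|_{\ell_2}^2 + \tfrac{1}{2}\|X - Z\|_{\ell_2}^2$ over rank-$r$ matrices reduces to projecting $Z/(1 + 2\gamma)$ onto the rank-$r$ set.

```python
import numpy as np

def prox_f1(Z, gamma, r):
    """prox_{gamma f1}(Z) for f1 = ||.||_l2^2 + I_{rank<=r} (k(t) = t^2):
    completing the square reduces it to a scaled rank-r truncation."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r] @ Vt[:r, :]) / (1 + 2 * gamma)
```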
Future Work
- Application to more control problems (Anders H., Mihailo)
- Can we learn a suitable norm? (Yong Sheng?)
- A priori deterministic and probabilistic guarantees (?)
Sources

- Low-Rank Inducing Norms with Optimality Interpretations
- Low-Rank Optimization with Convex Constraints
- PhD thesis: Rank Reduction with Convex Constraints
- The Use of the r* Heuristic in Covariance Completion Problems
- Local Convergence of Proximal Splitting Methods for Rank Constrained Problems