SLIDE 1

STRUCTURED LOW-RANK MATRIX FACTORIZATION: GLOBAL OPTIMALITY, ALGORITHMS, AND APPLICATIONS

ARTICLE BY BENJAMIN D. HAEFFELE AND RENÉ VIDAL (2017)

CMAP Machine Learning Journal Club

Speaker: Imke Mayer (CMAP), December 13th 2018

SLIDE 2

OUTLINE

I. Structured matrix factorization
   i. Context and definition
   ii. Special case 1: Sparse dictionary learning (SDL)
   iii. Special case 2: Subspace clustering (SC)
II. Global optimality for structured matrix factorization
   i. Main theorem
   ii. Polar problem
III. Application: SDL global optimality
IV. Extension to tensor factorization and deep learning

SLIDE 3

STRUCTURED MATRIX FACTORIZATION: CONTEXT

(Large) high-dimensional datasets (images, videos, user ratings, etc.) are
• difficult to handle directly (computational and memory complexity),
• but the relevant information often lies in a low-dimensional structure.

Goal: recover this underlying low-dimensional structure of the given (large-scale) data X.

Example applications: motion segmentation, face clustering [12].

[12] VIDAL, R., MA, Y., AND SASTRY, S. S. Generalized principal component analysis, vol. 5. Springer, 2016.

SLIDE 4

STRUCTURED MATRIX FACTORIZATION: CONTEXT

Model assumption: linear subspace model. The data can be approximated by one or more low-dimensional subspace(s):

$X \approx UV^T$

where the columns of U form a basis of the linear low-dimensional structure and V is the low-dimensional data representation.

Goal: recover this underlying low-dimensional structure of the given (large-scale) data X.

SLIDE 5

STRUCTURED MATRIX FACTORIZATION: CONTEXT

$\min_{U,V} \; \ell(X, UV^T) + \lambda\,\Theta(U, V)$   (1)

The loss $\ell$ measures the quality of the approximation $X \approx UV^T$; the regularizer $\Theta$ imposes restrictions on the factors.

• Issue: without any assumptions there are infinitely many choices of U and V such that $X \approx UV^T$.
• Solution: constrain the factors to satisfy certain properties.

Properties of problem (1):
• Non-convex
• Structured factors → more modeling flexibility
• Explicit representation
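To make the objects in (1) concrete, here is a minimal numerical sketch for the squared Frobenius loss; `theta` and `lam` are illustrative names for the rank-1 regularizer and trade-off parameter (the column-wise form of Θ is introduced on slide 13), not notation from the article:

```python
import numpy as np

def structured_mf_objective(X, U, V, theta, lam):
    """f(U, V) = ell(X, U V^T) + lam * sum_i theta(U_i, V_i),
    with the squared Frobenius loss ell(X, Y) = ||X - Y||_F^2."""
    loss = np.linalg.norm(X - U @ V.T, "fro") ** 2
    reg = sum(theta(U[:, i], V[:, i]) for i in range(U.shape[1]))
    return loss + lam * reg

# Example rank-1 regularizer: theta(u, v) = ||u||_2 * ||v||_2,
# which induces the nuclear norm (see slide 19).
theta_l2 = lambda u, v: np.linalg.norm(u) * np.linalg.norm(v)
```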

SLIDE 6

STRUCTURED MATRIX FACTORIZATION: SPECIAL CASE 1: SPARSE DICTIONARY LEARNING

Given a set of signals X, find a set of dictionary atoms U and sparse codes V that approximate the signals [9]. Applications: denoising, inpainting, classification.

$\min_{U,V} \; \|X - UV^T\|_F^2 + \lambda \|V\|_1 \quad \text{subject to} \quad \|U_i\|_2 \le 1$   (3)

[Figure: a noisy image is denoised by expressing each patch as a sparse linear combination of dictionary atoms.]

[9] OLSHAUSEN, B. A., AND FIELD, D. J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research 37, 23 (1997), 3311–3325.

SLIDE 7

STRUCTURED MATRIX FACTORIZATION: SPECIAL CASE 1: SPARSE DICTIONARY LEARNING

$\min_{U,V} \; \|X - UV^T\|_F^2 + \lambda \|V\|_1 \quad \text{subject to} \quad \|U_i\|_2 \le 1$   (3)

Challenges:
• Optimization strategies lack global convergence guarantees (a solver sketch follows below).
• Which size for U and V? The number of columns r must be picked a priori.

Letting the size r vary leads to

$\min_{U,V,r} \; \|X - UV^T\|_F^2 + \lambda \sum_{i=1}^{r} \big( \gamma \|V_i\|_1 + (1-\gamma) \|V_i\|_2 \big) \quad \text{subject to} \quad \|U_i\|_2 \le 1$   (4)
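As referenced above, a minimal sketch of one alternating proximal/projected-gradient step for problem (3): soft-thresholding handles the ℓ1 term and a projection enforces ‖Uᵢ‖₂ ≤ 1. This is one plausible solver under stated assumptions, not the method prescribed by the article, and the step size is left to be tuned:

```python
import numpy as np

def soft_threshold(A, t):
    """Proximal operator of t * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def sdl_step(X, U, V, lam, step):
    """One alternating proximal/projected-gradient step on problem (3)."""
    # Proximal-gradient update of the sparse codes V
    grad_V = 2 * (U @ V.T - X).T @ U
    V = soft_threshold(V - step * grad_V, step * lam)
    # Projected-gradient update of the dictionary U
    grad_U = 2 * (U @ V.T - X) @ V
    U = U - step * grad_U
    norms = np.maximum(np.linalg.norm(U, axis=0), 1.0)
    return U / norms, V   # enforce the constraint ||U_i||_2 <= 1
```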

SLIDE 8

STRUCTURED MATRIX FACTORIZATION: SPECIAL CASE 2: SUBSPACE CLUSTERING

Given data X coming from a union of subspaces, find these underlying subspaces and separate the data according to them.
• clustering
• recovery of low-dimensional structures

SLIDE 9

STRUCTURED MATRIX FACTORIZATION: SPECIAL CASE 2: SUBSPACE CLUSTERING

Given data X coming from a union of subspaces, determine these underlying subspaces and separate the data according to them.

• The subspaces S1, ..., Sn are characterized by bases → U (recover the number and dimensions of the subspaces).
• Segmentation is obtained by finding a subspace-preserving representation → V (recover the data segmentation).

Challenges:
• Model selection: how many subspaces, and what dimension for each subspace?
• Potentially difficult subspace configurations.

SLIDE 10

STRUCTURED MATRIX FACTORIZATION: SPECIAL CASE 2: SUBSPACE CLUSTERING

One solution: Sparse Subspace Clustering [4].
• Self-expressive dictionary: fix the dictionary as U ← X.
• Find a sparse representation over U that allows one to segment the data (a sketch follows below).

But the optimality of the dictionary is not addressed. Idea: sparse dictionary learning on a union-of-subspaces model is suited to recover a more compact factorization with subspace-sparse codes [1].

[1] ADLER, A., ELAD, M., AND HEL-OR, Y. Linear-time subspace clustering via bipartite graph modeling. IEEE Transactions on Neural Networks and Learning Systems 26, 10 (2015), 2234–2246.
[4] ELHAMIFAR, E., AND VIDAL, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 11 (2013), 2765–2781.
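A minimal sketch of the self-expressive coding step, assuming the lasso-style formulation $\min_C \|X - XC\|_F^2 + \lambda \|C\|_1$ with diag(C) = 0, solved by ISTA; a simplified illustration of the idea in [4], not its exact algorithm:

```python
import numpy as np

def ssc_codes(X, lam, n_iter=500):
    """Self-expressive sparse codes: min_C ||X - X C||_F^2 + lam * ||C||_1
    with diag(C) = 0, solved by ISTA. Columns of X are the data points."""
    n = X.shape[1]
    G = X.T @ X                                   # Gram matrix
    step = 1.0 / (2.0 * np.linalg.norm(G, 2))     # 1 / Lipschitz constant
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = 2.0 * (G @ C - G)                  # gradient of ||X - X C||_F^2
        Z = C - step * grad
        C = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
        np.fill_diagonal(C, 0.0)                  # rule out the trivial C = I
    return C   # the affinity |C| + |C|^T then feeds spectral clustering
```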

SLIDE 11

STRUCTURED MATRIX FACTORIZATION: THEORY FOR GLOBAL OPTIMALITY

Matrix factorization:
$\min_{U,V} \; \ell(X, UV^T) + \lambda\,\Theta(U, V)$   (1)
• Non-convex
• Small problem size
• Structured factors → more modeling flexibility
• Explicit representation

Matrix approximation:
$\min_{Y} \; \ell(X, Y) + \lambda\,\Omega_\Theta(Y)$   (2)
• Convex
• Large problem size
• Unstructured

How are (1) and (2) related? The guiding example is the low-rank case:

Low-rank matrix factorization: $\min_{U,V} \; \ell(X, UV^T)$ subject to U, V having at most r columns.
Low-rank matrix approximation: $\min_{Y} \; \ell(X, Y) + \lambda \|Y\|_*$
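For the squared loss (with a 1/2 scaling, an assumption made here to obtain the standard proximal form), the low-rank matrix approximation above has a closed-form solution by singular value thresholding, a useful reference point for what follows:

```python
import numpy as np

def svt(X, lam):
    """Solve min_Y 0.5 * ||X - Y||_F^2 + lam * ||Y||_* in closed form:
    shrink the singular values of X by lam (singular value thresholding)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
```

This is exactly the shrinkage that the convex relaxation applies to the spectrum, whereas the factorized problem (1) keeps U and V explicit.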

SLIDE 12

STRUCTURED MATRIX FACTORIZATION: THEORY FOR GLOBAL OPTIMALITY

Matrix factorization: $\min_{U,V} \; \ell(X, UV^T) + \lambda\,\Theta(U, V)$   (1)
Matrix approximation: $\min_{Y} \; \ell(X, Y) + \lambda\,\Omega_\Theta(Y)$   (2)

Ideas:
• Find a convex regularization function $\Omega_\Theta$, derived from a general $\Theta$, that couples the two problems (1) and (2).
• Allow the number of columns of U and V to change in (1).

Results:
• Problem (2) gives a global lower bound for problem (1).
• This convex lower bound makes it possible to analyze global optimality for problem (1).

SLIDE 13

GLOBAL OPTIMALITY OF STRUCTURED MATRIX FACTORIZATION AT A LOCAL MINIMUM

$\min_{U,V} \; f(U, V) = \ell(X, UV^T) + \lambda\,\Theta(U, V)$   (1)

Assumptions:
• The factorization size r is allowed to change.
• The loss $\ell$ is convex and once differentiable w.r.t. Y.
• $\Theta$ is a sum of positively homogeneous functions of degree 2:
  $\Theta(U, V) = \sum_{i=1}^{r} \theta(U_i, V_i)$, with $\theta(\alpha u, \alpha v) = \alpha^2 \theta(u, v)$ for all $\alpha \ge 0$
  (e.g., $\theta(u, v) = \|u\|_2 \|v\|_2$).

THEOREM [6]
Local minima $(\tilde U, \tilde V)$ of $f(U, V)$ are globally optimal if $(\tilde U_i, \tilde V_i) = (0, 0)$ for some $i \in [r]$.

In other words: all local minima of f of sufficient size are global minima.

[6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).

SLIDE 14

GLOBAL OPTIMALITY OF STRUCTURED MATRIX FACTORIZATION AT ANY POINT

$\min_{U,V} \; f(U, V) = \ell(X, UV^T) + \lambda\,\Theta(U, V)$   (1)

Assumptions:
• The factorization size r is allowed to change.
• The loss $\ell$ is convex and once differentiable w.r.t. Y.
• $\Theta$ is a sum of positively homogeneous functions of degree 2:
  $\Theta(U, V) = \sum_{i=1}^{r} \theta(U_i, V_i)$, with $\theta(\alpha u, \alpha v) = \alpha^2 \theta(u, v)$ for all $\alpha \ge 0$.

COROLLARY [6]
A point $(\tilde U, \tilde V)$ is a global optimum of $f(U, V)$ if it satisfies the following conditions:
1) $\tilde U_i^T \left( -\frac{1}{\lambda} \nabla_Y \ell(X, \tilde U \tilde V^T) \right) \tilde V_i = \theta(\tilde U_i, \tilde V_i)$ for all $i \in [r]$;
2) $u^T \left( -\frac{1}{\lambda} \nabla_Y \ell(X, \tilde U \tilde V^T) \right) v \le \theta(u, v)$ for all $(u, v)$.

For many choices of $\ell$, condition 1 is satisfied by first-order optimal points.

[6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).

SLIDE 15

GLOBAL OPTIMALITY OF STRUCTURED MATRIX FACTORIZATION AT ANY POINT

$\min_{U,V} \; f(U, V) = \ell(X, UV^T) + \lambda\,\Theta(U, V)$   (1)

Assumptions (as before): the factorization size r is allowed to change; the loss $\ell$ is convex and once differentiable w.r.t. Y; $\Theta(U, V) = \sum_{i=1}^{r} \theta(U_i, V_i)$ with $\theta$ positively homogeneous of degree 2.

COROLLARY [6]
Given a point $(\tilde U, \tilde V)$, we can test whether it is a local minimum of sufficient size by testing:

$u^T \left( -\frac{1}{\lambda} \nabla_Y \ell(X, \tilde U \tilde V^T) \right) v \le \theta(u, v)$ for all $(u, v)$   (5)

[6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).

SLIDE 16

GLOBAL OPTIMALITY OF STRUCTURED MATRIX FACTORIZATION AT ANY POINT

$\min_{U,V} \; f(U, V) = \ell(X, UV^T) + \lambda\,\Theta(U, V)$   (1)

COROLLARY [6]
Given a point $(\tilde U, \tilde V)$, we can test whether it is a local minimum of sufficient size by testing:

$u^T \left( -\frac{1}{\lambda} \nabla_Y \ell(X, \tilde U \tilde V^T) \right) v \le \theta(u, v)$ for all $(u, v)$   (5)

Condition (5) says that the directional derivative of f in any direction (u, v), i.e. when the factorization is augmented by a new column pair, is non-negative:

$0 \le \langle \nabla_Y \ell(X, \tilde U \tilde V^T), uv^T \rangle + \lambda\,\theta(u, v)$ for all $(u, v)$   (6)

which is equivalent to

$\Omega_\theta^\circ \left( -\frac{1}{\lambda} \nabla_Y \ell(X, \tilde U \tilde V^T) \right) \le 1$, where $\Omega_\theta^\circ(Z) = \sup_{u,v} \; u^T Z v \;\;\text{s.t.}\;\; \theta(u, v) \le 1$   (7)

is the polar problem. (A numerical sketch of this test follows below.)

[6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
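A concrete sketch of test (7) for the squared loss and θ(u, v) = ‖u‖₂‖v‖₂, for which the polar is the spectral norm (see the next slide); the function name and tolerance are illustrative assumptions:

```python
import numpy as np

def passes_polar_test(X, U, V, lam, tol=1e-8):
    """Test condition (7) for ell(X, Y) = ||X - Y||_F^2 and
    theta(u, v) = ||u||_2 * ||v||_2, for which the polar of Omega_theta
    is the spectral norm (largest singular value)."""
    grad = 2.0 * (U @ V.T - X)                 # gradient of the loss at Y = U V^T
    Z = -grad / lam
    return np.linalg.norm(Z, 2) <= 1.0 + tol   # polar value = sigma_max(Z)
```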

SLIDE 17

POLAR PROBLEM

$\Omega_\theta^\circ(Z) = \sup_{u,v} \; u^T Z v \;\;\text{s.t.}\;\; \theta(u, v) \le 1$   (7)

Why are we interested in solving this polar problem?
• For a non-convex problem, first-order optimality is not sufficient to guarantee local minimality.
  → The theorem on global optimality only applies to local minima.
• The polar problem allows one to test global optimality at any (critical) point.
  → It is, however, a higher-order non-smooth saddle point problem.

The difficulty of solving the polar depends on the choice of $\theta$ (implementations of the two tractable cases follow below):
• $\theta(u, v) = \|u\|_2 \|v\|_2$: $\Omega_\theta^\circ(Z) = \sigma_{\max}(Z)$, the largest singular value of Z (the square root of the top eigenvalue of $Z^T Z$).
• $\theta(u, v) = \|u\|_1 \|v\|_1$: $\Omega_\theta^\circ(Z)$ is the largest entry (in absolute value) of Z.
• $\theta(u, v) = \|u\|_\infty \|v\|_\infty$: solving the polar problem is NP-hard [7].

[7] HENDRICKX, J. M., AND OLSHEVSKY, A. Matrix p-norms are NP-hard to approximate if p ≠ 1, 2, ∞. SIAM Journal on Matrix Analysis and Applications 31, 5 (2010), 2802–2812.
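The two tractable cases above are one-liners in NumPy (a sketch; the names `polar_l2` and `polar_l1` are illustrative):

```python
import numpy as np

def polar_l2(Z):
    """Polar for theta(u, v) = ||u||_2 * ||v||_2: largest singular value of Z."""
    return np.linalg.norm(Z, 2)

def polar_l1(Z):
    """Polar for theta(u, v) = ||u||_1 * ||v||_1: largest |entry| of Z."""
    return np.abs(Z).max()
```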

SLIDE 18

ELEMENTS OF PROOF (THEOREM)

Rank-1 regularizer $\theta : \mathbb{R}^D \times \mathbb{R}^N \to \mathbb{R}_+ \cup \{\infty\}$:
(i) positively homogeneous of degree 2,
(ii) positive semi-definite,
(iii) well-defined as a regularizer for rank-1 matrices.

Matrix factorization regularizer:

$\Omega_\theta : \mathbb{R}^{D \times N} \to \mathbb{R}_+ \cup \{\infty\}, \quad Y \mapsto \inf_{r \in \mathbb{N}_+} \; \inf_{U \in \mathbb{R}^{D \times r},\, V \in \mathbb{R}^{N \times r}} \; \sum_{i=1}^{r} \theta(U_i, V_i) \;\;\text{s.t.}\;\; Y = UV^T$   (8)

• Generalization of the decomposition/atomic norm.
• Convex function.
• The infimum is achieved with r ≤ DN.

SLIDE 19

ELEMENTS OF PROOF (THEOREM)

Rank-1 regularizer $\theta : \mathbb{R}^D \times \mathbb{R}^N \to \mathbb{R}_+ \cup \{\infty\}$ and matrix factorization regularizer $\Omega_\theta$ as in (8):
• Generalization of the decomposition/atomic norm.
• Convex function.
• The infimum is achieved with r ≤ DN.
• Example: the variational form of the nuclear norm (the sum of the singular values of the given matrix), numerically checked below:

$\|Y\|_* = \|Y\|_{2,2} = \inf_{r \in \mathbb{N}_+} \; \inf_{(U,V):\, UV^T = Y} \; \sum_{i=1}^{r} \|U_i\|_2 \|V_i\|_2$
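A quick numerical check of this variational form: the balanced factorization $U = P\sqrt{\Sigma}$, $V = Q\sqrt{\Sigma}$ built from the SVD $Y = P \Sigma Q^T$ attains the infimum (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 4))

# Balanced factorization from the SVD attains the infimum.
P, s, Qt = np.linalg.svd(Y, full_matrices=False)
U = P * np.sqrt(s)        # scale columns by sqrt of singular values
V = Qt.T * np.sqrt(s)

cost = sum(np.linalg.norm(U[:, i]) * np.linalg.norm(V[:, i]) for i in range(len(s)))
assert np.isclose(cost, s.sum())   # equals the nuclear norm ||Y||_*
assert np.allclose(U @ V.T, Y)     # and is a valid factorization of Y
```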

SLIDE 20

ELEMENTS OF PROOF (THEOREM)

Matrix factorization regularizer:
$\Omega_\theta(Y) = \inf_{r \in \mathbb{N}_+} \; \inf_{U \in \mathbb{R}^{D \times r},\, V \in \mathbb{R}^{N \times r}} \; \sum_{i=1}^{r} \theta(U_i, V_i) \;\;\text{s.t.}\;\; Y = UV^T$   (8)

Convex optimization problem:
$\min_{Y} \; \{ F(Y) \equiv \ell(X, Y) + \lambda\,\Omega_\theta(Y) \}$   (9)

Non-convex factorized formulation:
$\min_{U,V} \; \{ f(U, V) \equiv \ell(X, UV^T) + \lambda \sum_{i=1}^{r} \theta(U_i, V_i) \}$   (10)

• If $\hat Y$ is an optimal solution to problem (9), then any factorization $UV^T = \hat Y$ such that $\sum_{i=1}^{r} \theta(U_i, V_i) = \Omega_\theta(\hat Y)$ is also an optimal solution to the non-convex problem (10).
• Idea of the proof: local minima of f that satisfy the conditions of the theorem also satisfy the conditions for global optimality of F.

SLIDE 21

ADDITIONAL COMMENTS ON THE POLAR PROBLEM

Matrix factorization regularizer:
$\Omega_\theta(Y) = \inf_{r \in \mathbb{N}_+} \; \inf_{U \in \mathbb{R}^{D \times r},\, V \in \mathbb{R}^{N \times r}} \; \sum_{i=1}^{r} \theta(U_i, V_i) \;\;\text{s.t.}\;\; Y = UV^T$   (8)

Polar function:
$\Omega_\theta^\circ(Z) = \sup_{Y} \; \langle Z, Y \rangle \;\;\text{s.t.}\;\; \Omega_\theta(Y) \le 1$   (11)
$\Omega_\theta^\circ(Z) = \sup_{u,v} \; u^T Z v \;\;\text{s.t.}\;\; \theta(u, v) \le 1$   (12)

• Given a first-order optimal point of the non-convex problem, the value of the polar problem at this point provides a bound on how far this point is from being globally optimal.

SLIDE 22

ALGORITHM FOR SPARSE DICTIONARY LEARNING

Algorithm 1: SDL matrix factorization
    Input: data X, initial factorization (U_init, V_init) of size r_init.
    Result: optimal factorization (U_final, V_final) of size r_final.
    while not converged do
        1) Local descent to a first-order optimal point (Ũ, Ṽ).
        2) Solve the SDL polar problem at $Z = -\frac{1}{\lambda} \nabla_Y \ell(X, \tilde U \tilde V^T)$.
        if polar value ≤ 1 then
            the algorithm has converged
        else
            append the polar solution (u*, v*) with some step size τ to (Ũ, Ṽ):
            (U, V) ← ([Ũ, τu*], [Ṽ, τv*])
        end
    end

• Instantiation of the structured MF meta-algorithm (a runnable sketch follows below).
• Builds upon the COROLLARY (on global optimality at any given point).
• Alternates between:
  a) local descent to a critical point;
  b) evaluation of the polar function (by solving the polar problem) in order to test for global optimality at this critical point;
  c) augmentation of the current factorization with the solution of the polar step (b), as long as the polar value is > 1.
• From [6] we know that condition 2) of the COROLLARY will hold for finite r.

[6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
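A minimal runnable sketch of this meta-algorithm, instantiated for the tractable regularizer θ(u, v) = ‖u‖₂‖v‖₂ so that the polar step reduces to a top singular pair; the actual SDL polar of [6] is more involved, and the descent scheme, step sizes, and iteration counts here are illustrative assumptions:

```python
import numpy as np

def meta_algorithm(X, lam, r0=1, tau=0.1, n_descent=200, step=1e-3, max_outer=50):
    """Sketch for f(U, V) = ||X - U V^T||_F^2 + lam * sum_i ||U_i||_2 ||V_i||_2."""
    D, N = X.shape
    U, V = np.zeros((D, r0)), np.zeros((N, r0))
    for _ in range(max_outer):
        # a) local descent via (sub)gradient steps
        for _ in range(n_descent):
            R = U @ V.T - X
            nu = np.linalg.norm(U, axis=0)
            nv = np.linalg.norm(V, axis=0)
            U = U - step * (2 * R @ V + lam * U / np.maximum(nu, 1e-12) * nv)
            V = V - step * (2 * R.T @ U + lam * V / np.maximum(nv, 1e-12) * nu)
        # b) polar step: top singular pair of Z = -(1/lam) * gradient
        Z = -2.0 * (U @ V.T - X) / lam
        P, s, Qt = np.linalg.svd(Z, full_matrices=False)
        if s[0] <= 1.0 + 1e-6:
            break                              # test (7) passed: globally optimal
        # c) append the polar solution with step size tau
        U = np.hstack([U, tau * P[:, :1]])
        V = np.hstack([V, tau * Qt[:1].T])
    return U, V
```

The loop mirrors steps a) to c): plain (sub)gradient descent for the local step, an SVD for the polar test, and column appending whenever the polar value exceeds 1.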

SLIDE 23

BEYOND MATRIX FACTORIZATION

Structured tensor factorization and deep learning.

• Mapping: the matrix product $\Phi(U, V) = UV^T$ generalizes to the network map
  $\Phi(W^1, \ldots, W^K) = \psi_K(\cdots \psi_2(\psi_1(V W^1) W^2) \cdots W^K)$
  where the $\psi_k$ are non-linearities, V holds the input features, and the $W^k$ are the weights.
• Dimension: r = number of columns in U and V becomes r = number of parallel subnetworks.
• Optimization problem:
  $\min_{W^1, \ldots, W^K} \; \ell(Y, \Phi(W^1, \ldots, W^K)) + \lambda\,\Theta(W^1, \ldots, W^K)$
• Key ingredients: positive homogeneity of the network architecture and a parallel subnetwork structure (see the sketch below).
• Example of regularizer: product of norms.

Figure from the ICCV '17 tutorial on Global Optimality in Matrix and Tensor Factorization, Deep Learning & Beyond (Ben Haeffele and René Vidal).
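For intuition on these two ingredients, a sketch of a parallel-subnetwork map with ReLU non-linearities: ReLU is positively homogeneous of degree 1, so scaling one subnetwork's weights by α scales its output by α^K. Purely illustrative, not the article's exact formulation:

```python
import numpy as np

relu = lambda A: np.maximum(A, 0.0)   # positively homogeneous non-linearity

def phi(V, subnetworks):
    """Sum of r parallel subnetworks, each computing
    psi_K(... psi_2(psi_1(V W^1) W^2) ... W^K) with psi_k = ReLU.
    Scaling one subnetwork's weights by alpha scales its output by alpha**K."""
    out = 0.0
    for weights in subnetworks:   # one list of K weight matrices per subnetwork
        h = V
        for W in weights:
            h = relu(h @ W)
        out = out + h
    return out
```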

SLIDE 24

CONCLUSION AND FUTURE WORK

• Structured matrix factorization is a general formulation of many popular problems (low-rank MF, sparse PCA, NMF, SDL, etc.).
• Global optimality of structured matrix factorization can be characterized.
• An iterative algorithm alternating local descent and the polar problem can reach a global optimum.
• Limits of direct applicability: the optimization landscape of the polar problem can be complicated.
• The analysis extends to global optimality for deep learning problems.

Further reading:
• HAEFFELE, B. D., AND VIDAL, R. Global optimality in tensor factorization, deep learning, and beyond. arXiv preprint arXiv:1506.07540 (2015).
• BACH, F. Convex relaxations of structured matrix factorizations. arXiv preprint arXiv:1309.3117 (2013).
• SCHWAB, E., HAEFFELE, B., CHARON, N., AND VIDAL, R. Separable dictionary learning with global optimality and applications to diffusion MRI. arXiv preprint arXiv:1807.05595 (2018).

SLIDE 25

THANK YOU FOR YOUR ATTENTION

Presented article:
HAEFFELE, B. D., AND VIDAL, R. (2017). Structured Low-Rank Matrix Factorization: Global Optimality, Algorithms, and Applications. URL: https://arxiv.org/abs/1708.07850

SLIDE 26

REFERENCES I

(1) ADLER, A., ELAD, M., AND HEL-OR, Y. Linear-time subspace clustering via bipartite graph modeling. IEEE Transactions on Neural Networks and Learning Systems 26, 10 (2015), 2234–2246.
(2) BACH, F. Convex relaxations of structured matrix factorizations. arXiv preprint arXiv:1309.3117 (2013).
(3) BACH, F., MAIRAL, J., AND PONCE, J. Convex sparse matrix factorizations. arXiv preprint arXiv:0812.1869 (2008).
(4) ELHAMIFAR, E., AND VIDAL, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 11 (2013), 2765–2781.
(5) HAEFFELE, B. D., AND VIDAL, R. Global optimality in tensor factorization, deep learning, and beyond. arXiv preprint arXiv:1506.07540 (2015).
(6) HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
(7) HENDRICKX, J. M., AND OLSHEVSKY, A. Matrix p-norms are NP-hard to approximate if p ≠ 1, 2, ∞. SIAM Journal on Matrix Analysis and Applications 31, 5 (2010), 2802–2812.
(8) LIU, G., LIN, Z., AND YU, Y. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010), pp. 663–670.

SLIDE 27

REFERENCES II

(9) OLSHAUSEN, B. A., AND FIELD, D. J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research 37, 23 (1997), 3311–3325.
(10) SCHWAB, E., HAEFFELE, B., CHARON, N., AND VIDAL, R. Separable dictionary learning with global optimality and applications to diffusion MRI. arXiv preprint arXiv:1807.05595 (2018).
(11) SUN, J., QU, Q., AND WRIGHT, J. Complete dictionary recovery over the sphere. arXiv preprint arXiv:1504.06785 (2015).
(12) VIDAL, R., MA, Y., AND SASTRY, S. S. Generalized Principal Component Analysis, vol. 5. Springer, 2016.
(13) ZHU, Z., LI, Q., TANG, G., AND WAKIN, M. B. Global optimality in low-rank matrix optimization. IEEE Transactions on Signal Processing 66, 13 (2018), 3614–3628.