Maximum Likelihood Matrix Completion Under Sparse Factor Models: Error Guarantees and Efficient Algorithms


  1. Maximum Likelihood Matrix Completion Under Sparse Factor Models: Error Guarantees and Efficient Algorithms. Jarvis Haupt, Department of Electrical and Computer Engineering, University of Minnesota. Institute for Computational and Experimental Research in Mathematics (ICERM), Workshop on Approximation, Integration, and Optimization, October 1, 2014.

  2. Outline: Background and Motivation; Problem Statement; Error Bounds; Algorithmic Approach; Experimental Results; Acknowledgments. Section 1: Background and Motivation.

  3. A Classical Example. Sampling Theorem (Whittaker/Kotelnikov/Nyquist/Shannon, 1930s-1950s). [Figure: original signal (red) and its samples (black); accurate recovery (and imputation) via ideal low-pass filtering when the original signal is bandlimited.] Basic "formula" for inference: to draw inferences from limited data (or, here, to impute missing elements), one needs to leverage underlying structure in the signal being inferred.
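A minimal sketch of the reconstruction the figure alludes to (not from the slides; the bandwidth, tones, and sample range below are assumptions): a bandlimited signal is recovered from its uniform samples by sinc interpolation, which is equivalent to ideal low-pass filtering of the sample train.

```python
import numpy as np

# Minimal sketch, assuming a two-tone signal bandlimited to B = 5 Hz, sampled at the Nyquist rate.
B = 5.0                      # assumed bandwidth (Hz)
T = 1.0 / (2.0 * B)          # Nyquist sampling interval
n = np.arange(-50, 51)       # finite set of sample indices (truncation leaves a small residual error)
t_s = n * T                  # sampling instants

def x(t):
    # Assumed bandlimited test signal: two tones at 2 Hz and 4 Hz, both below B.
    return np.sin(2 * np.pi * 2.0 * t) + 0.5 * np.cos(2 * np.pi * 4.0 * t)

samples = x(t_s)

# Shannon reconstruction: x_hat(t) = sum_n x(nT) * sinc((t - nT) / T), i.e., ideal low-pass filtering.
t = np.linspace(-1.0, 1.0, 1000)
x_hat = (samples[None, :] * np.sinc((t[:, None] - t_s[None, :]) / T)).sum(axis=1)

print("max reconstruction error on [-1, 1]:", np.max(np.abs(x_hat - x(t))))
```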

  4. A Contemporary Example. Matrix Completion (Candes & Recht; Keshavan et al.; Candes & Tao; Candes & Plan; Negahban & Wainwright; Koltchinskii et al.; Davenport et al.; ... 2009-present). The low-rank modeling assumption is commonly used in collaborative filtering applications (e.g., the Netflix Prize) to describe settings where each observed value depends on only a few latent factors or features. [Figure: samples of the matrix; accurate recovery (and imputation) via convex optimization when the original matrix is low-rank.]
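As a hedged illustration of low-rank completion, the sketch below runs an iterative soft-thresholded SVD (a soft-impute-style heuristic in the spirit of nuclear-norm relaxations; it is not any specific algorithm from the works cited above). The dimensions, rank, sampling rate, and threshold tau are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 60, 80, 3                      # assumed problem sizes
X_true = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
mask = rng.random((n1, n2)) < 0.4          # observe roughly 40% of the entries
Y = np.where(mask, X_true, 0.0)

def soft_impute(Y, mask, tau=2.0, iters=200):
    """Iterative soft-thresholded SVD for low-rank completion (illustrative only)."""
    X = np.zeros_like(Y)
    for _ in range(iters):
        # Keep observed data, fill unobserved entries with the current estimate.
        Z = np.where(mask, Y, X)
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - tau, 0.0)       # soft-threshold the singular values
        X = (U * s) @ Vt
    return X

X_hat = soft_impute(Y, mask)
err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
print(f"relative recovery error: {err:.3f}")
```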

  5. Beyond Low-Rank Models? Low-rank models: all columns of the matrix are well-approximated as vectors in a common linear subspace. Union-of-subspaces models: all columns of the matrix are well-approximated as vectors in a union of linear subspaces. Union-of-subspaces models are at the essence of sparse subspace clustering (Elhamifar & Vidal; Soltanolkotabi et al.; Eriksson et al.; Balzano et al.) and dictionary learning (Olshausen & Field; Aharon et al.; Mairal et al.; ...). Here, we examine the efficacy of such models in matrix completion tasks.
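A hedged illustration of the distinction (all sizes are assumptions): columns drawn from a union of K low-dimensional subspaces can have much higher overall rank than columns drawn from a single subspace, even though each column individually lies in an r-dimensional subspace. In the sparse factor notation that follows, stacking the K bases into one dictionary D gives columns of A with at most r nonzeros each, i.e., a sparse A.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r, K = 100, 200, 3, 5              # assumed: ambient dim, #columns, subspace dim, #subspaces

# Single-subspace (low-rank) data: every column lies in one r-dimensional subspace.
D_single = rng.standard_normal((n1, r))
X_low_rank = D_single @ rng.standard_normal((r, n2))

# Union-of-subspaces data: each column lies in one of K different r-dimensional subspaces.
bases = [rng.standard_normal((n1, r)) for _ in range(K)]
labels = rng.integers(0, K, size=n2)
X_union = np.column_stack([bases[k] @ rng.standard_normal(r) for k in labels])

print("rank (single subspace):   ", np.linalg.matrix_rank(X_low_rank))   # about r
print("rank (union of subspaces):", np.linalg.matrix_rank(X_union))      # up to about K * r
```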

  6. Section 2: Problem Statement

  7. "Sparse Factor" Data Models. We assume the unknown X∗ ∈ R^{n1×n2} we seek to estimate admits a factorization of the form X∗ = D∗A∗, with D∗ ∈ R^{n1×r} and A∗ ∈ R^{r×n2}, where
  • ‖D∗‖_max := max_{i,j} |D∗_{i,j}| ≤ 1 (essentially to fix scaling ambiguities),
  • ‖A∗‖_max ≤ A_max for a constant 0 < A_max ≤ (n1 ∨ n2), and
  • ‖X∗‖_max ≤ X_max/2 for a constant X_max ≥ 1.

  8. "Sparse Factor" Data Models (continued). Our focus: sparse factor models, characterized by (approximately or exactly) sparse A∗.
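A minimal sketch (all sizes, the per-column sparsity level, and the amplitude bound are assumptions) generating a matrix X∗ = D∗A∗ that satisfies the model conditions above, with an exactly sparse A∗:

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, r = 100, 200, 10                   # assumed dimensions and inner factor dimension
s = 3                                      # assumed number of nonzeros per column of A*
A_max = 1.0                                # assumed amplitude bound on entries of A*

# D*: entries bounded by 1 in magnitude (fixes the scaling ambiguity of the factorization).
D_star = rng.uniform(-1.0, 1.0, size=(n1, r))

# A*: exactly sparse, s nonzeros per column, entries bounded by A_max in magnitude.
A_star = np.zeros((r, n2))
for j in range(n2):
    support = rng.choice(r, size=s, replace=False)
    A_star[support, j] = rng.uniform(-A_max, A_max, size=s)

# (In the model one would also rescale so that ||X*||_max <= X_max / 2.)
X_star = D_star @ A_star
print("||A*||_0 =", np.count_nonzero(A_star), "  ||X*||_max =", np.abs(X_star).max())
```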

  9. Observation Model. We observe X∗ only at a subset S ⊆ {1, 2, ..., n1} × {1, 2, ..., n2} of its locations. For some γ ∈ (0, 1], each (i, j) is in S independently with probability γ; we interpret γ = m(n1 n2)^{−1}, so that m = γ n1 n2 is the nominal number of observations. The observations {Y_{i,j}}_{(i,j)∈S} =: Y_S are conditionally independent given S, modeled via the joint density p_{X∗_S}(Y_S) = ∏_{(i,j)∈S} p_{X∗_{i,j}}(Y_{i,j}), a product of scalar densities.
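A hedged sketch of this observation model in the additive Gaussian noise case treated later (the noise level, sampling probability, and the stand-in matrix below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.1                                # assumed noise standard deviation
gamma = 0.3                                # assumed per-entry observation probability
X_star = rng.uniform(-1.0, 1.0, (100, 200))  # stand-in for the sparse-factor matrix from the previous sketch

# Each (i, j) is observed independently with probability gamma.
S = rng.random(X_star.shape) < gamma       # boolean mask encoding the sample set S
m_nominal = gamma * X_star.size            # nominal number of observations, m = gamma * n1 * n2

# Conditionally independent observations given S, here corrupted by AWGN.
Y = np.where(S, X_star + sigma * rng.standard_normal(X_star.shape), np.nan)

print("observed entries:", int(S.sum()), "of", X_star.size, "(nominal m =", int(m_nominal), ")")
```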

  10. Estimation Approach. We estimate X∗ via a sparsity-penalized maximum likelihood approach: for λ > 0, we take
  X̂ = arg min_{X=DA ∈ 𝒳} { −log p_{X_S}(Y_S) + λ · ‖A‖_0 }.

  11. Estimation Approach (continued). The set 𝒳 of candidate reconstructions is any subset of 𝒳′, where 𝒳′ := {X = DA : D ∈ 𝒟, A ∈ 𝒜, ‖X‖_max ≤ X_max}, and where
  • 𝒟 is the set of all matrices D ∈ R^{n1×r} whose elements are discretized to one of L uniformly-spaced values in the range [−1, 1], and
  • 𝒜 is the set of all matrices A ∈ R^{r×n2} whose elements either take the value zero or are discretized to one of L uniformly-spaced values in the range [−A_max, A_max].
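The penalized objective above is nonconvex and combinatorial. For the Gaussian-noise case described later (slide 15), the negative log-likelihood over S reduces to a squared-error fit on the observed entries, and one simple heuristic is alternating least squares with per-column hard thresholding on A. The sketch below is such a heuristic under assumed sizes; it is not necessarily the algorithm developed in the deck's algorithmic section, and it ignores the discretization of 𝒟 and 𝒜 (an ℓ0 constraint stands in for the ℓ0 penalty).

```python
import numpy as np

def sparse_factor_complete(Y, S, r, s, iters=50, seed=0):
    """Heuristic alternating least squares with per-column hard thresholding for
       min_{D,A} 0.5 * ||P_S(Y - D A)||_F^2  s.t.  each column of A has <= s nonzeros, |D_ij| <= 1."""
    rng = np.random.default_rng(seed)
    n1, n2 = Y.shape
    D = rng.uniform(-1.0, 1.0, size=(n1, r))
    A = np.zeros((r, n2))
    for _ in range(iters):
        # Update each column of A from the rows observed in that column, then hard-threshold it.
        for j in range(n2):
            rows = S[:, j]
            if rows.any():
                a, *_ = np.linalg.lstsq(D[rows], Y[rows, j], rcond=None)
                a[np.argsort(np.abs(a))[:-s]] = 0.0   # keep only the s largest-magnitude entries
                A[:, j] = a
        # Update each row of D from the columns observed in that row, then clip to [-1, 1].
        for i in range(n1):
            cols = S[i, :]
            if cols.any():
                d, *_ = np.linalg.lstsq(A[:, cols].T, Y[i, cols], rcond=None)
                D[i, :] = np.clip(d, -1.0, 1.0)
    return D, A

# Usage sketch with synthetic data (all sizes, sparsity, and noise level are assumptions).
rng = np.random.default_rng(4)
n1, n2, r, s, sigma = 60, 80, 5, 2, 0.05
A_true = np.zeros((r, n2))
for j in range(n2):
    A_true[rng.choice(r, s, replace=False), j] = rng.uniform(-1, 1, s)
X_star = rng.uniform(-1, 1, (n1, r)) @ A_true
S = rng.random((n1, n2)) < 0.5
Y = np.where(S, X_star + sigma * rng.standard_normal((n1, n2)), 0.0)
D_hat, A_hat = sparse_factor_complete(Y, S, r, s)
print("relative error:", np.linalg.norm(D_hat @ A_hat - X_star) / np.linalg.norm(X_star))
```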

  12. Section 3: Error Bounds

  13. A General "Sparse Factor" Matrix Completion Error Guarantee.
  Theorem (A. Soni, S. Jain, J.H., and S. Gonella, 2014). Let β > 0 and set L = (n1 ∨ n2)^β. If C_D satisfies C_D ≥ max_{X∈𝒳} max_{i,j} D(p_{X∗_{i,j}} ‖ p_{X_{i,j}}), then for any λ ≥ 2 · (β + 2) · (1 + 2C_D/3) · log(n1 ∨ n2), the sparsity-penalized ML estimate
  X̂ = arg min_{X=DA ∈ 𝒳} { −log p_{X_S}(Y_S) + λ · ‖A‖_0 }
  satisfies the (normalized, per-element) error bound
  E_{S,Y_S} [ −2 log A(p_X̂, p_{X∗}) ] / (n1 n2) ≤ 8 C_D (log m)/m + 3 min_{X=DA∈𝒳} { D(p_{X∗} ‖ p_X)/(n1 n2) + ((n1 r + ‖A‖_0)/m) · (λ + 4 C_D (β + 2) log(n1 ∨ n2)/3) }.
  Here, A(p_X, p_{X∗}) := ∏_{i,j} A(p_{X_{i,j}}, p_{X∗_{i,j}}), where A(p_{X_{i,j}}, p_{X∗_{i,j}}) := E_{p_{X∗_{i,j}}}[ (p_{X_{i,j}}/p_{X∗_{i,j}})^{1/2} ] is the Hellinger affinity, and D(p_{X∗} ‖ p_X) := Σ_{i,j} D(p_{X∗_{i,j}} ‖ p_{X_{i,j}}), where D(p_{X∗_{i,j}} ‖ p_{X_{i,j}}) := E_{p_{X∗_{i,j}}}[ log(p_{X∗_{i,j}}/p_{X_{i,j}}) ] is the KL divergence.
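To make the affinity and divergence concrete: for Gaussian scalar densities with common variance σ², D(p_{x∗} ‖ p_x) = (x∗ − x)²/(2σ²) and A(p_x, p_{x∗}) = exp(−(x∗ − x)²/(8σ²)), so −2 log A(p_x, p_{x∗}) = (x∗ − x)²/(4σ²); this is what turns the general bound into a Frobenius-norm guarantee in the Gaussian case below. A small numerical sanity check of these closed forms (the values of x, x∗, σ are arbitrary assumptions):

```python
import numpy as np
from scipy.integrate import quad

x_star, x, sigma = 0.7, 0.2, 0.5           # arbitrary assumed values

def gauss(t, mu):
    return np.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Hellinger affinity: integral of sqrt(p_x * p_{x*}); closed form exp(-(x*-x)^2 / (8 sigma^2)).
affinity_num, _ = quad(lambda t: np.sqrt(gauss(t, x) * gauss(t, x_star)), -np.inf, np.inf)
affinity_cf = np.exp(-(x_star - x) ** 2 / (8 * sigma ** 2))

# KL divergence D(p_{x*} || p_x); the log-ratio is written analytically to avoid 0/0 in the tails.
log_ratio = lambda t: ((t - x) ** 2 - (t - x_star) ** 2) / (2 * sigma ** 2)
kl_num, _ = quad(lambda t: gauss(t, x_star) * log_ratio(t), -np.inf, np.inf)
kl_cf = (x_star - x) ** 2 / (2 * sigma ** 2)

print(f"affinity: numeric {affinity_num:.6f}  closed form {affinity_cf:.6f}")
print(f"KL:       numeric {kl_num:.6f}  closed form {kl_cf:.6f}")
```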

  14. A General "Sparse Factor" Matrix Completion Error Guarantee (continued). Next, we instantiate this result for some specific cases (using a specific choice of β, λ).

  15. Additive White Gaussian Noise Model. Suppose each observation is corrupted by zero-mean AWGN with known variance σ², so that
  p_{X∗_S}(Y_S) = (2πσ²)^{−|S|/2} exp( −(1/(2σ²)) Σ_{(i,j)∈S} (Y_{i,j} − X∗_{i,j})² ).
  Let 𝒳 = 𝒳′, essentially (a discretization of) a set of rank- and max-norm-constrained matrices.
  Gaussian Noise (Exact Sparse Factor Model). If A∗ is exactly sparse with ‖A∗‖_0 nonzero elements, the sparsity-penalized ML estimate satisfies
  E_{S,Y_S} [ ‖X∗ − X̂‖²_F ] / (n1 n2) = O( (σ² + X²_max) · ((n1 r + ‖A∗‖_0)/m) · log(n1 ∨ n2) ).
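A hedged numerical reading of this rate (all numbers below are assumptions, and the O(·) hides an unspecified constant): for n1 = n2 = 1000, r = 10, and 5 nonzeros per column of A∗, the parameter-count term n1 r + ‖A∗‖_0 = 15,000 is far smaller than n1 n2 = 10⁶, so the per-element error can be driven small with m much smaller than the total number of entries.

```python
import numpy as np

# Assumed example sizes; this only tracks the scaling
# (sigma^2 + X_max^2) * ((n1 * r + ||A*||_0) / m) * log(n1 v n2), not the hidden constant.
n1, n2, r, s_per_col = 1000, 1000, 10, 5
sigma, X_max = 0.1, 1.0
A0 = s_per_col * n2                        # ||A*||_0 for an exactly sparse A*
dof = n1 * r + A0                          # "degrees of freedom" term in the bound

for m in [5 * dof, 20 * dof, 50 * dof]:
    rate = (sigma**2 + X_max**2) * (dof / m) * np.log(max(n1, n2))
    print(f"m = {m:>7d} ({m / (n1 * n2):5.1%} of entries): per-element error rate ~ {rate:.4f}")
```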
