Matrix Factorization with Binary Components – Uniqueness in a randomized model
Felix Krahmer, TU München
Joint work with: Matthias Hein, Saarland University; David James, University of Göttingen

Matrix Factorization
- given data matrix $D \in \mathbb{R}^{m \times n}$, where $n$ is the number of data points and $m$ the number of features
- find matrices $T \in \mathbb{R}^{m \times r}$, $A \in \mathbb{R}^{r \times n}$ such that
  $D = TA$ (exact case) or $\min_{T \in \mathbb{R}^{m \times r},\, A \in \mathbb{R}^{r \times n}} \|D - TA\|_F^2$ (approximate case),
  where $r$ is typically small
Globally optimal solution:
- Singular Value Decomposition (SVD): $D = U \Sigma V^T \implies T = U \Sigma$, $A = V^T$
- best rank-$r$ approximation obtained by taking the top $r$ singular values
Problem: Factors often lack interpretation
Felix Krahmer, TUM Matrix Factorization with Binary Components 2 of 23
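Written out, the truncation behind the last bullet reads as follows. The symbols $\sigma_i$, $u_i$, $v_i$, $U_r$, $\Sigma_r$, $V_r$, and $D_r$ are notation introduced here for illustration and do not appear on the slide; the final identity is the standard Eckart-Young statement for the Frobenius norm.

```latex
D = U \Sigma V^T = \sum_{i=1}^{\operatorname{rank}(D)} \sigma_i u_i v_i^T,
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge 0,
\\
D_r = U_r \Sigma_r V_r^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T,
\qquad T = U_r \Sigma_r \in \mathbb{R}^{m \times r}, \quad A = V_r^T \in \mathbb{R}^{r \times n},
\\
\| D - D_r \|_F^2 = \sum_{i > r} \sigma_i^2 = \min_{\operatorname{rank}(X) \le r} \| D - X \|_F^2 .
```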

Nonnegative Matrix Factorization (NMF)
- given data matrix $D \in \mathbb{R}^{m \times n}$,
- find matrices $T \in \mathbb{R}_+^{m \times r}$, $A \in \mathbb{R}_+^{r \times n}$ such that
  $D = TA$ or $\min_{T \in \mathbb{R}_+^{m \times r},\, A \in \mathbb{R}_+^{r \times n}} \|D - TA\|_F^2$.
(figure taken from Lee, Seung: Learning the parts of objects by NMF, Nature (1999))

Nonnegative Matrix Factorization (NMF)
Prior work:
- used for finding latent factors/components $T$
- solved via alternating least squares, but convergence can only be proven to a critical point $\implies$ no guarantee of finding the global optimum
- In 2012, Arora, Ge, Kannan, and Moitra proposed an algorithm for exact NMF with runtime $O((nm)^{r^2})$.
- In the case where $T$ is separable, the algorithm runs in polynomial time (improved by Bittorf et al. (2013))
Goal: extend the conditions on NMF under which a solution can be found efficiently

Gene expression data analysis
- Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product.
[Illustration: binary matrix equation with the parts labeled "gene product", "gene expression", and "genes"]
Goal: Decompose gene expression data into functional processes

Matrix Factorization with Binary Components
Our model: $D = TA$ with $D \in \mathbb{R}^{m \times n}$, $T \in \{0,1\}^{m \times r}$, $A \in \mathbb{R}^{r \times n}$, $\mathbf{1}_r^T A = \mathbf{1}_n^T$
[Illustration: numerical example of such a factorization]
Our Goal: factor $D = TA$
Assumptions:
- $\operatorname{rank}(D) = r \ll m$, $\operatorname{rank}(A) = r$,
- the columns of $T$ are affinely independent, i.e. for all $\lambda \in \mathbb{R}^r$ with $\lambda^T \mathbf{1}_r = 0$: $T\lambda = 0 \implies \lambda = 0$
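The affine-independence assumption can be checked mechanically: the condition on $\lambda$ says that the matrix obtained by appending a row of ones to $T$ has trivial kernel, i.e. rank $r$. The sketch below is one way to test this with exact rational arithmetic; the helper names `rank` and `affinely_independent` and the example matrices are hypothetical and not taken from the slides.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    mat = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(mat[0]) if mat else 0
    rk = 0
    for col in range(n_cols):
        # Look for a pivot in this column at or below row index rk.
        pivot = next((i for i in range(rk, len(mat)) if mat[i][col] != 0), None)
        if pivot is None:
            continue
        mat[rk], mat[pivot] = mat[pivot], mat[rk]
        # Eliminate the column from every other row.
        for i in range(len(mat)):
            if i != rk and mat[i][col] != 0:
                factor = mat[i][col] / mat[rk][col]
                mat[i] = [a - factor * b for a, b in zip(mat[i], mat[rk])]
        rk += 1
    return rk


def affinely_independent(T):
    """True if the columns of T (list of m rows, r entries each) are affinely independent."""
    r = len(T[0])
    stacked = T + [[1] * r]  # append the row 1^T
    return rank(stacked) == r


# Hypothetical examples: columns (1,0,1) and (0,1,1) are affinely independent,
# while the four corners of the unit square in R^2 are not.
print(affinely_independent([[1, 0], [0, 1], [1, 1]]))        # True
print(affinely_independent([[0, 1, 0, 1], [0, 0, 1, 1]]))    # False
```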

Key idea
Lemma: The affine hulls of $T$ and $D$ agree: $\operatorname{aff}(D) = \operatorname{aff}(T)$.
[Illustration for $m = 3$; note that $\operatorname{aff}(D) \cap \{0,1\}^m = T$]
Theorem (Slawski, Hein, Lutsik (NIPS 2013)): Some exact factorization can be computed in $O(r m 2^r)$ by computing $\operatorname{aff}(T) \cap \{0,1\}^m = \operatorname{aff}(D) \cap \{0,1\}^m$.
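For tiny examples, the set $\operatorname{aff}(D) \cap \{0,1\}^m$ can be found by brute force: a vector $x$ lies in the affine hull of the columns of $D$ exactly when appending the column $(x, 1)$ to the matrix $D$ stacked on a row of ones does not raise its rank. The sketch below enumerates all $2^m$ binary vectors, so it is exponential in $m$ and is not the procedure of Slawski, Hein, and Lutsik; the function names and the example data are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    mat = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(mat[0]) if mat else 0
    rk = 0
    for col in range(n_cols):
        pivot = next((i for i in range(rk, len(mat)) if mat[i][col] != 0), None)
        if pivot is None:
            continue
        mat[rk], mat[pivot] = mat[pivot], mat[rk]
        for i in range(len(mat)):
            if i != rk and mat[i][col] != 0:
                factor = mat[i][col] / mat[rk][col]
                mat[i] = [a - factor * b for a, b in zip(mat[i], mat[rk])]
        rk += 1
    return rk


def binary_points_in_affine_hull(D):
    """All x in {0,1}^m that are affine combinations of the columns of D."""
    m, n = len(D), len(D[0])
    base_rank = rank(D + [[1] * n])  # rank of [D; 1^T]
    hits = []
    for x in product((0, 1), repeat=m):
        # Append the candidate column (x, 1) and test whether the rank grows.
        extended = [row + [xi] for row, xi in zip(D, x)] + [[1] * (n + 1)]
        if rank(extended) == base_rank:
            hits.append(x)
    return hits


# Hypothetical example: T has columns (1,0,1) and (0,1,1); the columns of A sum to one.
T = [[1, 0], [0, 1], [1, 1]]
A = [[1, 0, 2], [0, 1, -1]]
D = [[sum(T[i][k] * A[k][j] for k in range(2)) for j in range(3)] for i in range(3)]
print(binary_points_in_affine_hull(D))  # recovers the two columns of T
```

In this example the intersection contains only the columns of $T$, so the binary factor is recovered from $D$ alone.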

Uniqueness of the Factorization
Solutions are not guaranteed to be nonnegative $\implies$ if two solutions exist, we may find one which is not nonnegative
Uniqueness is crucial for the interpretability of the factors!
[Illustration: example matrix $D$ with two candidate factorizations, $D = TA$ and $D \overset{?}{=} T'A'$]
Factorization is unique if $\operatorname{aff}(T) \cap \{0,1\}^m = \{T_{:,1}, \dots, T_{:,r}\}$

Matrix Factorization with Random Binary Components
Our model: $D = TA$ with $D \in \mathbb{R}^{m \times n}$, $A \in \mathbb{R}^{r \times n}$, $\mathbf{1}^T A = \mathbf{1}^T$, and a random matrix
$T = \begin{pmatrix} t_{1,1} & \dots & t_{1,r} \\ t_{2,1} & \dots & t_{2,r} \\ \vdots & & \vdots \\ t_{m-1,1} & \dots & t_{m-1,r} \\ t_{m,1} & \dots & t_{m,r} \end{pmatrix}$
- $t_{ij}$ are drawn independently from $\{0,1\}$ with probabilities $P[t_{ij} = 0] = p$ and $P[t_{ij} = 1] = 1 - p$,
- choose $p$ large to simulate sparse binary components
- task: bound the probability that $\operatorname{aff}(T) \cap \{0,1\}^m \neq \{T_{:,1}, \dots, T_{:,r}\}$

Idea
- Replace $T$ with $M$ taking values in $\{-1,+1\}$ with the same probability distribution:
  $P[\operatorname{aff}(T) \cap \{0,1\}^m \neq \{T_{:,1}, \dots, T_{:,r}\}] = P[\operatorname{aff}(M) \cap \{-1,+1\}^m \neq \{M_{:,1}, \dots, M_{:,r}\}]$
- Define
  $R_s = P[\exists x \in \mathbb{R}^r,\ |\operatorname{supp}(x)| = s,\ Mx \in \{-1,+1\}^m]$,
  $P_s = P[\exists x \in \mathbb{R}^r,\ \operatorname{supp}(x) = \{1, \dots, s\} :\ Mx \in \{-1,+1\}^m]$,
  then
  $P[\operatorname{aff}(M) \cap \{-1,+1\}^m \neq \{M_{:,1}, \dots, M_{:,r}\}] \le \sum_{s=2}^{r} R_s \le \sum_{s=2}^{r} \binom{r}{s} P_s$
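The step from $R_s$ to $\binom{r}{s} P_s$ is a union bound over the possible supports. Spelled out below, the event name $E_S$ is notation introduced here for illustration, and the identity $P[E_S] = P_s$ uses that the entries of $M$ are independent and identically distributed, so permuting columns does not change the distribution.

```latex
E_S := \{\exists x \in \mathbb{R}^r,\ \operatorname{supp}(x) = S :\ Mx \in \{-1,+1\}^m\},
\qquad S \subseteq \{1, \dots, r\},
\\
R_s = P\Big[\bigcup_{|S| = s} E_S\Big]
\le \sum_{|S| = s} P[E_S]
= \binom{r}{s} P_s .
```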

Odlyzko 1988
Theorem (Odlyzko 1988): Let $M$ be a random $m \times r$ matrix whose entries are drawn independently from $\{-1,+1\}$ with equal probabilities ($p = 1/2$). If
  $r \le m \left(1 - \frac{10}{\log(m)}\right)$,
then
  $P[\operatorname{aff}(M) \cap \{-1,+1\}^m \neq \{M_{:,1}, \dots, M_{:,r}\}] \le \binom{r}{3} P_3 + O\left(\left(\frac{7}{10}\right)^m\right)$
with $P_3 = 4 \left(\frac{3}{4}\right)^m$, as $m$ tends to infinity.
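The base $3/4$ in $P_3$ can be made plausible by a small enumeration. This is a heuristic illustration, not Odlyzko's proof, and it does not account for the factor 4. For a single row $(a, b, c)$ of three independent $\pm 1$ entries, the combination $a + b - c$ lands in $\{-1, +1\}$ unless $a = b = -c$, so all $m$ independent rows succeed with probability $(3/4)^m$ when $p = 1/2$. The function name and the parameter `p` (taken here as the probability of the entry $-1$) are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product


def row_success_probability(p):
    """P[a + b - c in {-1, +1}] for independent a, b, c with P[-1] = p, P[+1] = 1 - p."""
    p = Fraction(p)
    weight = {-1: p, +1: 1 - p}
    total = Fraction(0)
    for a, b, c in product((-1, +1), repeat=3):
        if a + b - c in (-1, +1):
            total += weight[a] * weight[b] * weight[c]
    return total


print(row_success_probability(Fraction(1, 2)))  # 3/4
```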

Conjecture - Uniqueness under Random Sampling
Conjecture: Let $M$ be a random $m \times r$ matrix whose entries are drawn independently from $\{-1,+1\}$ with probabilities $P[m_{ij} = -1] = p$ and $P[m_{ij} = 1] = 1 - p$. If there is some fixed $\varepsilon > 0$ such that $r < m(1 - \varepsilon)$, then
  $P[\operatorname{aff}(M) \cap \{-1,+1\}^m \neq \{M_{:,1}, \dots, M_{:,r}\}] \le \binom{r}{3} P_3 + o(P_3)$
with $P_3 = 4 (1 - p(1-p))^m$, as $m$ tends to infinity.
- $1 - p(1-p) < 1$ for $p \in (0,1)$
- $1 - \frac{1}{2}\left(1 - \frac{1}{2}\right) = \frac{3}{4}$
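To get a feel for the magnitudes, the leading term $\binom{r}{3} P_3$ can be evaluated numerically. This is a minimal sketch: the function names `p3` and `leading_term` are hypothetical, the values of $m$, $r$, and $p$ in the loop are arbitrary, and the $o(P_3)$ remainder is ignored.

```python
from math import comb


def p3(m, p):
    """P_3 = 4 * (1 - p * (1 - p))**m, with p = P[m_ij = -1]."""
    return 4.0 * (1.0 - p * (1.0 - p)) ** m


def leading_term(m, r, p):
    """binom(r, 3) * P_3, the leading term of the conjectured bound (remainder ignored)."""
    return comb(r, 3) * p3(m, p)


if __name__ == "__main__":
    for p in (0.5, 0.9):
        for m in (50, 100, 200):
            print(f"p={p}, m={m}, r=10: {leading_term(m, 10, p):.3e}")
```

Because $p(1-p)$ is largest at $p = 1/2$, the base $1 - p(1-p)$ is smallest there. For sparse components such as $p = 0.9$ the base is $0.91$, so the leading term decays more slowly in $m$.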

Partial result
Theorem (almost/work in progress): Let $M$ be a random $m \times r$ matrix whose entries are drawn independently from $\{-1,+1\}$ with probabilities $P[m_{ij} = -1] = p$ and $P[m_{ij} = 1] = 1 - p$. If there is some fixed $\varepsilon > 0$ such that $r \le 32$, then
  $P[\operatorname{aff}(M) \cap \{-1,+1\}^m \neq \{M_{:,1}, \dots, M_{:,r}\}] \le \binom{r}{3} P_3 + o(P_3)$
with $P_3 = 4 (1 - p(1-p))^m$, as $m$ tends to infinity.
- $1 - p(1-p) < 1$ for $p \in (0,1)$
- $1 - \frac{1}{2}\left(1 - \frac{1}{2}\right) = \frac{3}{4}$

Sperner family and Sperner's Lemma
Definition (Sperner (1928)): A family of sets that does not include two sets $X$ and $Y$ for which $X \subset Y$ is called a Sperner family.
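The definition translates directly into a pairwise check. In the sketch below, $\subset$ is read as proper inclusion; the function name `is_sperner_family` and the example families are hypothetical.

```python
from itertools import combinations


def is_sperner_family(family):
    """True if no member of the family is a proper subset of another member."""
    sets = [frozenset(s) for s in family]
    return not any(a < b or b < a for a, b in combinations(sets, 2))


# All 2-element subsets of {1, 2, 3} form a Sperner family.
print(is_sperner_family([{1, 2}, {2, 3}, {1, 3}]))  # True
# {1} is contained in {1, 2}, so this family is not Sperner.
print(is_sperner_family([{1}, {1, 2}]))             # False
```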
