SLIDE 1

Low-rank Matrix Estimation via Approximate Message Passing

Andrea Montanari (Stanford University) and Ramji Venkataramanan (University of Cambridge). WoLA 2018.

SLIDE 2

The Spiked Model

A = ∑_{i=1}^k λ_i v_i v_i^T + W ∈ R^{n×n}

  • λ_1 ≥ λ_2 ≥ … ≥ λ_k are deterministic scalars
  • v_1, …, v_k ∈ R^n are orthonormal vectors
  • W ∼ GOE(n), i.e., W symmetric with (W_ii)_{i≤n} ∼ i.i.d. N(0, 2/n) and (W_ij)_{i<j≤n} ∼ i.i.d. N(0, 1/n)

GOAL: estimate the vectors v_1, …, v_k from A
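
The model is straightforward to simulate. Below is a minimal Python/NumPy sketch (our illustration, not from the talk; the helper names goe and spiked_matrix are ours), matching the GOE(n) normalization above:

```python
import numpy as np

def goe(n, rng):
    """W ~ GOE(n): symmetric, off-diagonals N(0, 1/n), diagonal N(0, 2/n)."""
    G = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
    return (G + G.T) / np.sqrt(2.0)

def spiked_matrix(lams, V, W):
    """A = sum_i lams[i] * v_i v_i^T + W, for orthonormal columns v_i of V."""
    return (V * lams) @ V.T + W

rng = np.random.default_rng(0)
n, lams = 1000, np.array([3.0, 2.0])
V, _ = np.linalg.qr(rng.normal(size=(n, len(lams))))  # orthonormal v_1, v_2
A = spiked_matrix(lams, V, goe(n, rng))
```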

SLIDE 3

Spectrum of spiked matrix

A = ∑_{i=1}^k λ_i v_i v_i^T + W

Random matrix theory and the ‘BBAP’ phase transition:

  • The bulk of the eigenvalues of A lies in [−2, 2], distributed according to Wigner’s semicircle law
  • Outlier eigenvalues correspond to the |λ_i| greater than 1: z_i → λ_i + 1/λ_i > 2
  • The eigenvectors ϕ_i corresponding to outliers z_i satisfy |⟨ϕ_i, v_i⟩| → √(1 − λ_i^{−2})

[Baik, Ben Arous, Péché ’05], [Baik, Silverstein ’06], [Capitaine, Donati-Martin, Féral ’09], [Benaych-Georges, Nadakuditi ’11], …

SLIDES 4–5

Structural information

A = ∑_{i=1}^k λ_i v_i v_i^T + W

When the v_i’s are unstructured, e.g., drawn uniformly at random from the unit sphere:

  • The best estimator of v_i is the ith eigenvector ϕ_i
  • If |λ_i| ≥ 1, then |⟨v_i, ϕ_i⟩| → √(1 − 1/λ_i²)

But we often have structural information about the v_i’s

  • For example, the v_i’s may be sparse, bounded, non-negative, etc.
  • Relevant for many applications: sparse PCA, non-negative PCA, community detection under the stochastic block model, …
  • Exploiting this structure can improve on spectral methods

SLIDE 6

Prior on eigenvectors

A = ∑_{i=1}^k λ_i v_i v_i^T + W ≡ V ΛV^T + W

V = [v_1 v_2 … v_k] ∈ R^{n×k}. If each row of V is ∼ i.i.d. P_V, then the Bayes-optimal estimator (for squared error) is

  • V̂_Bayes = E[V | A]
  • Generally not computable
  • Closed-form expressions for the asymptotic Bayes error are available

[Deshpande, Montanari ’14], [Barbier et al. ’16], [Lesieur et al. ’17], [Miolane, Lelarge ’16], …

SLIDES 7–8

Computable estimators

A = ∑_{i=1}^k λ_i v_i v_i^T + W ≡ V ΛV^T + W

  • Convex relaxations generally do not achieve the Bayes-optimal error [Javanmard, Montanari, Ricci-Tersenghi ’16]
  • MCMC can approximate the Bayes estimator, but can have very large mixing time and is hard to analyze

In this talk

Approximate Message Passing (AMP) algorithm to estimate V

SLIDES 9–10

Rank one spiked model

A = (λ/n) vv^T + W,   v ∼ i.i.d. P_V,   E[V²] = 1

Power iteration for the principal eigenvector: x^{t+1} = A x^t, with x^0 chosen at random.

AMP:   x^{t+1} = A f_t(x^t) − b_t f_{t−1}(x^{t−1}),   b_t = (1/n) ∑_{i=1}^n f_t′(x_i^t)

  • The non-linear function f_t is chosen based on structural information about v
  • The memory term ensures a nice distributional property for the iterates in high dimensions
  • Can be derived via approximation of the belief propagation equations
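
As a concrete reference, here is a minimal sketch of this recursion (ours, not the authors’ code); it assumes a time-independent entrywise denoiser f with derivative fprime, and uses the convention f_{−1} ≡ 0:

```python
import numpy as np

def amp(A, f, fprime, x0, n_iter=20):
    """AMP: x^{t+1} = A f(x^t) - b_t f(x^{t-1}), b_t = (1/n) sum_i f'(x_i^t)."""
    x_prev, x = None, x0.copy()
    for _ in range(n_iter):
        b = fprime(x).mean()  # Onsager coefficient b_t
        onsager = b * f(x_prev) if x_prev is not None else 0.0
        x_prev, x = x, A @ f(x) - onsager
    return x
```

For instance, f = np.tanh with fprime = lambda y: 1 - np.tanh(y)**2 is the posterior-mean denoiser for a uniform ±1 prior in the Bayes-optimal parametrization discussed later.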

SLIDES 11–12

State evolution

x^{t+1} = A f_t(x^t) − b_t f_{t−1}(x^{t−1}),   with b_t = (1/n) ∑_{i=1}^n f_t′(x_i^t)

If we initialize with x^0 independent of A, then as n → ∞:

  x^t → µ_t v + σ_t g

  • g ∼ i.i.d. N(0, 1), independent of v ∼ i.i.d. P_V
  • The scalars µ_t, σ_t² are recursively determined as

  µ_{t+1} = λ E[V f_t(µ_t V + σ_t G)],   σ²_{t+1} = E[f_t(µ_t V + σ_t G)²]

  • Initialize with µ_0 = (1/n) |E⟨x^0, v⟩|

[Bayati, Montanari ’11], [Rangan, Fletcher ’12], [Deshpande, Montanari ’14]
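
The scalar recursion can be tracked numerically without running AMP itself. A Monte Carlo sketch (ours; the ±1 prior is only an illustrative stand-in for P_V):

```python
import numpy as np

def state_evolution(lam, f, mu0, sig2_0, n_iter=20, n_mc=200_000, seed=1):
    """Iterate mu_{t+1} = lam E[V f(mu_t V + sig_t G)], sig2_{t+1} = E[f(...)^2]."""
    rng = np.random.default_rng(seed)
    V = rng.choice([-1.0, 1.0], size=n_mc)  # samples from P_V (swap in another prior)
    G = rng.normal(size=n_mc)
    mu, sig2 = mu0, sig2_0
    for _ in range(n_iter):
        y = f(mu * V + np.sqrt(sig2) * G)
        mu, sig2 = lam * np.mean(V * y), np.mean(y ** 2)
    return mu, sig2
```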

SLIDE 13

Bayes-optimal AMP

Assuming x^t = µ_t v + σ_t g, choose f_t(y) = E[V | µ_t V + σ_t G = y]

State evolution becomes

  γ_{t+1} = λ² [1 − mmse(γ_t)],   with µ_t = σ_t² = γ_t

Example: P_V ∼ uniform{+1, −1}, λ = √2. With initial value γ_0 ∝ (1/n) |E⟨x^0, v⟩|, what is lim_{t→∞} γ_t?
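
For P_V ∼ uniform{+1, −1}, the posterior mean has a closed tanh form, giving mmse(γ) = 1 − E[tanh(γ + √γ Z)²] with Z ∼ N(0, 1), so the γ-recursion can be iterated directly. A sketch (ours):

```python
import numpy as np

def mmse_pm1(gamma, n_mc=200_000, seed=2):
    """mmse(gamma) = 1 - E[tanh(gamma + sqrt(gamma) Z)^2] for P_V = uniform{±1}."""
    Z = np.random.default_rng(seed).normal(size=n_mc)
    return 1.0 - np.mean(np.tanh(gamma + np.sqrt(gamma) * Z) ** 2)

def iterate_gamma(lam, gamma0, n_iter=50):
    """Iterate gamma_{t+1} = lam^2 * (1 - mmse(gamma_t))."""
    g = gamma0
    for _ in range(n_iter):
        g = lam ** 2 * (1.0 - mmse_pm1(g))
    return g
```

γ_t stays at 0 if γ_0 = 0, while any γ_0 > 0 flows to a strictly positive fixed point when λ > 1; the next slides address which case the initialization puts us in.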

SLIDE 14

Fixed points of state evolution

  • If E⟨x^0, v⟩ = 0, then γ_t = 0 is an (unstable) fixed point.
  • This is the case in problems where v has zero mean, as x^0 is independent of v.

SLIDE 15

Spectral Initialization

A = (λ/n) vv^T + W,   λ > 1

  • Compute ϕ_1, the principal eigenvector of A
  • Run AMP with initialization x^0 = √n ϕ_1
  • γ_0 > 0, since (1/n) |E⟨x^0, v⟩| → √(1 − λ^{−2})
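
A sketch of this initialization (ours; np.linalg.eigh returns eigenvalues in ascending order, so the principal eigenvector is the last column):

```python
import numpy as np

def spectral_init(A):
    """x^0 = sqrt(n) * phi_1, with phi_1 the principal eigenvector of A."""
    n = A.shape[0]
    _, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    return np.sqrt(n) * eigvecs[:, -1]
```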

SLIDE 16

AMP with spectral initialization

A = (λ/n) vv^T + W

x^{t+1} = A f_t(x^t) − b_t f_{t−1}(x^{t−1}),   x^0 = √n ϕ_1

Existing AMP analysis does not apply for an initialization x^0 correlated with v.

SLIDE 17

AMP analysis with spectral initialization

A = (λ/n) vv^T + W

Let (ϕ_1, z_1) be the principal eigenvector and eigenvalue of A. Instead of A, we will analyze AMP on

  Ã = z_1 ϕ_1 ϕ_1^T + P⊥ ((λ/n) vv^T + W̃) P⊥

  • P⊥ = I − ϕ_1 ϕ_1^T
  • W̃ ∼ GOE(n) is independent of W

SLIDE 18

True vs conditional model

A = (λ/n) vv^T + W,   Ã = z_1 ϕ_1 ϕ_1^T + P⊥ ((λ/n) vv^T + W̃) P⊥

Lemma. For (z_1, ϕ_1) in the set E_ε defined by

  |z_1 − (λ + λ^{−1})| ≤ ε,   (ϕ_1^T v)²/n ≥ 1 − λ^{−2} − ε,

we have

  sup_{(z_1, ϕ_1) ∈ E_ε} ‖ P(A ∈ · | z_1, ϕ_1) − P(Ã ∈ · | z_1, ϕ_1) ‖_TV ≤ (1/c(ε)) e^{−n c(ε)}

SLIDE 19

AMP on conditional model

Ã = z_1 ϕ_1 ϕ_1^T + P⊥ ((λ/n) vv^T + W̃) P⊥

AMP with Ã instead of A:

  x̃^{t+1} = Ã f_t(x̃^t) − b_t f_{t−1}(x̃^{t−1}),   x̃^0 = √n ϕ_1

Analyze using the existing AMP analysis plus results from random matrix theory.

SLIDES 20–21

Model assumptions

A = (λ/n) vv^T + W

Let v = v(n) ∈ R^n be a sequence such that the empirical distribution of the entries of v(n) converges weakly to P_V.

The performance of any estimator v̂ is measured via a loss function ψ : R × R → R:

  ψ(v, v̂) = (1/n) ∑_{i=1}^n ψ(v_i, v̂_i)

ψ is assumed to be pseudo-Lipschitz:

  |ψ(x) − ψ(y)| ≤ C ‖x − y‖₂ (1 + ‖x‖₂ + ‖y‖₂),   ∀ x, y ∈ R²

SLIDES 22–23

Result for rank one case

A = (λ/n) vv^T + W

Theorem: Let λ > 1. Consider the AMP

  x^{t+1} = A f_t(x^t) − b_t f_{t−1}(x^{t−1})

  • Assume f_t : R → R is Lipschitz continuous
  • Initialize with x^0 = √n ϕ_1

Then for any pseudo-Lipschitz loss function ψ and t ≥ 0,

  lim_{n→∞} (1/n) ∑_{i=1}^n ψ(v_i, x_i^t) = E{ψ(V, µ_t V + σ_t G)}   a.s.

The state evolution parameters are recursively defined as

  µ_{t+1} = λ E[V f_t(µ_t V + σ_t G)],   σ²_{t+1} = E[f_t(µ_t V + σ_t G)²],

with µ_0 = √(1 − λ^{−2}) and σ_0 = 1/λ.

SLIDE 24

Bayes-optimal AMP

A = (λ/n) vv^T + W,   x^{t+1} = A f_t(x^t) − b_t f_{t−1}(x^{t−1})

  • Bayes-optimal choice:   f_t(y) = λ E(V | γ_t V + √γ_t G = y)
  • State evolution:

  γ_{t+1} = λ² [1 − mmse(γ_t)],   γ_0 = λ² − 1,

  where mmse(γ) = E[(V − E(V | √γ V + G))²]

  • µ_t = σ_t² = γ_t

SLIDES 25–27

Bayes-optimal AMP

A = (λ/n) vv^T + W

Let γ_AMP(λ) be the smallest strictly positive solution of

  γ = λ² [1 − mmse(γ)].   (1)

Then the AMP estimate x̂^t = f_t(x^t) achieves

  lim_{t→∞} lim_{n→∞} min_{s∈{+1,−1}} (1/n) ‖x̂^t − s v‖₂² = 1 − γ_AMP(λ)/λ²
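
Numerically, γ_AMP(λ) and this limit can be evaluated by iterating (1), e.g., with the mmse_pm1 and iterate_gamma sketches given earlier (ours, for the ±1 prior):

```python
import numpy as np

lam = np.sqrt(2.0)
gamma_amp = iterate_gamma(lam, gamma0=lam ** 2 - 1)  # gamma_0 from spectral init
mse_limit = 1.0 - gamma_amp / lam ** 2               # asymptotic squared error
```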

Overlap:

  lim_{t→∞} lim_{n→∞} |⟨x̂^t, v⟩| / (‖x̂^t‖₂ ‖v‖₂) = √γ_AMP(λ) / λ

Bayes-optimal overlap [Miolane, Lelarge ’16]

For (almost) all λ > 0,

  lim_{n→∞} sup_{x̂(·)} |⟨x̂, v⟩| / (‖x̂‖₂ ‖v‖₂) = √γ_Bayes(λ) / λ,

where γ_Bayes(λ) is the fixed point of (1) that maximizes a specified free-energy functional.

SLIDE 28

Example: Two-point mixture

A = (λ/n) vv^T + W

P_V = ε δ_{a₊} + (1 − ε) δ_{a₋},   a₊ = √((1 − ε)/ε),   a₋ = −√(ε/(1 − ε))

[Plot for ε = 0.5]

SLIDE 29

Example: Two-point mixture

A = (λ/n) vv^T + W

P_V = ε δ_{a₊} + (1 − ε) δ_{a₋},   a₊ = √((1 − ε)/ε),   a₋ = −√(ε/(1 − ε))

[Plot for ε = 0.05]

SLIDES 30–31

General case

A = ∑_{i=1}^k λ_i v_i v_i^T + W ≡ V ΛV^T + W

  • Assume k∗ eigenvectors correspond to outliers |λ_i| > 1
  • Outliers can be estimated from A, as z_i → λ_i + λ_i^{−1}
  • Assume each row of V is ∼ P_V

AMP:   x^{t+1} = A f_t(x^t) − f_{t−1}(x^{t−1}) B_t^T

  • x^t ∈ R^{n×k∗} are estimates of the outlier eigenvectors
  • f_t = f(·; t) : R^{k∗} → R^{k∗} is applied row by row
  • B_t = (1/n) ∑_{i=1}^n (∂f_t/∂x)(x_i^t), where ∂f_t/∂x is the Jacobian of f(·; t)

Spectral initialization: x^0 = [√n ϕ_1 | … | √n ϕ_{k∗}]
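
A sketch of the rank-k∗ iteration (ours; it assumes a time-independent f applied row by row, and a jac_f returning the n stacked k∗ × k∗ Jacobians):

```python
import numpy as np

def amp_rank_k(A, f, jac_f, X0, n_iter=20):
    """X^{t+1} = A f(X^t) - f(X^{t-1}) B_t^T, B_t = row-averaged Jacobian of f."""
    X_prev, X = None, X0.copy()
    for _ in range(n_iter):
        B = jac_f(X).mean(axis=0)  # shape (n, k*, k*) -> (k*, k*)
        onsager = f(X_prev) @ B.T if X_prev is not None else 0.0
        X_prev, X = X, A @ f(X) - onsager
    return X
```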

SLIDE 32

Example: Gaussian block model

Let σ = (σ_1, …, σ_n) be the vector of vertex labels, with labels σ_i uniformly distributed in {1, 2, 3}.

Consider the n × n matrix A_0 with entries

  A_{0,ij} = 2/n if σ_i = σ_j,   −1/n otherwise.

A_0 is an orthogonal projector onto a two-dimensional subspace. We wish to estimate A_0 from the noisy version A = λ A_0 + W.

Degenerate eigenvalues: λ_1 = λ_2 = λ
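
A sketch of this construction (ours; goe is the helper from the first code block):

```python
import numpy as np

def gaussian_block_model(n, lam, rng):
    """A = lam * A0 + W, with A0_ij = 2/n if sigma_i = sigma_j and -1/n otherwise."""
    sigma = rng.integers(1, 4, size=n)  # labels uniform in {1, 2, 3}
    A0 = np.where(sigma[:, None] == sigma[None, :], 2.0 / n, -1.0 / n)
    return lam * A0 + goe(n, rng), sigma
```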

SLIDES 33–34

AMP

A = (λ/n) V V^T + W,   spectral initialization: x^0 = [√n ϕ_1  √n ϕ_2]

Main result

  lim_{n→∞} (1/n) ∑_{i=1}^n ψ(x_i^t, V_i) = E[ψ(M_t V + Q_t^{1/2} G, V)]   a.s.

State evolution: M_0 = (1/n)(x^0)^T V ∈ R^{2×2} and

  M_{t+1} = λ E[f_t(M_t V + Q_t^{1/2} G) V^T],   Q_{t+1} = E[f_t(M_t V + Q_t^{1/2} G) f_t(M_t V + Q_t^{1/2} G)^T].

Since V V^T = V R R^T V^T for any 2 × 2 rotation matrix R, the state evolution starts from a random initial condition:

  M_0 = (1/n)(x^0)^T V =_d √(1 − λ^{−2}) R

SLIDE 35

A = (λ/n) UU^T + W

[Plot: Gaussian block model with λ = 1.5, n = 6000; horizontal axis t]

SLIDE 36

Summary

A = V ΛV^T + W. AMP with spectral initialization:

  • The distributional property of the iterates gives a succinct performance characterization via state evolution
  • Can be used to construct confidence intervals
  • AMP can achieve Bayes-optimal accuracy

Extensions and future work

  • Can be extended to the rectangular low-rank matrix model A = UΣV^T + W
  • Spectral initialization for other problems, e.g., phase retrieval

https://arxiv.org/abs/1711.01682
