Bilinear Generalized Approximate Message Passing (BiG-AMP) for High-Dimensional Inference


  1. Bilinear generalized approximate message passing (BiG-AMP) for High-Dimensional Inference
     Phil Schniter
     Collaborators: Jason Parker @OSU, Jeremy Vila @OSU, and Volkan Cevher @EPFL
     With support from NSF CCF-1218754, NSF CCF-1018368, NSF IIP-0968910, and DARPA/ONR N66001-10-1-4090
     Oct. 10, 2013

  2. BiG-AMP Motivation: Four Important High-Dimensional Inference Problems
     1. Matrix Completion (MC): Recover a low-rank matrix Z ∈ R^{M×L} from noise-corrupted, incomplete observations Y = P_Ω(Z + W).
     2. Robust Principal Components Analysis (RPCA): Recover a low-rank matrix Z and a sparse matrix S from noise-corrupted observations Y = Z + S + W.
     3. Dictionary Learning (DL): Recover a (possibly overcomplete) dictionary A and a sparse matrix X from noise-corrupted observations Y = AX + W.
     4. Non-negative Matrix Factorization (NMF): Recover non-negative matrices A and X from noise-corrupted observations Y = AX + W.
     The following generalizations may also be of interest:
     - RPCA, DL, or NMF with incomplete observations.
     - RPCA or DL with structured sparsity.
     - Any of the above with non-additive corruptions (e.g., one-bit or phaseless Y).
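
     To make the four problem settings above concrete, here is a minimal synthetic data-generation sketch. The dimensions, rank, sampling rate, sparsity levels, and noise variance are illustrative assumptions rather than values from the talk; only the model structure (Z = AX, the mask P_Ω, the sparse outliers S, and the non-negativity constraints) follows the slide.

```python
# Illustrative data generation for the four problem settings above.
import numpy as np

rng = np.random.default_rng(0)
M, L, N = 100, 120, 5                 # observation sizes and (small) rank N

# Common low-rank ingredient Z = A X, plus AWGN W
A = rng.standard_normal((M, N))
X = rng.standard_normal((N, L))
Z = A @ X
W = np.sqrt(1e-3) * rng.standard_normal((M, L))

# 1) Matrix completion: observe a random 30% of the noisy entries
Omega = rng.random((M, L)) < 0.3
Y_mc = np.where(Omega, Z + W, 0.0)

# 2) Robust PCA: add a sparse outlier matrix S
S = (rng.random((M, L)) < 0.05) * 10 * rng.standard_normal((M, L))
Y_rpca = Z + S + W

# 3) Dictionary learning: sparse coefficient matrix X
X_sparse = X * (rng.random((N, L)) < 0.2)
Y_dl = A @ X_sparse + W

# 4) NMF: non-negative factors
A_nn, X_nn = np.abs(A), np.abs(X)
Y_nmf = A_nn @ X_nn + W
```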

  3. BiG-AMP Contributions
     We propose a novel unified approach to these matrix-recovery problems that leverages the recent framework of approximate message passing (AMP).
     Previous AMP algorithms were proposed for the linear model:
       Infer x ∼ ∏_n p_x(x_n) from y = Φx + w with AWGN w and known Φ. [Donoho/Maleki/Montanari'10]
     or the generalized linear model:
       Infer x ∼ ∏_n p_x(x_n) from y ∼ ∏_m p_{y|z}(y_m | z_m) with hidden z = Φx and known Φ. [Rangan'10]
     Our work tackles the generalized bilinear model:
       Infer A ∼ ∏_{m,n} p_a(a_mn) and X ∼ ∏_{n,l} p_x(x_nl) from Y ∼ ∏_{m,l} p_{y|z}(y_ml | z_ml) with hidden Z = AX. [Schniter/Cevher'11]
     In addition, we propose methods to select the rank of Z, to estimate the parameters of p_a, p_x, p_{y|z}, and to handle non-separable priors on A, X, Y|Z.
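
     The difference between the generalized linear and generalized bilinear models can be summarized with a short generative sketch: in the bilinear case, both factors A and X are unknown and drawn from separable priors, and the observations pass entrywise through a possibly non-Gaussian channel p_{y|z}. The one-bit channel and all parameter values below are illustrative assumptions, not choices made in the talk.

```python
# Generative sketch of the generalized bilinear model: A and X from separable
# priors, hidden Z = A X, and Y drawn entrywise from an output channel p_{y|z}.
import numpy as np

rng = np.random.default_rng(1)
M, N, L = 50, 4, 60
v_w = 0.01                                   # assumed noise variance

A = rng.standard_normal((M, N))              # a_mn ~ p_a, here N(0,1)
X = rng.standard_normal((N, L))              # x_nl ~ p_x, here N(0,1)
Z = A @ X                                    # hidden matrix Z = A X

# Example of a non-Gaussian output channel: one-bit observations
Y = np.sign(Z + np.sqrt(v_w) * rng.standard_normal((M, L)))
```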

  4. Outline
     1. Bilinear Generalized AMP (BiG-AMP)
        - Background on AMP
        - BiG-AMP heuristics
        - Example configurations/applications
     2. Practicalities
        - Adaptive damping
        - Parameter tuning
        - Rank selection
        - Non-separable priors
     3. Numerical results
        - Matrix completion
        - Robust PCA
        - Dictionary learning
        - Hyperspectral unmixing (via NMF)

  5. Bilinear Generalized AMP (BiG-AMP): Description
     BiG-AMP is a Bayesian approach that uses approximate message passing (AMP) strategies to infer (Z, A, X).
     [Figure: factor graphs of the generalized linear and generalized bilinear models, with factor nodes p_y|z(y|·) and variable nodes for x (and a) under priors p_x (and p_a).]
     In AMP, beliefs are propagated on a loopy factor graph using approximations that exploit certain blessings of dimensionality:
     1. Gaussian message approximation (motivated by the central limit theorem),
     2. Taylor-series approximation of message differences.
     Rigorous analyses of GAMP for CS (with large i.i.d. sub-Gaussian Φ) reveal a state evolution whose fixed points are optimal when unique. [Javanmard/Montanari'12]
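
     The first "blessing of dimensionality" is easy to check numerically: under separable priors, each z_ml = Σ_n a_mn x_nl is a sum of independent terms, so its distribution approaches a Gaussian as the inner dimension grows (the large-system limit invoked in the derivation). The check below is a standalone sanity check with assumed priors, not part of the BiG-AMP algorithm.

```python
# CLT sanity check: a single product a_1*x_1 is clearly non-Gaussian, but the
# sum z = sum_n a_n x_n over N terms is close to Gaussian (excess kurtosis ~ 0).
import numpy as np

rng = np.random.default_rng(2)
N, trials = 50, 200_000

a = rng.standard_normal((trials, N))          # a_n ~ N(0,1)
x = rng.uniform(-1.0, 1.0, (trials, N))       # x_n from a non-Gaussian prior
z = np.sum(a * x, axis=1)                     # z = sum_n a_n x_n

def excess_kurtosis(v):
    v = (v - v.mean()) / v.std()
    return np.mean(v**4) - 3.0                # 0 for an exact Gaussian

print("single term a_1*x_1 :", round(excess_kurtosis(a[:, 0] * x[:, 0]), 3))
print("sum over N=50 terms :", round(excess_kurtosis(z), 3))   # much closer to 0
```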

  6. BiG-AMP Sum-Product Heuristics
     [Figure: bilinear factor graph showing the messages p^x_{i→j} and p^x_{i←j} between the factor nodes p_y|z(y_i|·) and the variable nodes x_j, and likewise for the a_n.]
     1. Message from the i-th node of Z to the j-th node of X:
        p^x_{i→j}(x_j) ∝ ∫ p_{y|z}( y_i | Σ_n a_n x_n ) ∏_n p^a_{i←n}(a_n) ∏_{n≠j} p^x_{i←n}(x_n) d{a_n}_{n=1}^N d{x_n}_{n≠j}
                       ≈ ∫ p_{y|z}( y_i | z_i ) N( z_i; ẑ_i(x_j), ν^z_i(x_j) ) dz_i,
        since z_i | x_j is approximately Gaussian via the CLT (exact for AWGN!).
        To compute ẑ_i(x_j) and ν^z_i(x_j), the means and variances of p^x_{i←n} and p^a_{i←n} suffice, and thus we have Gaussian message passing!
        (A similar thing then happens with the messages from Z to A.)
     2. Although Gaussian, we still have 4MLN messages to compute (too many!). Exploiting similarity among the messages {p^x_{i←j}}_{i=1}^M, we employ a Taylor-series approximation whose error vanishes as M → ∞. (Same for {p^a_{i←j}}_{i=1}^L with L → ∞.) In the end, we only need to compute O(ML) messages!
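
     As a concrete illustration of the Gaussian message passing step, the sketch below computes the mean and variance of z = Σ_n a_n x_n from the means and variances of the incoming messages, which is all the CLT surrogate needs. This is only the plain moment calculation for independent a_n and x_n; the full BiG-AMP update additionally applies the Taylor-series corrections mentioned in item 2 and is not reproduced here.

```python
# Moments of z = sum_n a_n x_n given per-component message means/variances.
import numpy as np

def clt_moments(a_hat, a_var, x_hat, x_var):
    """Mean and variance of z = sum_n a_n x_n when each a_n and x_n is
    independent with the given (per-component) means and variances."""
    z_hat = np.sum(a_hat * x_hat)
    z_var = np.sum(a_hat**2 * x_var + x_hat**2 * a_var + a_var * x_var)
    return z_hat, z_var

# Illustrative incoming-message moments for N = 4 components
a_hat = np.array([0.5, -1.2, 0.3, 2.0]); a_var = np.full(4, 0.1)
x_hat = np.array([1.0,  0.0, -0.7, 0.4]); x_var = np.full(4, 0.2)
print(clt_moments(a_hat, a_var, x_hat, x_var))
```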

  7. Example Configurations
     1. Matrix Completion (MC): Recover low-rank Z = AX from Y = P_Ω(Z + W).
        a_mn ∼ N(0, 1),  x_nl ∼ N(µ_x, v_x),  and  y_ml | z_ml ∼ N(z_ml, v_w) for (m,l) ∈ Ω,  1_{y_ml = 0} for (m,l) ∉ Ω
     2. Robust PCA (RPCA):
        a) Recover low-rank Z = AX from Y = Z + E.
           a_mn ∼ N(0, 1),  x_nl ∼ N(µ_x, v_x),  y_ml | z_ml ∼ GM_2(λ, z_ml, v_w + v_s, z_ml, v_w)
        b) Recover low-rank Z = AX and sparse S from Y = [A I][X^T S^T]^T + W.
           a_mn ∼ N(0, 1),  x_nl ∼ N(µ_x, v_x),  s_ml ∼ BG(λ, 0, v_s),  y_ml | z_ml ∼ N(z_ml, v_w)
     3. Dictionary Learning (DL): Recover dictionary A and sparse X from Y = AX + W.
        a_mn ∼ N(0, 1),  x_nl ∼ BG(λ, 0, v_x),  and  y_ml | z_ml ∼ N(z_ml, v_w)
     4. Non-negative Matrix Factorization (NMF): Recover non-negative A and X (up to permutation/scale) from Y = AX + W.
        a_mn ∼ N_+(0, µ_a),  x_nl ∼ N_+(0, µ_x),  and  y_ml | z_ml ∼ N(z_ml, v_w)
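
     To make the prior notation above concrete, here are simple samplers for the building blocks: the Bernoulli-Gaussian prior BG(λ, 0, v), the two-component Gaussian-mixture noise underlying GM_2, and the non-negative Gaussian N_+(0, µ). All parameter values are illustrative assumptions.

```python
# Samplers for the prior/likelihood building blocks named on this slide.
import numpy as np

rng = np.random.default_rng(3)

def sample_bg(lam, mean, var, size):
    """Bernoulli-Gaussian BG(lam, mean, var): zero w.p. 1-lam, else Gaussian."""
    active = rng.random(size) < lam
    return active * (mean + np.sqrt(var) * rng.standard_normal(size))

def sample_gm2_noise(lam, var_outlier, var_nominal, size):
    """Two-component Gaussian noise: large 'outlier' variance w.p. lam."""
    var = np.where(rng.random(size) < lam, var_outlier, var_nominal)
    return np.sqrt(var) * rng.standard_normal(size)

def sample_nonneg_gauss(var, size):
    """Non-negative (zero-location truncated) Gaussian, i.e., |N(0, var)|."""
    return np.abs(np.sqrt(var) * rng.standard_normal(size))

# e.g., a sparse X for dictionary learning and RPCA-style corruption noise
X = sample_bg(lam=0.1, mean=0.0, var=1.0, size=(5, 50))
E = sample_gm2_noise(lam=0.05, var_outlier=10.0, var_nominal=0.01, size=(20, 50))
```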

  8. Example Configurations (cont.)
     5. One-bit Matrix Completion (MC): Recover low-rank Z = AX from Y = P_Ω(sgn(Z + W)).
        a_mn ∼ N(0, 1),  x_nl ∼ N(µ_x, v_x),  and  y_ml | z_ml ∼ probit for (m,l) ∈ Ω,  1_{y_ml = 0} for (m,l) ∉ Ω
        ... leveraging previous work on one-bit/classification GAMP [Ziniel/Schniter'13]
     6. Phaseless Matrix Completion (MC): Recover low-rank Z = AX from Y = P_Ω(abs(Z + W)).
        a_mn ∼ N(0, 1),  x_nl ∼ N(µ_x, v_x),  and
        p_{y_ml|z_ml}(y | z) ∝ exp( −(|y|² + |z|²)/v_w ) I_0( |y||z|/v_w ) for (m,l) ∈ Ω,  1_{y_ml = 0} for (m,l) ∉ Ω
        ... leveraging previous work on phase-retrieval GAMP [Schniter/Rangan'12]
     7. and so on ...
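
     The two output channels above can be evaluated as log-likelihoods as sketched below, with a probit channel for the one-bit case and the unnormalized Rician-type expression written above for the phaseless case. The noise variance v_w is an assumed parameter, and entries outside Ω would simply be excluded.

```python
# Log-likelihood evaluation for the one-bit (probit) and phaseless channels.
import numpy as np
from scipy.special import i0e      # exponentially scaled Bessel I_0
from scipy.stats import norm

def loglik_one_bit(y_pm1, z, v_w):
    """log p(y|z) for y in {-1,+1} with y = sgn(z + w), w ~ N(0, v_w)."""
    return norm.logcdf(y_pm1 * z / np.sqrt(v_w))

def loglik_phaseless(y_abs, z, v_w):
    """Unnormalized log-likelihood of the Rician-type channel above:
    log[ exp(-(|y|^2+|z|^2)/v_w) * I_0(|y||z|/v_w) ], computed stably via i0e."""
    r = np.abs(y_abs) * np.abs(z) / v_w
    return np.log(i0e(r)) + r - (np.abs(y_abs)**2 + np.abs(z)**2) / v_w

print(loglik_one_bit(np.array([1, -1]), np.array([0.8, 0.3]), v_w=0.1))
print(loglik_phaseless(np.array([1.0, 2.0]), np.array([0.9 + 0.3j, -1.8]), v_w=0.1))
```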

  9. Practicalities: Adaptive Damping
     The heuristics used to derive GAMP hold in the large-system limit: M, N, L → ∞ with fixed M/N, M/L. In practice, M, N, L are finite and the rank N is often very small!
     To prevent BiG-AMP from diverging, we damp the updates using an adjustable step-size parameter β ∈ (0, 1].
     Moreover, we adapt β by monitoring (an approximation to) the cost function minimized by BiG-AMP and adjusting β as needed to ensure decreasing cost, leveraging similar methods from GAMP [Rangan/Schniter/Riegler/Fletcher/Cevher'13].
     J(t) = Σ_{n,l} D( p̂_{x_nl|Y}(· | Y) ‖ p_{x_nl}(·) )          ← KL divergence between posterior & prior
          + Σ_{m,n} D( p̂_{a_mn|Y}(· | Y) ‖ p_{a_mn}(·) )
          − Σ_{m,l} E_{N(z_ml; p̄_ml(t), ν^p_ml(t))} [ log p_{y_ml|z_ml}(y_ml | z_ml) ]
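
     The adaptive-damping logic can be sketched generically: damp each update toward the previous iterate with step size β, track a cost surrogate, and shrink β whenever the cost increases. The update map, cost, and step-size schedule below are placeholders chosen for illustration, not the actual BiG-AMP updates or the cost J(t) above.

```python
# Generic adaptive damping: accept damped updates only when a cost decreases.
import numpy as np

def damped_iterations(update, cost, x0, beta=1.0, beta_min=0.01,
                      shrink=0.5, grow=1.1, n_iter=50):
    x, last_cost = x0, cost(x0)
    for _ in range(n_iter):
        x_new = (1 - beta) * x + beta * update(x)     # damped update
        c = cost(x_new)
        if c <= last_cost:                            # cost decreased: accept,
            x, last_cost = x_new, c                   # cautiously grow beta
            beta = min(1.0, grow * beta)
        else:                                         # cost increased: reject,
            beta = max(beta_min, shrink * beta)       # retry with more damping
    return x

# Toy usage: damped fixed-point iteration for x = cos(x)
x_star = damped_iterations(np.cos, lambda x: (x - np.cos(x))**2, x0=0.0)
print(x_star)
```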

  10. Practicalities: Parameter Tuning via EM
      We treat the parameters θ that determine the priors p_x, p_a, p_{y|z} as deterministic unknowns and compute (approximate) ML estimates using expectation-maximization (EM), as done for GAMP in [Vila/Schniter'13].
      Taking X, A, and Z to be the hidden variables, the EM recursion becomes
      θ̂^{k+1} = arg max_θ E[ log p_{X,A,Z,Y}(X, A, Z, Y; θ) | Y; θ̂^k ]
              = arg max_θ  Σ_{n,l} E[ log p_{x_nl}(x_nl; θ) | Y; θ̂^k ]
                         + Σ_{m,n} E[ log p_{a_mn}(a_mn; θ) | Y; θ̂^k ]
                         + Σ_{m,l} E[ log p_{y_ml|z_ml}(y_ml | z_ml; θ) | Y; θ̂^k ]
      For tractability, the θ-maximization is performed one variable at a time.
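
      As one concrete instance of the "one variable at a time" M-step, consider an AWGN output channel y_ml = z_ml + w_ml with w_ml ∼ N(0, v_w): maximizing the third expectation above over v_w alone gives the closed-form update below. The posterior means and variances of z_ml are assumed to come from BiG-AMP and are treated as given inputs here.

```python
# EM M-step for the AWGN noise variance under the scheme above:
#   v_w <- (1/ML) * sum_{m,l} [ (y_ml - E[z_ml|Y])^2 + Var(z_ml|Y) ]
import numpy as np

def em_update_noise_variance(Y, z_hat, z_var):
    """Closed-form v_w update given posterior means/variances of Z."""
    return np.mean((Y - z_hat)**2 + z_var)

# Illustrative call with placeholder posterior statistics
rng = np.random.default_rng(4)
Y = rng.standard_normal((20, 30))
z_hat = Y + 0.1 * rng.standard_normal(Y.shape)
z_var = np.full(Y.shape, 0.01)
print(em_update_noise_variance(Y, z_hat, z_var))
```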

  11. Practicalities: Rank Selection
      In practice, the rank of Z (i.e., the number of columns in A and rows in X) is unknown. We propose two methods for rank selection:
      1. Penalized log-likelihood maximization:
         N̂ = arg max_{N=1,...,N̄}  2 log p_{Y|Z}( Y | Â_N X̂_N; θ̂_N ) − η(N),
         where η(N) penalizes the effective number of parameters under rank N (e.g., BIC, AIC). Although Â_N, X̂_N, θ̂_N are ideally ML estimates under rank N, we use EM-BiG-AMP estimates.
      2. Rank contraction (adapted from LMaFit [Wen/Yin/Zhang'12]): Run EM-BiG-AMP at the maximum rank N̄ and then set N̂ to the location of the largest gap between singular values, but only if the gap is sufficiently large. If not, run EM-BiG-AMP again and re-check.
      For matrix completion we advocate the first strategy (with the AICc rule), while for robust PCA we advocate the second strategy.
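
      A minimal sketch of the rank-contraction rule (the second strategy): compute the singular values of the current product estimate, find the largest consecutive gap, and contract the rank only if that gap is decisive. The ratio-based gap measure and the threshold value below are illustrative assumptions.

```python
# Rank contraction via the largest gap between consecutive singular values.
import numpy as np

def contract_rank(A_hat, X_hat, gap_threshold=10.0):
    """Return a contracted rank if one consecutive singular-value gap
    dominates; otherwise keep the current rank."""
    s = np.linalg.svd(A_hat @ X_hat, compute_uv=False)
    s = s[s > 0]
    ratios = s[:-1] / s[1:]              # gaps between consecutive singular values
    k = int(np.argmax(ratios))
    if ratios[k] > gap_threshold:
        return k + 1                     # decisive gap: contract to rank k+1
    return len(s)                        # no decisive gap: keep current rank

# Toy check: an (effectively) rank-3 product should contract to 3
rng = np.random.default_rng(5)
A_hat = rng.standard_normal((40, 8)); A_hat[:, 3:] *= 1e-3
X_hat = rng.standard_normal((8, 50))
print(contract_rank(A_hat, X_hat))
```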
