SLIDE 1 On convergence of Approximate Message Passing
Francesco Caltagirone(1), Florent Krzakala(2) and Lenka Zdeborova(1)
(1) Institut de Physique Théorique, CEA Saclay (2) LPS, Ecole Normale Supérieure, Paris
SLIDE 2 Compressed Sensing
y = Fx + ξ

F: M×N random matrix with i.i.d. elements
x: the signal, an N-component vector of which only K < N components are non-zero
y: the measurement, an M < N-component vector
ξ: white noise with variance ⟨ξ²⟩ = ∆

Sparsity ρ = K/N, measurement ratio α = M/N.

GIVEN y and F, RECONSTRUCT x.
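A minimal numpy sketch (my own illustration, with hypothetical sizes and parameter values, not taken from the slides) of generating such an instance:

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha, rho, Delta = 1000, 0.5, 0.2, 1e-8          # hypothetical N, alpha = M/N, rho = K/N, noise variance
M, K = int(alpha * N), int(rho * N)

F = rng.normal(0.0, 1.0 / np.sqrt(N), size=(M, N))   # i.i.d. entries of variance 1/N
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.normal(size=K)   # only K non-zero components
y = F @ x + rng.normal(0.0, np.sqrt(Delta), size=M)       # noisy linear measurements
```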
SLIDE 3
Standard techniques
Minimization of the ℓ0 norm under the linear constraint:

min_x ||x||_0  such that  Fx = y,   where ||x||_0 = number of non-zero elements.

Non-convex norm, exponentially hard to minimize.

Minimization of the ℓ1 norm (Candès, Tao, Donoho):

||x||_1 = Σ_{i=1}^N |x_i|

Convex norm, easy to minimize. The ℓ1 norm well approximates the ℓ0 norm.
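A minimal sketch (my own, with hypothetical sizes, not from the slides) of ℓ1 reconstruction cast as a linear program: write x = u − v with u, v ≥ 0 and minimize Σ(u + v) subject to F(u − v) = y.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, M, K = 200, 100, 10
F = rng.normal(0.0, 1.0 / np.sqrt(N), size=(M, N))
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.normal(size=K)
y = F @ x                                     # noiseless measurements

c = np.ones(2 * N)                            # objective: sum of u and v, i.e. the l1 norm
A_eq = np.hstack([F, -F])                     # equality constraint F(u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_hat = res.x[:N] - res.x[N:]
print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```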
SLIDE 4 The Donoho-Tanner line
Consider the noiseless case with a measurement matrix with i.i.d. elements distributed according to a Gaussian with zero mean and a variance of order 1/N, with K = ρN non-zero components and M = αN measurements.

- α = 1: square matrix, we can invert it.
- α < 1: rectangular matrix, under-determined system.
- α = ρ: information-theoretical limit.

[Figure: phase diagram in the (ρ0, α) plane showing the Donoho-Tanner ℓ1 line α_{ℓ1}(ρ0), the Bayesian AMP line α_{EM-BP}(ρ0), s-BP results for N = 10^3 and N = 10^4, and the information-theoretical limit α = ρ0.]
SLIDE 5 Bayesian setting
GOAL: reconstruct the signal, given the measurement vector, the measurement matrix, and prior knowledge of the (sparse) distribution of the signal elements.
Approximate Message Passing
Donoho, Maleki, Montanari (2009)
Powerful algorithm. Convergence issues.
SLIDE 6
Setting and motivation
F_{µi} = γ/N + N(0, 1)/√N,    P(x) = (1 − ρ) δ(x) + ρ N(0, 1)
This is the simplest case in which Approximate Message Passing (AMP) has convergence problems. If the mean is sufficiently large, AMP displays violent divergences. Divergences of this kind are observed in many other cases and are the main obstacle to a wider use of AMP.
y = Fx + ξ
In this simple case there are workarounds that ensure convergence, like a “mean-removal” procedure. BUT the case is interesting because we want to understand the origin of the non-convergence which, we argue, is of the same nature in more complicated settings.
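A short sketch (my own, with placeholder parameter values) of sampling this measurement ensemble and the Gauss-Bernoulli prior:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, rho, gamma, Delta = 2000, 600, 0.1, 5.0, 1e-10            # placeholder values
F = gamma / N + rng.normal(0.0, 1.0 / np.sqrt(N), size=(M, N))  # F_mu_i = gamma/N + N(0,1)/sqrt(N)
x = np.where(rng.random(N) < rho, rng.normal(size=N), 0.0)      # Gauss-Bernoulli signal
y = F @ x + rng.normal(0.0, np.sqrt(Delta), size=M)
```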
SLIDE 7 Bayesian Inference with Belief Propagation
Bayes formula:

P(x|F, y) = P(x|F) P(y|F, x) / P(y|F)

Conditional probability of the measurement vector:

P(y|F, x) = ∏_{µ=1}^M 1/√(2π∆_µ) exp[ −(y_µ − Σ_{i=1}^N F_{µi} x_i)² / (2∆_µ) ]

Posterior distribution:

P(x|F, y) = 1/Z(y, F) ∏_{i=1}^N [(1 − ρ) δ(x_i) + ρ φ(x_i)] ∏_{µ=1}^M 1/√(2π∆_µ) exp[ −(y_µ − Σ_{i=1}^N F_{µi} x_i)² / (2∆_µ) ]

MMSE estimator, minimizing E = Σ_{i=1}^N (x_i − s_i)² / N:

x*_i = ∫ dx_i x_i ν_i(x_i),   with the marginal ν_i(x_i) ≡ ∫_{{x_j}_{j≠i}} P(x|F, y).

Computing the marginals exactly takes an exponential time, unfeasible.
SLIDE 8
Bayes optimal setting
If we know exactly the prior distribution of the signal elements and of the noise, we are in the so-called BAYES OPTIMAL setting. In the following we will consider that this is the case. When it is not, the prior can be learned efficiently by adding a step to the algorithm that I will present. (I will not talk about this.)
SLIDE 9 Belief Propagation (Cavity method)
[Figure: factor graph with variable nodes x_i (signal elements), factor nodes given by the matrix lines F, prior factors P(x_i), and messages m_{i→µ} and m_{µ→i}.]
Two kinds of nodes: factors (matrix lines) and variables (signal elements). Belief propagation works for locally tree-like graphs or for densely and weakly connected graphs. Messages represent an approximation to the marginal distribution of a variable. We can introduce a third kind of node: the prior distribution on the signal elements, acting as a local field. Messages are updated according to a sequential or parallel schedule until convergence (fixed point).
SLIDE 10 Belief Propagation, r-BP and AMP
BP → r-BP → AMP

- BP: O(N²) continuous messages m_{i→µ}, m_{µ→i}.
- r-BP: the continuous messages are projected onto O(N²) numbers.
- AMP: dense matrix, TAP-like equations; for the last step one assumes parallel update.

The per-iteration complexity is O(N²). In this case, fast matrix multiplication algorithms can be applied, reducing the complexity to N log(N).

Donoho, Maleki, Montanari (2009); Krzakala et al. (2012)
SLIDE 11 AMP Algorithm
V_µ^{t+1} = Σ_i F_{µi}² v_i^t                                                        (1)

ω_µ^{t+1} = Σ_i F_{µi} a_i^t − (y_µ − ω_µ^t)/(∆ + V_µ^t) Σ_i F_{µi}² v_i^t            (2)

(Σ_i^{t+1})² = [ Σ_µ F_{µi}² / (∆ + V_µ^{t+1}) ]^{−1}                                 (3)

R_i^{t+1} = a_i^t + [ Σ_µ F_{µi} (y_µ − ω_µ^{t+1})/(∆ + V_µ^{t+1}) ] / [ Σ_µ F_{µi}² / (∆ + V_µ^{t+1}) ]   (4)

a_i^{t+1} = f_1((Σ_i^{t+1})², R_i^{t+1})                                              (5)

v_i^{t+1} = f_2((Σ_i^{t+1})², R_i^{t+1})                                              (6)

Here f_k(Σ², R) is the k-th connected cumulant w.r.t. the measure

Q(x) = 1/Z(Σ², R) · P(x) e^{−(x−R)²/(2Σ²)} / √(2πΣ²),

and a_i, v_i are the AMP estimators for the mean and variance of the i-th signal component. The performance of the algorithm can be evaluated through

E^t = (1/N) Σ_{i=1}^N (s_i − a_i^t)²,   V^t = (1/N) Σ_{i=1}^N v_i.
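A minimal numpy sketch (my own, not the authors' code) of equations (1)-(6), with the denoisers f_1, f_2 worked out for the Gauss-Bernoulli prior P(x) = (1 − ρ)δ(x) + ρN(0, 1):

```python
import numpy as np

def f1_f2(Sigma2, R, rho):
    """Mean (f1) and variance (f2) of x under Q(x) ∝ P(x) exp(-(x-R)^2 / (2 Sigma2)),
    for the Gauss-Bernoulli prior P(x) = (1-rho) delta(x) + rho N(0,1)."""
    m = R / (1.0 + Sigma2)                   # mean of the non-zero (Gaussian) component
    s2 = Sigma2 / (1.0 + Sigma2)             # variance of the non-zero component
    # posterior odds of the zero component against the non-zero one
    odds = ((1.0 - rho) / rho) * np.sqrt((1.0 + Sigma2) / Sigma2) \
           * np.exp(-R**2 / (2.0 * Sigma2 * (1.0 + Sigma2)))
    p_nz = 1.0 / (1.0 + odds)                # posterior probability that the component is non-zero
    f1 = p_nz * m
    f2 = p_nz * (s2 + m**2) - f1**2
    return f1, f2

def amp(F, y, rho, Delta, n_iter=200):
    """Plain AMP, eqs. (1)-(6), with parallel updates."""
    M, N = F.shape
    F2 = F**2
    a, v = np.zeros(N), np.full(N, rho)      # initial estimates of means and variances
    omega, V = y.copy(), np.ones(M)
    for _ in range(n_iter):
        V_new = F2 @ v                                                 # eq. (1)
        omega_new = F @ a - (y - omega) / (Delta + V) * V_new          # eq. (2)
        Sigma2 = 1.0 / (F2.T @ (1.0 / (Delta + V_new)))                # eq. (3)
        R = a + Sigma2 * (F.T @ ((y - omega_new) / (Delta + V_new)))   # eq. (4)
        a, v = f1_f2(Sigma2, R, rho)                                   # eqs. (5)-(6)
        V, omega = V_new, omega_new
    return a, v
```

Running `a, v = amp(F, y, rho, Delta)` on an instance generated as above allows one to monitor E^t and V^t along the iterations.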
SLIDE 12 AMP Algorithm
[Equations (1)-(6) repeated from the previous slide.]

The performance of the algorithm can be evaluated through E^t and V^t as before. Note that the AMP algorithm does NOT depend explicitly on the value of the mean of the matrix,

F_{µi} = γ/N + N(0, 1)/√N.
SLIDE 13
Convergence
We study convergence at a given (sufficiently high) measurement ratio, with very small or zero noise, in the Bayes optimal case.
SLIDE 14 State Evolution (infinite N)
Bayati, Montanari (rigorous in the zero-mean case) ‘11 Krzakala et al. (replicas in the zero-mean case) ‘12 Caltagirone, Krzakala, Zdeborova (replicas in the non-zero-mean case) ‘14
State evolution is the asymptotic analysis of the average performance of the inference algorithm when the size of the signal goes to infinity. It gives a good indication of what happens in a practical situation if the size of the signal is sufficiently large. It can be obtained rigorously in simple cases and non-rigorously, with the replica method, in more involved cases.
SLIDE 15 State Evolution (infinite N)
V^{t+1} = ∫ ds P(s) ∫ Dz f_2( (∆ + V^t)/α , s + z A(E^t, D^t) + γ² D^t )

E^{t+1} = ∫ ds P(s) ∫ Dz [ s − f_1( (∆ + V^t)/α , s + z A(E^t, D^t) + γ² D^t ) ]²

D^{t+1} = ∫ ds P(s) ∫ Dz [ s − f_1( (∆ + V^t)/α , s + z A(E^t, D^t) + γ² D^t ) ]

with

A(E^t, D^t) = √[ (E^t + ∆ + γ² (D^t)²) / α ],

D^t = (1/N) Σ_j (s_j − a_j^t),   E^t = (1/N) Σ_{i=1}^N (s_i − a_i^t)²,   V^t = (1/N) Σ_{i=1}^N v_i.

If the mean is zero, one recovers the density evolution that does not depend on D.
Bayati, Montanari (rigorous in the zero-mean case) ‘11 Krzakala et al. (replicas in the zero-mean case) ‘12 Caltagirone, Krzakala, Zdeborova (replicas in the non-zero-mean case) ‘14
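A Monte Carlo sketch (my own, with placeholder parameters; it reuses the f1_f2 helper from the AMP sketch above) of iterating these state-evolution equations:

```python
import numpy as np

def se_iteration(rho, alpha, Delta, gamma, n_iter=50, n_samples=200_000, seed=0):
    """Iterate the state-evolution equations, estimating the integrals over P(s) and Dz by Monte Carlo."""
    rng = np.random.default_rng(seed)
    s = np.where(rng.random(n_samples) < rho, rng.normal(size=n_samples), 0.0)
    z = rng.normal(size=n_samples)
    V, E, D = rho, rho, 0.0                        # initialization on the Nishimori line (E = V, D = 0)
    for _ in range(n_iter):
        A = np.sqrt((E + Delta + gamma**2 * D**2) / alpha)
        R = s + z * A + gamma**2 * D
        a, v = f1_f2((Delta + V) / alpha, R, rho)  # Gauss-Bernoulli denoisers from the AMP sketch
        V, E, D = v.mean(), ((s - a)**2).mean(), (s - a).mean()
    return V, E, D
```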
SLIDE 16 The Nishimori Condition
E^t = V^t, D^t = 0   ⟹   E^{t+1} = V^{t+1}, D^{t+1} = 0

In the Bayes optimal setting, therefore, analytically, if the evolution starts (exactly) on the Nishimori line it stays on it until convergence. BUT what is the effect of small perturbations with respect to the NL?
- Very small fluctuations due to numerical precision in the DE
- Fluctuations due to finite size in the AMP algorithm
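A quick numerical check (reusing se_iteration from the sketch above, with placeholder parameters): started exactly on the Nishimori line, the iteration stays on it up to Monte Carlo noise.

```python
V, E, D = se_iteration(rho=0.2, alpha=0.5, Delta=1e-8, gamma=0.0, n_iter=20)
print(V, E, D)   # E stays close to V, and D close to 0
```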
SLIDE 17
[Figure: trajectory in the (E, V) plane for a Gaussian signal with Gaussian inference, ρ = 0.2, no spinodal.]
Zero-mean case: convergence on the NL (Bayati, Montanari).
The non-zero mean adds a third dimension, D, to the phase space!
SLIDE 18 Stability Analysis (I)
The evolution takes place in the (V, E, D) space; define K = V − E. On the Nishimori line, (K* = 0, D* = 0). The NL is a "fixed line" of the evolution

K^{t+1} = f_K(V^t, K^t, D^t),   D^{t+1} = f_D(V^t, K^t, D^t),

and we study the behaviour of the perturbations δK, δD.
SLIDE 19
Stability Analysis (II)
δK^t = K^t − K*,   δD^t = D^t − D*,   (δK^{t+1}, δD^{t+1})^T = M · (δK^t, δD^t)^T

We linearize the equations, with

M = ( ∂_K f_K(V^t, 0, 0)   ∂_D f_K(V^t, 0, 0) )
    ( ∂_K f_D(V^t, 0, 0)   ∂_D f_D(V^t, 0, 0) )
SLIDE 20
Stability Analysis (II)
δK^t = K^t − K*,   δD^t = D^t − D*,   (δK^{t+1}, δD^{t+1})^T = M · (δK^t, δD^t)^T

We linearize the equations as before. When the signal is Gauss-Bernoulli with zero mean, the off-diagonal terms vanish:

M = ( ∂_K f_K(V^t, 0, 0)           0          )
    (         0            ∂_D f_D(V^t, 0, 0) )
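A numerical sketch (my own; it reuses f1_f2 from the AMP sketch, and the parameters below are placeholders) of estimating M by finite differences of one state-evolution step around the Nishimori line. Using the same Monte Carlo samples for every evaluation keeps the differences stable; for the zero-mean Gauss-Bernoulli prior the off-diagonal entries come out close to zero, and the diagonal gives the eigenvalues λ_K and λ_D.

```python
import numpy as np

def se_step(V, E, D, rho, alpha, Delta, gamma, s, z):
    """One state-evolution step as a function of (V, E, D), on fixed Monte Carlo samples s, z."""
    A = np.sqrt((E + Delta + gamma**2 * D**2) / alpha)
    R = s + z * A + gamma**2 * D
    a, v = f1_f2((Delta + V) / alpha, R, rho)        # Gauss-Bernoulli denoisers from the AMP sketch
    return v.mean(), ((s - a)**2).mean(), (s - a).mean()

def stability_matrix(V, rho, alpha, Delta, gamma, eps=1e-6, n_samples=400_000, seed=0):
    """Finite-difference estimate of M = d(K', D') / d(K, D) at the Nishimori line (K = 0, D = 0)."""
    rng = np.random.default_rng(seed)
    s = np.where(rng.random(n_samples) < rho, rng.normal(size=n_samples), 0.0)
    z = rng.normal(size=n_samples)
    def step_KD(K, D):                               # the map (K, D) -> (K', D'), with E = V - K
        Vn, En, Dn = se_step(V, V - K, D, rho, alpha, Delta, gamma, s, z)
        return np.array([Vn - En, Dn])
    base = step_KD(0.0, 0.0)
    col_K = (step_KD(eps, 0.0) - base) / eps         # derivative with respect to K
    col_D = (step_KD(0.0, eps) - base) / eps         # derivative with respect to D
    return np.column_stack([col_K, col_D])
```

For instance, `stability_matrix(V=0.05, rho=0.1, alpha=0.3, Delta=1e-10, gamma=2.5)` probes whether |λ_D| exceeds 1 at that point of the V trajectory.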
SLIDE 21 Stability Analysis (II)
On the NL the two eigenvalues of M are λ_K = ∂_K f_K(V^t) and λ_D = ∂_D f_D(V^t), with

∂_D f_D(V^t) = − αγ²/(∆ + V^t) ∫ ds P(s) ∫ Dz f_2(A², s + zA),

∂_K f_K(V^t) = − 1/2 · 1/(∆ + V^t) ∫ ds P(s) ∫ Dz [ f_4(A², s + zA) + 2 f_2(A², s + zA)² + 2 f_1(A², s + zA) f_3(A², s + zA) ].

[Figure: density-evolution trajectories of D versus log10 V for γ = 1.9, 2.5, 2.9 and 3.6, at ρ = 0.1, α = 0.3, ∆ = 10^{−10}.]

- γ < γ_c^{(1)}: the eigenvalue λ_D is always less than 1 in modulus.
- γ_c^{(1)} < γ < γ_c^{(2)}: λ_D becomes larger than 1 in modulus only in a limited region.
- γ > γ_c^{(2)}: λ_D is larger than 1 in modulus all the way down to the fixed point.
SLIDE 22 Density Evolution and AMP
[Figure: the critical values γ_c^{(1)} and γ_c^{(2)}; inset: convergence rate R of the AMP algorithm for N = 1000, 4000, 16000.]

For zero measurement noise, both critical values do NOT depend on the undersampling rate α. For weak noise, only the second critical value has a very weak dependence on both α and ∆.

[Inset] Convergence rate of the AMP algorithm for different signal sizes: the transition becomes sharper and sharper as N → ∞; it is expected to move towards the second critical value and to behave similarly to the density evolution.
SLIDE 23 SwAMP algorithm, a possible solution
With random sequential updates the convergence problems disappear.
Manoel, Krzakala, Tramel, Zdeborova (2014)
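A schematic (my own, simplified; not the exact algorithm of Manoel et al.) of the random-sequential idea: each component estimate is refreshed one at a time in random order, and V and the prediction ω are updated incrementally after every single-component update. It reuses f1_f2 from the AMP sketch above and omits the Onsager bookkeeping of eqs. (1)-(2).

```python
import numpy as np

def sequential_sweep(F, y, rho, Delta, n_sweeps=50, seed=0):
    """Random-sequential (swept) refresh of the component estimates."""
    rng = np.random.default_rng(seed)
    M, N = F.shape
    F2 = F**2
    a, v = np.zeros(N), np.full(N, rho)
    V = F2 @ v                        # per-measurement variances
    omega = F @ a                     # current prediction (no Onsager term in this schematic)
    for _ in range(n_sweeps):
        for i in rng.permutation(N):  # random update order
            Sigma2_i = 1.0 / np.sum(F2[:, i] / (Delta + V))
            R_i = a[i] + Sigma2_i * np.sum(F[:, i] * (y - omega) / (Delta + V))
            a_new, v_new = f1_f2(Sigma2_i, R_i, rho)   # Gauss-Bernoulli denoisers as above
            V += F2[:, i] * (v_new - v[i])             # incremental refresh of V ...
            omega += F[:, i] * (a_new - a[i])          # ... and of the prediction
            a[i], v[i] = a_new, v_new
    return a, v
```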
SLIDE 24 SwAMP algorithm, a possible solution
A very effective solution that works well in many interesting cases! It loses, however, the property of involving only matrix multiplications.
arXiv:1401.6384 arXiv:1406.4311
Caltagirone, Manoel, Krzakala, Tramel, Zdeborova; Rangan, Schniter, Fletcher. It is not a universal solution, e.g., for LOW RANK problems (Kabashima).
SLIDE 25 Conclusions and Perspectives
- We found that the origin of the convergence problems is an instability of the
Nishimori Line
- We provided a possible solution with the SwAMP algorithm.
- Relate this kind of instability in the density evolution to the shape of the
replica potential.
- Perform the same kind of analysis for the case of dictionary learning.
THANK YOU!