OOPS 2020: Mean field methods in high-dimensional statistics and nonconvex optimization
Lecturer: Andrea Montanari
Problem session leader: Michael Celentano
July 6, 2020

Problem Session 2, Problem 1: from belief propagation to Bayes AMP state evolution

Below I have depicted the computation tree.

[Figure: the computation tree rooted at variable node $v$ with parameter $\theta_v$; factor nodes $f, f'$ neighbor $v$ through edge weights $X_{fv}, X_{f'v}$ and carry observations $y_f, y_{f'}$; further variable nodes $v'$ hang off each factor.]

We observe the edge weights $X_{fv}$ and, for each factor node, the outcome
$$
y_f = \sum_{v' \in \partial f} X_{fv'} \theta_{v'} + w_f.
$$
Recall
$$
X_{fv} \overset{\text{iid}}{\sim} \mathsf{N}(0, 1/n), \qquad \theta_v \overset{\text{iid}}{\sim} \mu_\Theta, \qquad w_f \overset{\text{iid}}{\sim} \mathsf{N}(0, \sigma^2).
$$
The belief propagation algorithm on the computation tree exactly computes the posterior $p_v(\vartheta \mid \mathcal{T}_{v,2t})$, where $\mathcal{T}_{v,2t}$ is the $\sigma$-algebra generated by the observations corresponding to nodes and edges within a $2t$-radius ball of $v$. The iteration is
$$
m^0_{v \to f}(\vartheta) = 1,
$$
$$
\tilde m^s_{f \to v}(\vartheta) \propto \int \exp\Bigg( -\frac{1}{2\sigma^2} \Big( y_f - X_{fv}\vartheta - \sum_{v' \in \partial f \setminus v} X_{fv'}\vartheta_{v'} \Big)^{\!2} \Bigg) \prod_{v' \in \partial f \setminus v} m^s_{v' \to f}(\vartheta_{v'}) \, \mu_\Theta(\mathrm{d}\vartheta_{v'}),
$$
$$
m^{s+1}_{v \to f}(\vartheta) \propto \prod_{f' \in \partial v \setminus f} \tilde m^s_{f' \to v}(\vartheta),
$$
with normalization $\int \tilde m^t_{f \to v}(\vartheta)\, \mu_\Theta(\mathrm{d}\vartheta) = \int m^t_{v \to f}(\vartheta)\, \mu_\Theta(\mathrm{d}\vartheta) = 1$. One can show that for any variable node $v$, the posterior density with respect to the measure $\mu_\Theta$ is
$$
p_v(\vartheta \mid \mathcal{T}_{v,2t}) \propto \prod_{f \in \partial v} \tilde m^{t-1}_{f \to v}(\vartheta).
$$
This equation is exact.
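The iteration above can be run verbatim whenever $\mu_\Theta$ has finite support, since each message is then just a finite vector of density values with respect to $\mu_\Theta$. Below is a minimal Python sketch, not part of the original handout, for the two-point prior $\mu_\Theta = \mathrm{Unif}\{-1,+1\}$: it runs the same message updates on a tiny dense instance, enumerating the factor-side integral by brute force (hence the deliberately small dimensions; all variable names are illustrative). On the computation tree these updates compute the posterior exactly; on a loopy graph, as here, they are the usual BP approximation.

```python
# BP messages for the linear model above with mu_Theta = Unif{-1,+1}.
# m[v, f] stores m_{v->f} evaluated on the support {-1,+1}, as a density
# w.r.t. mu_Theta, i.e. normalized so that 0.5 * (m[...,0] + m[...,1]) = 1.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 12, 6, 0.5            # tiny: the factor update enumerates 2^(d-1) terms
support = np.array([-1.0, 1.0])     # support of mu_Theta (mass 1/2 each)

X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))   # edge weights X_{fv}
theta = rng.choice(support, size=d)                  # theta_v ~ mu_Theta
y = X @ theta + sigma * rng.normal(size=n)           # observations y_f

m = np.ones((d, n, 2))                               # m^0_{v->f} = 1

def factor_to_var(f, v, m):
    """m~^s_{f->v}: integrate out (theta_{v'}) for v' in df \\ v under the messages."""
    others = [u for u in range(d) if u != v]
    out = np.zeros(2)
    for vals in itertools.product(range(2), repeat=len(others)):
        w = np.prod([0.5 * m[u, f, s] for u, s in zip(others, vals)])
        partial = sum(X[f, u] * support[s] for u, s in zip(others, vals))
        resid = y[f] - X[f, v] * support - partial   # one entry per value of theta_v
        out += w * np.exp(-resid**2 / (2 * sigma**2))
    return out / (0.5 * out.sum())

for _ in range(5):                                   # a few BP sweeps
    mt = np.array([[factor_to_var(f, v, m) for f in range(n)]
                   for v in range(d)])               # m~^s_{f->v}
    for v in range(d):
        for f in range(n):
            prod = np.prod(np.delete(mt[v], f, axis=0), axis=0)
            m[v, f] = prod / (0.5 * prod.sum())      # m^{s+1}_{v->f}

post = np.array([np.prod(mt[v], axis=0) for v in range(d)])  # product over f in dv
post /= 0.5 * post.sum(axis=1, keepdims=True)
print("P(theta_v = +1 | obs):", 0.5 * post[:, 1])
print("truth:", theta)
```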

Our goal is to show that when $n, d \to \infty$, $n/d \to \delta$,
$$
p_v(\vartheta \mid \mathcal{T}_{v,2t}) \propto \exp\Big( -\frac{1}{2\tau_t^2} (\chi^t_v - \vartheta)^2 + o_p(1) \Big),
$$
where $(\chi^t_v, \theta_v) \overset{\mathrm{d}}{\to} (\Theta + \tau_t Z, \Theta)$, with $\Theta \sim \mu_\Theta$, $Z \sim \mathsf{N}(0,1)$ independent of $\Theta$, and $\tau_t$ given by the Bayes AMP state evolution equations
$$
\tau_{t+1}^2 = \sigma^2 + \frac{1}{\delta}\,\mathrm{mmse}_\Theta(\tau_t^2), \qquad \mathrm{mmse}_\Theta(\tau^2) := \mathbb{E}\big[ (\Theta - \mathbb{E}[\Theta \mid \Theta + \tau Z])^2 \big],
$$
initialized by $\tau_0^2 = \infty$. In fact, this follows without too much work once we show that
$$
m^s_{v \to f}(\vartheta) \propto \exp\Big( -\frac{1}{2\tau_s^2} (\chi^s_{v \to f} - \vartheta)^2 + o_p(1) \Big), \tag{1}
$$
where $(\chi^s_{v \to f}, \theta_v) \overset{\mathrm{d}}{\to} (\Theta + \tau_s Z, \Theta)$. This problem focuses on establishing (1). We do so inductively. The base case more-or-less follows the standard inductive step, except that we need to pay some attention to the infinite variance $\tau_0^2 = \infty$; we do not consider the base case here. Throughout, we assume $\mu_\Theta$ has compact support. We do not carefully verify the validity of all approximations. See Celentano, Montanari, and Wu, "The estimation error of general first order methods," COLT 2020, for complete details.
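As a quick illustration (again an addition, not from the handout), the state evolution recursion is straightforward to iterate numerically. The sketch below does so for the two-point prior $\Theta \sim \mathrm{Unif}\{-1,+1\}$, for which $\mathbb{E}[\Theta \mid \Theta + \tau Z = y] = \tanh(y/\tau^2)$; the quadrature degree and the values of $\sigma^2$ and $\delta$ are arbitrary choices.

```python
# Iterating tau_{t+1}^2 = sigma^2 + mmse_Theta(tau_t^2) / delta from tau_0^2 = inf.
import numpy as np

# Gauss-Hermite (probabilists') nodes/weights for expectations over Z ~ N(0, 1)
nodes, weights = np.polynomial.hermite_e.hermegauss(61)
weights = weights / weights.sum()

def mmse_theta(tau2):
    # For Theta ~ Unif{-1,+1} and Y = Theta + tau Z: E[Theta | Y = y] = tanh(y / tau2),
    # so mmse = 1 - E[tanh(Y / tau2)^2]; by symmetry we may condition on Theta = +1.
    tau = np.sqrt(tau2)
    return 1.0 - float(weights @ np.tanh((1.0 + tau * nodes) / tau2) ** 2)

sigma2, delta = 0.25, 2.0
tau2 = np.inf                        # tau_0^2 = infinity
for t in range(10):
    # mmse_Theta(inf) = Var(Theta) = 1: with no observation the posterior
    # mean is E[Theta] = 0, which is what the first step of the recursion uses.
    m = 1.0 if np.isinf(tau2) else mmse_theta(tau2)
    tau2 = sigma2 + m / delta
    print(f"tau_{t + 1}^2 = {tau2:.6f}")
```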

(a) Define
$$
\mu^s_{v \to f} = \int \vartheta \, m^s_{v \to f}(\vartheta) \, \mu_\Theta(\mathrm{d}\vartheta), \qquad (\tau^s_{v \to f})^2 = \int \vartheta^2 \, m^s_{v \to f}(\vartheta) \, \mu_\Theta(\mathrm{d}\vartheta) - (\mu^s_{v \to f})^2,
$$
and
$$
\tilde\mu^s_{f \to v} = \sum_{v' \in \partial f \setminus v} X_{fv'} \mu^s_{v' \to f}, \qquad (\tilde\tau^s_{f \to v})^2 = \sum_{v' \in \partial f \setminus v} X_{fv'}^2 (\tau^s_{v' \to f})^2.
$$
Argue (non-rigorously) that we may approximate (up to normalization)
$$
\tilde m^s_{f \to v}(\vartheta) \approx \mathbb{E}_G\big[ p( X_{fv}\vartheta + \tilde\mu^s_{f \to v} + \tilde\tau^s_{f \to v} G - y_f ) \big],
$$
where $G \sim \mathsf{N}(0,1)$ and $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/(2\sigma^2)}$ is the normal density with variance $\sigma^2$.

Remark: The quantities $\mu^s_{v \to f}$ and $(\tau^s_{v \to f})^2$ have a simple statistical interpretation: they are the posterior mean and variance for $\theta_v$ given observations in the computation tree within distance $2s$ of node $v$ and excluding the branch in the direction of $f$.

(b) Using the inductive hypothesis, show that as $n, d \to \infty$, $n/d \to \delta$,
$$
(\tilde\tau^s_{f \to v})^2 \overset{p}{\to} \frac{1}{\delta}\,\mathrm{mmse}_\Theta(\tau_s^2) =: \tilde\tau_s^2.
$$
Further, note $y_f - \tilde\mu^s_{f \to v} = X_{fv} \theta_v + \tilde Z^s_{f \to v}$, where
$$
\tilde Z^s_{f \to v} = w_f + \sum_{v' \in \partial f \setminus v} X_{fv'} ( \theta_{v'} - \mu^s_{v' \to f} ).
$$
Argue
$$
\tilde Z^s_{f \to v} \overset{\mathrm{d}}{\to} \mathsf{N}\Big( 0, \; \sigma^2 + \frac{1}{\delta}\,\mathrm{mmse}_\Theta(\tau_s^2) \Big)
$$
and is independent of $X_{fv}$ and $\theta_v$.

Hint: The (random) functions $m^s_{v' \to f}(\vartheta_{v'})$, as $v'$ varies in $\partial f$, are iid and independent of the edge weights $X_{fv'}$. Why?

(c) For any smooth probability density $f : \mathbb{R} \to \mathbb{R}_{>0}$, $\tilde\mu \in \mathbb{R}$, and $\tilde\tau > 0$, show that
$$
\frac{\mathrm{d}}{\mathrm{d}\tilde\mu} \log \mathbb{E}_G[ f(\tilde\mu + \tilde\tau G) ] = -\frac{1}{\tilde\tau}\, \mathbb{E}[ G \mid S + \tilde\tau G = \tilde\mu ],
$$
$$
\frac{\mathrm{d}^2}{\mathrm{d}\tilde\mu^2} \log \mathbb{E}_G[ f(\tilde\mu + \tilde\tau G) ] = -\frac{1}{\tilde\tau^2} \big( 1 - \mathrm{Var}[ G \mid S + \tilde\tau G = \tilde\mu ] \big),
$$
where $S \sim f(s)\,\mathrm{d}s$ independent of $G \sim \mathsf{N}(0,1)$.

(d) We Taylor expand
$$
\log \tilde m^s_{f \to v}(\vartheta) \approx \mathrm{const} + X_{fv}\, \tilde a^s_{f \to v}\, \vartheta - \frac{1}{2} X_{fv}^2\, \tilde b^s_{f \to v}\, \vartheta^2 + O_p(n^{-3/2}).
$$
(We take this to be the definition of $\tilde a^s_{f \to v}$ and $\tilde b^s_{f \to v}$.) Taking the approximation in part (a) to hold with equality, argue
$$
\tilde a^s_{f \to v} = \frac{1}{\tau_{s+1}^2} ( y_f - \tilde\mu^s_{f \to v} ) + o_p(1), \qquad \tilde b^s_{f \to v} = \frac{1}{\tau_{s+1}^2} + o_p(1).
$$
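To see how part (c) feeds into part (d), apply (c) with $f = p$, so that $S \sim \mathsf{N}(0, \sigma^2)$. Then $\mathbb{E}[G \mid S + \tilde\tau G = \tilde\mu] = \tilde\tau\tilde\mu/(\sigma^2 + \tilde\tau^2)$ and $\mathrm{Var}[G \mid S + \tilde\tau G = \tilde\mu] = \sigma^2/(\sigma^2 + \tilde\tau^2)$, so the two derivatives equal $-\tilde\mu/(\sigma^2 + \tilde\tau^2)$ and $-1/(\sigma^2 + \tilde\tau^2)$; combined with $\sigma^2 + \tilde\tau_s^2 = \tau_{s+1}^2$ from part (b), these are exactly the coefficients appearing in part (d). The snippet below is a numerical sanity check of the part (c) identities in this Gaussian case (an addition to the handout; the parameter values are arbitrary).

```python
# Check the part (c) identities for f = p, the N(0, sigma^2) density, where the
# conditional moments are explicit: E[G | S + t G = u] = t u / (sigma^2 + t^2)
# and Var[G | S + t G = u] = sigma^2 / (sigma^2 + t^2).
import numpy as np

sigma, t, u, eps = 0.7, 1.3, 0.4, 1e-4
z, w = np.polynomial.hermite_e.hermegauss(81)    # expectations over G ~ N(0, 1)
w = w / w.sum()

def log_h(u):
    # log E_G[p(u + t G)]
    p = np.exp(-(u + t * z) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return np.log(w @ p)

d1 = (log_h(u + eps) - log_h(u - eps)) / (2 * eps)
d2 = (log_h(u + eps) - 2 * log_h(u) + log_h(u - eps)) / eps**2
print(d1, -u / (sigma**2 + t**2))    # first identity: -(1/t) E[G | S + tG = u]
print(d2, -1 / (sigma**2 + t**2))    # second identity: -(1/t^2)(1 - Var[G | .])
```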

(e) Taking the approximations in part (d) to hold with equality and using part (b) to substitute for $y_f - \tilde\mu^s_{f \to v}$, Taylor expand $\log m^{s+1}_{v \to f}(\vartheta)$ to conclude
$$
\log m^{s+1}_{v \to f}(\vartheta) = \mathrm{const} + \frac{1}{\tau_{s+1}^2}\, \chi^{s+1}_{v \to f}\, \vartheta - \frac{1}{2\tau_{s+1}^2}\, \vartheta^2 + o_p(1),
$$
where $(\chi^{s+1}_{v \to f}, \theta_v) \overset{\mathrm{d}}{\to} (\Theta + \tau_{s+1} Z, \Theta)$. Why do we expect this Taylor expansion to be valid for all $\vartheta = O(1)$? Conclude Eq. (1).
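For completeness, here is a short sketch, an addition rather than part of the handout, of the Bayes AMP iteration that this collapse of belief propagation motivates, again for $\Theta \sim \mathrm{Unif}\{-1,+1\}$ (posterior-mean denoiser $\tanh(\cdot/\tau_t^2)$). The per-iteration $\tau_t^2$ is tracked empirically via the same state evolution recursion; dimensions, noise level, and iteration count are illustrative.

```python
# One standard form of Bayes AMP for y = X theta + w with X_{fv} ~ N(0, 1/n).
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 4000, 2000, 0.5
delta = n / d

X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))
theta = rng.choice([-1.0, 1.0], size=d)
y = X @ theta + sigma * rng.normal(size=n)

theta_hat = np.zeros(d)            # E[Theta] = 0: the tau_0^2 = inf initialization
z = y.copy()                       # initial residual
tau2 = sigma**2 + 1.0 / delta      # tau_1^2 = sigma^2 + Var(Theta) / delta
for t in range(10):
    s = theta_hat + X.T @ z        # ~ theta + tau_t Z coordinatewise (cf. chi^t_v)
    eta = np.tanh(s / tau2)        # Bayes posterior-mean denoiser
    onsager = (1.0 / delta) * z * np.mean((1.0 - eta**2) / tau2)
    z = y - X @ eta + onsager      # residual with Onsager correction
    theta_hat = eta
    # mean(1 - eta^2) is the empirical posterior variance, an estimate of mmse_Theta
    tau2 = sigma**2 + np.mean(1.0 - eta**2) / delta
    print(f"t={t}: mse={np.mean((theta_hat - theta) ** 2):.4f}, next tau^2={tau2:.4f}")
```

Note that the only trace of the per-edge BP messages is the scalar Onsager correction; everything else is a matrix-vector product, which is what makes AMP an $O(nd)$-per-iteration algorithm while still tracking the state evolution derived above.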
