

1. Calibrating misspecified ERGMs for Bayesian inference
Nial Friel, University College Dublin, nial.friel@ucd.ie
December 2015
Joint with Lampros Bouranis and Florian Maire.

2. Motivation
- There are many statistical models with intractable (or difficult to evaluate) likelihood functions.
- Composite likelihoods provide a generic approach to overcome this computational difficulty.
- A natural idea in a Bayesian context is to consider the approximate posterior distribution π_cl(θ | y) ∝ f_cl(y | θ) π(θ).
- Surprisingly, there has been very little study of such a misspecified posterior distribution.

3. Motivation
[Figure: marginal posterior densities of θ1 (edges) and θ2 (2-stars), comparing the target posterior, the pseudo-posterior, and the calibrated pseudo-posterior.]

4. Introduction
- We focus on the exponential random graph model (ERGM), widely used in statistical network analysis.
- The pseudolikelihood function provides a low-dimensional approximation of the ERG likelihood.
- We provide a framework which allows one to calibrate the pseudo-posterior distribution.
- In experiments our approach gave improved statistical efficiency with respect to more computationally demanding Monte Carlo approaches.

5. Exponential random graph model
f(y | θ) = exp{θᵀ s(y)} / z(θ) = q_θ(y) / z(θ).
- y is the observed adjacency matrix on n nodes, where y_ij = 1 if there is an edge connecting nodes i and j, and y_ij = 0 otherwise.
- s(y) ∈ R^k is a known vector of sufficient statistics.
- θ ∈ R^k is a vector of parameters.
- z(θ) = Σ over all possible graphs of exp{θᵀ s(y)} is a normalizing constant.
- There are 2^(n choose 2) possible undirected graphs on n nodes, so calculation of z(θ) is infeasible for all but trivially small graphs.
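To make the sufficient statistics concrete, here is a minimal Python sketch (ours, not from the slides) that computes three standard undirected-network statistics, edges, 2-stars and triangles, from an adjacency matrix; the function name ergm_stats is our own choice.

```python
import numpy as np

def ergm_stats(y: np.ndarray) -> np.ndarray:
    """Sufficient statistics s(y) = (edges, 2-stars, triangles) for an
    undirected graph given as a symmetric 0/1 adjacency matrix y."""
    deg = y.sum(axis=1)                        # node degrees
    edges = y.sum() / 2.0                      # each edge appears twice in y
    two_stars = (deg * (deg - 1) / 2.0).sum()  # pairs of neighbours per node
    triangles = np.trace(y @ y @ y) / 6.0      # each triangle counted 6 times
    return np.array([edges, two_stars, triangles])

# Toy example: a 4-node graph with one triangle plus a pendant edge.
y = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    y[i, j] = y[j, i] = 1
print(ergm_stats(y))   # [4. 5. 1.]
```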


7. Model specification: network statistics
[Figure: common configurations used as network statistics. Directed: edge, mutual edge, 2-in-star, 2-out-star, 2-mixed-star, transitive triad, cyclic triad. Undirected: edge, 2-star, 3-star, triangle.]

8. Pseudolikelihood approximation (Besag, 1974; Strauss and Ikeda, 1990)
f_pl(y | θ) = Π_{i ≠ j} p(y_ij | y_{-ij}, θ) = Π_{i ≠ j} p(y_ij = 1 | y_{-ij}, θ)^{y_ij} {1 − p(y_ij = 1 | y_{-ij}, θ)}^{1 − y_ij},
where y_{-ij} denotes y \ y_ij.
- Each factor in the product is a Bernoulli probability.
- Estimation is equivalent to logistic regression.
- Assumes the collection {y_ij | y_{-ij}} are mutually independent.
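A minimal sketch (ours) of the log-pseudolikelihood. Each dyad's full conditional is Bernoulli with log-odds θᵀ times the change statistics, i.e. the difference in s(y) when y_ij is toggled from 0 to 1, which is exactly the form of a logistic regression. The helper ergm_stats repeats the statistics function from the sketch above.

```python
import numpy as np

def ergm_stats(y):
    """s(y) = (edges, 2-stars, triangles) of a symmetric 0/1 adjacency matrix."""
    deg = y.sum(axis=1)
    return np.array([y.sum() / 2.0,
                     (deg * (deg - 1) / 2.0).sum(),
                     np.trace(y @ y @ y) / 6.0])

def log_pseudolikelihood(theta, y):
    """log f_pl(y | theta): sum over dyads of Bernoulli full-conditional
    log densities, each with log-odds theta . (change statistics)."""
    n = y.shape[0]
    logpl = 0.0
    for i in range(n):
        for j in range(i + 1, n):                    # each unordered dyad once
            y1, y0 = y.copy(), y.copy()
            y1[i, j] = y1[j, i] = 1.0                # dyad forced present
            y0[i, j] = y0[j, i] = 0.0                # dyad forced absent
            delta = ergm_stats(y1) - ergm_stats(y0)  # change statistics
            eta = theta @ delta                      # log-odds of y_ij = 1
            logpl += y[i, j] * eta - np.logaddexp(0.0, eta)
    return logpl

# Maximising this in theta is a logistic regression of the indicators y_ij
# on their change statistics, which is how theta_hat_PL is obtained cheaply.
```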


10. Bayesian inference
π(θ | y) = [q(y | θ) / z(θ)] · p(θ) / π(y).
- Challenging to sample from the posterior distribution.
- π(θ | y) is often called a doubly intractable distribution.
1. Approximate exchange algorithm (AEA) (Caimo and Friel, 2011).
2. Bottleneck: requires a sample from f(y | θ).
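To see why, note that a standard Metropolis-Hastings move θ → θ' with proposal h(θ' | θ) would have acceptance probability

α = min{ 1, [q(y | θ') p(θ') h(θ | θ')] / [q(y | θ) p(θ) h(θ' | θ)] × z(θ) / z(θ') },

so each acceptance decision requires the ratio z(θ) / z(θ') of intractable normalizing constants; this second layer of intractability is what the exchange algorithm below is designed to cancel.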

11. Exchange algorithm (Murray et al., 2006)
- An auxiliary variable scheme to sample from the augmented distribution
π(θ', y', θ | y) ∝ f(y | θ) · π(θ) · h(θ' | θ) · f(y' | θ').   (1)
- f(y' | θ'): the same family of distributions as that of the observed data y.
- h(θ' | θ): an arbitrary proposal distribution for the augmented variable θ'.
- Crucially, this requires a draw from f(y' | θ') at each iteration. Perfect sampling is not feasible for ERGMs.
- Pragmatic solution: run M transitions of a Markov chain targeting f(y | θ').

12. Algorithm 1: Approximate exchange algorithm (AEA)
Input: initial setting θ, number of iterations T.
Output: a realization of length T from π(θ | y).
for t = 1, ..., T do
  Propose θ' ~ h(· | θ^(t));
  Propose y' ~ R_M(· | θ') ["tie-no-tie" (TNT) sampler];
  Exchange move from (θ^(t), y), (θ', y') to (θ', y), (θ^(t), y') with probability
    α = min{ 1, [q(y' | θ^(t)) p(θ') q(y | θ') h(θ^(t) | θ')] / [q(y | θ^(t)) p(θ^(t)) q(y' | θ') h(θ' | θ^(t))] },
  in which the intractable normalizing constants z(θ^(t)) and z(θ') cancel;
  if accepted, set θ^(t+1) ← θ'.
end
The Bergm package in R implements the AEA (Caimo and Friel, 2014). (See Anto's tutorial for more details.)
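The point of the exchange move is that the acceptance ratio involves only the unnormalized q(· | ·): the intractable z(θ^(t)) and z(θ') cancel. Below is a rough sketch (ours, not the Bergm implementation) of one AEA iteration; a single-dyad Gibbs sweep stands in for the TNT sampler, the proposal h is taken to be a symmetric random walk, and all function names are our own.

```python
import numpy as np

rng = np.random.default_rng(1)

def ergm_stats(y):
    deg = y.sum(axis=1)
    return np.array([y.sum() / 2.0, (deg * (deg - 1) / 2.0).sum(),
                     np.trace(y @ y @ y) / 6.0])

def log_q(theta, y):
    """Unnormalised log-likelihood: log q_theta(y) = theta . s(y)."""
    return theta @ ergm_stats(y)

def gibbs_auxiliary(theta, y_init, sweeps=50):
    """Approximate draw y' ~ f(. | theta) via single-dyad Gibbs updates
    (a simple stand-in for the tie-no-tie sampler used in practice)."""
    y = y_init.copy()
    n = y.shape[0]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                y1, y0 = y.copy(), y.copy()
                y1[i, j] = y1[j, i] = 1.0
                y0[i, j] = y0[j, i] = 0.0
                p1 = 1.0 / (1.0 + np.exp(log_q(theta, y0) - log_q(theta, y1)))
                y[i, j] = y[j, i] = float(rng.random() < p1)
    return y

def exchange_step(theta, y, log_prior, step=0.1):
    """One iteration of the approximate exchange algorithm."""
    theta_p = theta + step * rng.standard_normal(theta.size)  # symmetric h(.|theta)
    y_p = gibbs_auxiliary(theta_p, y)                         # auxiliary network
    # Log acceptance ratio: every z(.) term cancels, leaving only q and prior.
    log_alpha = (log_q(theta, y_p) + log_q(theta_p, y) + log_prior(theta_p)
                 - log_q(theta_p, y_p) - log_q(theta, y) - log_prior(theta))
    return theta_p if np.log(rng.random()) < log_alpha else theta
```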

13.
- Intuitively one expects that the number of auxiliary iterations, M, should be proportional to the number of dyads of the graph, of order n^2.
- This is supported by: (i) the invariant distribution of the approximate exchange algorithm converges to the true target as the number of auxiliary iterations M increases (Everitt, 2012); (ii) TNT sampling from an ERG model can converge exponentially slowly (Bhamidi et al., 2011).
- Conservative approach: choose a large M...
- This makes the procedure computationally intensive for larger graphs, due to the exponentially long mixing time of the auxiliary draw from the likelihood.

14. Pseudo-posterior distribution
- Replace the true likelihood f(y | θ) with the misspecified pseudolikelihood:
π_pl(θ | y) ∝ f_pl(y | θ) · π(θ).
- Straightforward to sample from π_pl(θ | y) using a Metropolis-Hastings sampler.
Calibration approach:
1. Mode adjustment of π_pl(θ | y).
2. Curvature adjustment of π_pl(θ | y).
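Since f_pl(y | θ) can be evaluated exactly, a plain random-walk Metropolis-Hastings sampler targets π_pl(θ | y) directly. A minimal sketch (ours); the commented usage assumes the log_pseudolikelihood function from the earlier sketch and a Gaussian prior, both our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def rw_metropolis(log_target, theta0, n_iter=5000, step=0.1):
    """Random-walk Metropolis sampler for an unnormalised log density."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_target(theta)
    chain = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_target(prop)
        if np.log(rng.random()) < lp_prop - lp:     # symmetric proposal
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain

# For the pseudo-posterior one would use, e.g.:
#   log_target = lambda th: log_pseudolikelihood(th, y) + log_prior(th)
# with log_pseudolikelihood as sketched earlier and a diffuse Gaussian log_prior.
```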

15. Calibration approach
Notation:
- π: the target distribution π(θ | y).
- ν(θ) = π_pl(θ | y): the misspecified target.
- ν1(θ) = π_pl^(1)(θ | y): the mean-adjusted target.
- ν2(θ) = π_pl^(2)(θ | y): the fully calibrated target after curvature adjustment.
- H_g(θ) denotes the Hessian of log g at θ.
- argmax_Θ π = θ* and H_π(θ*) = H*; argmax_Θ ν = θ̂_PL and H_ν(θ̂_PL) = Ĥ_PL.
Objective: given a sample from ν, find a mapping φ: Θ → Θ such that the corrected samples φ(θ) = (φ(θ_1), φ(θ_2), ...) are distributed according to a density ν2 satisfying
argmax_θ ν2(θ) = θ* and H_ν2(θ*) = H*.



18. Our approach requires estimation of the MAP and Hessian of π(θ | y). Two key facts:
1. ∇_θ log π(θ | y) = s(y) − E_{y|θ}[s(y)] + ∇_θ log π(θ).
2. ∇²_θ log π(θ | y) = −V_{y|θ}[s(y)] + ∇²_θ log π(θ),
where V_{y|θ}[s(y)] is the covariance matrix of s(y) under f(· | θ).
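Both identities involve moments of s(y) under the model, which can be estimated from simulated networks. A rough sketch (ours), assuming simulated_stats is an (n_sims x k) array of statistics of draws from f(· | θ), obtained for example with the Gibbs sampler sketched earlier, and assuming a zero-mean Gaussian prior for illustration.

```python
import numpy as np

def grad_hess_log_posterior(theta, s_obs, simulated_stats, prior_prec):
    """Monte Carlo estimates of the gradient and Hessian of log pi(theta | y)
    for an ERGM with a zero-mean Gaussian prior of precision matrix prior_prec.

    simulated_stats: (n_sims, k) array of s(y') for networks y' ~ f(. | theta).
    """
    mean_s = simulated_stats.mean(axis=0)           # estimate of E_{y|theta}[s(y)]
    var_s = np.cov(simulated_stats, rowvar=False)   # estimate of V_{y|theta}[s(y)]
    grad = s_obs - mean_s - prior_prec @ theta      # fact 1 plus Gaussian prior term
    hess = -var_s - prior_prec                      # fact 2 plus Gaussian prior term
    return grad, hess
```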

19. Mean adjustment (correct the mode of ν)
- We need ν1(θ) = ν(θ − τ0), with τ0 = θ* − θ̂_PL, so that ν1 admits θ* as its mode.
- Mapping: φ1: θ → θ + θ* − θ̂_PL.
- Denote ξ = (ξ1 = φ1(θ_1), ξ2 = φ1(θ_2), ...).
- θ* is found by stochastic optimization, solving s(y) = E_{y|θ}[s(y)], i.e. ∇_θ log f(y | θ) = 0.
- θ̂_PL is found by BFGS, using standard logistic regression theory.
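The mode can be searched for by stochastic approximation: follow noisy gradient estimates in which E_{y|θ}[s(y)] is replaced by an average over networks simulated at the current θ. A rough Robbins-Monro sketch (ours); simulate_stats and grad_log_prior are placeholders for user-supplied routines.

```python
import numpy as np

def stochastic_map(s_obs, simulate_stats, grad_log_prior, theta0,
                   n_iter=200, n_sims=50, a0=0.01):
    """Robbins-Monro search for theta* = argmax_theta log pi(theta | y)."""
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, n_iter + 1):
        sims = simulate_stats(theta, n_sims)   # (n_sims, k) statistics of draws from f(.|theta)
        grad = s_obs - sims.mean(axis=0) + grad_log_prior(theta)
        theta = theta + (a0 / k) * grad        # decreasing step sizes
    return theta

# theta_hat_PL itself is much cheaper: it is the maximiser of the
# log-pseudolikelihood, e.g. found by BFGS on the logistic-regression form.
```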

20. Curvature adjustment (match H_ν1(θ*) with H*)
- Obtain ν2 via ν2(θ) = ν1(W(θ − θ*) + θ*), for some W ∈ M_d(R), so that
H_ν2(θ*) = Wᵀ H_ν1(θ*) W = Wᵀ Ĥ_PL W.
- It is sufficient to choose W = M^{-1}N, where −H* = NᵀN and −Ĥ_PL = MᵀM.
- Samples ζ_i = φ(θ_i) = φ2 ∘ φ1(θ_i) are obtained through φ: θ → V0(θ − θ̂_PL) + θ*, with V0 = W^{-1}.
See Ribatet et al. (2012) for a similar approach.
Note:
- φ1 and φ2 are non-commutative operators.
- Samples ζ_i = φ2 ∘ φ1(θ_i) ≠ ζ'_i = φ1 ∘ φ2(θ_i).
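A minimal numpy sketch (ours) of the full correction: take factors N and M with −H* = NᵀN and −Ĥ_PL = MᵀM from Cholesky decompositions, form W = M^{-1}N, and push every pseudo-posterior draw through φ = φ2 ∘ φ1.

```python
import numpy as np

def calibrate_samples(theta_samples, theta_star, theta_pl, H_star, H_pl):
    """Map pseudo-posterior draws so that the corrected sample has mode
    theta_star and log-density Hessian H_star at that mode.

    theta_samples : (n, d) draws from the pseudo-posterior nu
    H_star, H_pl  : Hessians of the log target and log pseudo-posterior at
                    their modes (both negative definite)
    """
    N = np.linalg.cholesky(-H_star).T        # -H*    = N^T N
    M = np.linalg.cholesky(-H_pl).T          # -H_PL  = M^T M
    W = np.linalg.solve(M, N)                # W = M^{-1} N
    V0 = np.linalg.inv(W)                    # phi: theta -> V0 (theta - theta_PL) + theta*
    return (theta_samples - theta_pl) @ V0.T + theta_star
```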
