

SLIDE 1

A start of Variational Methods for ERGM

Ranran Wang, UW MURI-UCI April 24, 2009

SLIDE 2

A start of Variational Methods for ERGM [1]

Outline

  • Introduction to ERGM
  • Current methods of parameter estimation:

    – MCMC-MLE: Markov chain Monte Carlo maximum likelihood estimation
    – MPLE: Maximum pseudo-likelihood estimation

  • Variational methods:

    – Exponential families and variational inference
    – Approximation of intractable families
    – Application to ERGM
    – Simulation study

SLIDE 3

Introduction to ERGM

Network Notations

  • m actors; n = m(m−1)/2 dyads
  • Sociomatrix (adjacency matrix) Y : {yi,j}, i, j = 1, · · · , m
  • Edge set {(i, j) : yi,j = 1}
  • Undirected network: yi,j = yj,i
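
As a concrete illustration of the notation, a short Python sketch (the graph and all names are invented for the example):

```python
def sociomatrix(m, edges):
    """Return the m x m symmetric 0/1 sociomatrix for an undirected edge set."""
    y = [[0] * m for _ in range(m)]
    for i, j in edges:
        y[i][j] = 1
        y[j][i] = 1  # undirected: y_ij = y_ji
    return y

m = 4
edges = [(0, 1), (1, 2), (2, 3)]
y = sociomatrix(m, edges)
n_dyads = m * (m - 1) // 2  # n = m(m-1)/2 dyads
```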

SLIDE 4

ERGM

Exponential Family Random Graph Model (Frank and Strauss, 1986; Wasserman and Pattison, 1996; Handcock, Hunter, Butts, Goodreau and Morris, 2008):

    log[P (Y = y; η)] = ηTφ(y) − κ(η, Y), y ∈ Y,

where

  • Y is the random adjacency matrix and Y its sample space
  • η ∈ Ω ⊂ Rq is the vector of model parameters
  • φ(y) is a q-vector of statistics
  • κ(η, Y) = log Σz∈Y exp{ηTφ(z)} is the normalizing factor, which is difficult to calculate

  • R package: statnet
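
The normalizing factor κ(η, Y) can be brute-forced only for tiny graphs, which makes a useful sanity check. A sketch for m = 4 actors with the edge count as the single statistic (an illustrative choice, not from the talk):

```python
from itertools import product
from math import exp, log

# brute-force kappa over all 2^6 = 64 graphs on m = 4 actors
m = 4
n = m * (m - 1) // 2  # 6 dyads

def kappa(eta):
    """log sum over all graphs z of exp(eta * edge_count(z))."""
    return log(sum(exp(eta * sum(z)) for z in product([0, 1], repeat=n)))
```

For the edge-only model the enumeration reproduces the closed form n log(1 + e^η), and it shows why κ is intractable in general: the sum has 2^n terms.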

SLIDE 5

Current estimation approaches for ERGM

MCMC-MLE (Geyer and Thompson, 1992; Snijders, 2002; Hunter, Handcock, Butts, Goodreau and Morris, 2008):

  • 1. Set an initial value η0 for the parameter η.
  • 2. Generate MCMC samples Y1, . . . , Ym of size m from Pη0 by the Metropolis algorithm.
  • 3. Iterate to obtain a maximizer η̃ of the approximate log-likelihood ratio

        (η − η0)Tφ(yobs) − log[ (1/m) Σi=1..m exp{ (η − η0)Tφ(Yi) } ].

  • 4. If the estimated variance of the approximate log-likelihood ratio is too large in comparison to the estimated log-likelihood at η̃, return to step 2 with η0 = η̃.
  • 5. Return η̃ as the MCMC-MLE.
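
A minimal sketch of steps 2 and 3, using the edge-only model so that exact samples from Pη0 are available; the sampler, sample size, and grid search are illustrative assumptions, not from the talk:

```python
import random
from math import exp, log

random.seed(0)
n = 45                       # dyads of a 10-actor graph
eta0 = 0.5                   # current parameter value
p0 = 1.0 / (1.0 + exp(-eta0))
# exact samples of phi(Y) = edge count under P_{eta0} (independent dyads)
samples = [sum(random.random() < p0 for _ in range(n)) for _ in range(2000)]

def approx_loglik_ratio(eta, phi_obs):
    """(eta - eta0)*phi(y_obs) - log[(1/m) sum_i exp{(eta - eta0)*phi(Y_i)}]"""
    avg = sum(exp((eta - eta0) * s) for s in samples) / len(samples)
    return (eta - eta0) * phi_obs - log(avg)

phi_obs = 30                 # observed edge count (illustrative)
grid = [i / 100 for i in range(0, 151)]
eta_tilde = max(grid, key=lambda e: approx_loglik_ratio(e, phi_obs))
# for the edge-only model the exact MLE is logit(30/45) = log 2, so
# eta_tilde should land near 0.69
```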

SLIDE 6

MPLE (Besag, 1975; Strauss and Ikeda, 1990):

Conditional formulation:

    logit[P (Yij = 1 | Y Cij = yCij)] = ηTδ(yCij),

where δ(yCij) = φ(y+ij) − φ(y−ij) is the change in φ(y) when yij changes from 0 to 1 while the rest of the network remains yCij.
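
Since the conditional formulation is a logistic regression of yij on the change statistics, MPLE can be sketched with a scalar Newton iteration. Here φ is the edge count, so δ ≡ 1 and the MPLE reduces to the logit of the observed density; the graph and names are invented for the example:

```python
from math import exp, log

y = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
m = len(y)
dyads = [(i, j) for i in range(m) for j in range(i + 1, m)]

def change_stat(i, j):
    """Change in the edge count when y_ij flips 0 -> 1 (always 1)."""
    return 1.0

def mple(n_iter=50):
    """Maximize the pseudo-likelihood sum_ij [y_ij*eta*d_ij - log(1+e^{eta*d_ij})]
    by Newton's method (single scalar parameter)."""
    eta = 0.0
    for _ in range(n_iter):
        g = h = 0.0
        for i, j in dyads:
            d = change_stat(i, j)
            p = 1.0 / (1.0 + exp(-eta * d))
            g += (y[i][j] - p) * d      # gradient
            h += p * (1.0 - p) * d * d  # negative Hessian
        eta += g / h
    return eta

eta_hat = mple()  # 4 edges of 6 dyads: logit(2/3) = log 2
```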

SLIDE 7

Comparison

Simulation study: van Duijn, Gile and Handcock (2008)

MCMC-MLE:

  • Slow-mixing
  • Highly dependent on initial values
  • Able to model various network characteristics together

MPLE:

  • Deterministic model; computation is fast
  • Unstable
  • Dyadic-independent model; cannot capture higher-order network characteristics

SLIDE 8

Variational method

Exponential families and variational representations

Basics of exponential families:

    log[p(x; θ)] = ⟨θ, φ(x)⟩ − κ(θ).

  • Sufficient statistics: φ(x).
  • Log-partition function: κ(θ) = log Σx∈X exp⟨θ, φ(x)⟩.
  • Mean value parametrization: µ := E[φ(x)] ∈ Rq.
  • Mean value space (convex hull):

    M = { µ ∈ Rq | ∃ p(·) s.t. Σx∈X φ(x)p(x) = µ }.

SLIDE 9

The log-partition function is smooth and convex in θ. Suppose θ = (θα, θβ, · · · ) and φ(x) = (φα(x), φβ(x), · · · ). Then

    ∂κ/∂θα (θ) = E[φα(x)] := Σx∈X φα(x)p(x; θ),                  (1)

    ∂2κ/∂θα∂θβ (θ) = E[φα(x)φβ(x)] − E[φα(x)]E[φβ(x)].           (2)

So µ(θ) can be re-expressed as µ(θ) = ∂κ/∂θ (θ), and it has gradient ∂2κ/∂θ∂θT (θ). (Barndorff-Nielsen, 1978; Handcock, 2003; Wainwright and Jordan, 2003)
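
Identity (1) can be checked numerically on a model small enough to enumerate, e.g. a two-node Ising model (parameter values and names are illustrative):

```python
from itertools import product
from math import exp, log

theta = [0.3, -0.2, 0.5]  # (theta_1, theta_2, theta_12), arbitrary values

def phi(x):
    # sufficient statistics of the two-node Ising model
    return [x[0], x[1], x[0] * x[1]]

def kappa(th):
    # log-partition: log sum over x in {0,1}^2 of exp(<th, phi(x)>)
    return log(sum(exp(sum(t * f for t, f in zip(th, phi(x))))
                   for x in product([0, 1], repeat=2)))

def mean_params(th):
    # E[phi(x)] by direct enumeration
    z = exp(kappa(th))
    mu = [0.0, 0.0, 0.0]
    for x in product([0, 1], repeat=2):
        p = exp(sum(t * f for t, f in zip(th, phi(x)))) / z
        for a in range(3):
            mu[a] += phi(x)[a] * p
    return mu

# central finite-difference gradient of kappa, component by component
eps = 1e-6
grad = []
for a in range(3):
    tp, tm = theta[:], theta[:]
    tp[a] += eps
    tm[a] -= eps
    grad.append((kappa(tp) - kappa(tm)) / (2 * eps))

mu = mean_params(theta)  # should match grad component-wise
```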

SLIDE 10

Exp: Ising model on graph G(V, E)

    log p(x; θ) = Σs∈V θsxs + Σ(s,t)∈E θstxsxt − κ(θ),           (3)

where:

  • xs, associated with s ∈ V , is a Bernoulli random variable;
  • components xs and xt are allowed to interact directly only if s and t are joined by an edge in the graph.

The relevant mean parameters in this representation are:

    µs = Eθ[xs] = p(xs = 1; θ),
    µst = Eθ[xsxt] = p(xs = 1, xt = 1; θ).

For each edge (s, t), the triplet {µs, µt, µst} uniquely determines the joint marginal p(xs, xt; µ):

    [ p(0,0)  p(0,1) ]   [ 1 + µst − µs − µt    µt − µst ]
    [ p(1,0)  p(1,1) ] = [ µs − µst             µst      ]

SLIDE 11

To ensure that this is a valid joint marginal, we impose non-negativity constraints on all four entries:

    1 + µst − µs − µt ≥ 0,   µs − µst ≥ 0,   µt − µst ≥ 0,   µst ≥ 0.

These inequalities define M.

SLIDE 12

Variational inference and mean value estimation

For any µ ∈ ri M (ri: relative interior), we have the following variational representation:

    κ(θ) = supµ∈M { ⟨θ, µ⟩ − κ∗(µ) }.                            (4)

Indeed,

    κ(θ) = log Σx∈X [ exp{⟨θ, φ(x)⟩} / p(x; θ) ] p(x; θ)
         ≥ Σx∈X log( exp{⟨θ, φ(x)⟩} / p(x; θ) ) p(x; θ)
         = Σx∈X ⟨θ, φ(x)⟩ p(x; θ) − Σx∈X log(p(x; θ)) p(x; θ)
         = ⟨θ, E[φ(x)]⟩ − E[log p(x; θ)]
         = ⟨θ, µ⟩ − κ∗(µ).

The inequality follows from Jensen’s inequality, and the last equality follows from E[φ(x)] = µ and κ∗(µ) = E[log p(x; θ(µ))], the negative entropy of the distribution p(x; θ).

SLIDE 13

Why variational method?

  • The variational representation turns the problem of calculating intractable summations/integrals into an optimization problem (maximizing a lower bound of κ over M).
  • The problem of computing the mean parameters is solved simultaneously.

Two main difficulties:

  • The constraint set M of realizable mean parameters is difficult to characterize explicitly.
  • κ∗(µ) lacks an explicit form and needs proper approximation.

SLIDE 14

Mean value estimation

  • µ is obtained by solving the optimization problem in (4).
  • However, the dual function κ∗ lacks an explicit form in many cases.
  • We restrict the choice of µ to a tractable subset Mt(H) of M(G), where H is a tractable subgraph of G. The lower bound in (4) then becomes computable.
  • The solution of the optimization problem

        supµ∈Mt(H) { ⟨µ, θ⟩ − κ∗H(µ) }

    specifies the optimal approximation µ̃t of µ.
  • The optimal µ̃t in fact minimizes the Kullback–Leibler divergence between the tractable family and the target distribution, whether measured over the mean value spaces or over the corresponding natural parameter spaces.

SLIDE 15

Ising model on Graph: Approximation of κ∗

Assume the tractable graph H0 is fully disconnected. Then the mean value parameter set is

    M0(H0) = { (µs, µst) | 0 ≤ µs ≤ 1, µst = µsµt }.

Here µs = p(xs = 1) and µst = p(xs = 1, xt = 1) = µsµt, so the distribution on H0 is fully factorizable. Deriving from the Bernoulli distribution,

    κ∗H0(µ) = Σs∈V [ µs log µs + (1 − µs) log(1 − µs) ].

By (4), restricting the supremum to M0 gives the naive mean field lower bound

    κ(θ) ≥ max{µs}∈[0,1]n { Σs∈V θsµs + Σ(s,t)∈E θstµsµt − Σs∈V [ µs log µs + (1 − µs) log(1 − µs) ] }.   (5)

SLIDE 16

Taking the gradient of (5) with respect to µs and setting it to zero yields the update

    logit(µs) ← θs + Σt∈N(s) θstµt.                              (6)

Apply (6) iteratively (coordinate ascent) to each node until convergence is reached.

SLIDE 17

Applications to ERGM

Dependence Graph

  • GY is a graph with m actors and n = m(m−1)/2 dyads.
  • Construct a dependence graph DY = G(V (D), E(D)) to describe the dependence structure of GY:

    – Each dyad (i, j), i < j, of GY is an actor of DY.
    – Each actor (ij) ∈ V (D) has a binary variable yij.
    – An edge of DY joins (ij) and (kl) if, as dyads of GY, they share a common node.

  • Frank and Strauss, 1986.

Figure 1: Dependence graph D (original graph G on actors 1, 2, 3, 4; D on dyads 12, 13, 14, 23, 24, 34).

SLIDE 18

Exp: Erdős–Rényi model

For an undirected random graph Y = {Yij}, all dyads are mutually independent, so the dependence graph D is fully disconnected. Each yij, (ij) ∈ V (D), is a Bernoulli random variable. The model can be written as

    log[Pθ(Y = y)] = Σi<j θijyij − κ(θ, Y), y ∈ Y.

Calculating the entropy of the Bernoulli distribution, we have

    κ∗(µ) = Σi<j [ µij log(µij) + (1 − µij) log(1 − µij) ],      (7)

where µij = P (Yij = 1). Then

    κ(θ) = supµ∈M { ⟨θ, µ⟩ − κ∗(µ) } = Σi<j log(1 + exp(θij)),

the supremum being attained at θij = log( µij / (1 − µij) ).

SLIDE 19

2-star ERGM model

Analogous to the Ising model, on the dependence graph D = G(V (D), E(D)),

    log P (Y; θ) = Σs∈V (D) θsys + Σ(s,t)∈E(D) θstysyt − κ(θ),

where each actor s of D is a dyad (ij) of GY. If θs = η1 for all s ∈ V (D) and θst = η2 for all (s, t) ∈ E(D), then

    log P (Y; η) = η1 Σi<j yij + η2 Σi Σj<k; j,k≠i yijyik − κ(η),

which corresponds to the canonical 2-star model.

SLIDE 20

Given a graph GY with 6 actors, its dependence graph DY has 15 actors. The Ising model on DY is

    log p(y; θ) = Σs∈V (D) θsys + Σ(s,t)∈E(D) θstysyt − κ(θ).

SLIDE 21

Compare µvar, obtained from the naive mean field algorithm, to µmcmc, obtained from MCMC samples, for fixed θ’s (θst = 0.2 for all s, t):

    (ij): s    θs      µmcmc_s   µvar_s
    12         0.5     0.811     0.848
    13        −0.5     0.666     0.671
    14         0.5     0.852     0.848
    15        −0.5     0.665     0.684
    16         0.5     0.834     0.846
    23        −0.5     0.671     0.671
    24         0.5     0.831     0.848
    25        −0.5     0.672     0.683
    26         0.5     0.854     0.846
    34        −0.5     0.672     0.671
    35         0.5     0.855     0.837
    36        −0.5     0.683     0.668
    45         0.5     0.849     0.846
    46        −0.5     0.672     0.683
    56         0.0     0.737     0.772

SLIDE 22

For the 2-star model, let θs = η1 ∈ [−2, 2] and θst = η2 ∈ [−2, 2], with µ = P (xs = 1) for all s. Compare µvar(η1, η2) with µmcmc.

Figure 2: µMCMC vs. µvar (scatter of the two estimates, both axes from 0.2 to 1.0).

SLIDE 23

Parameter estimation by variational inference

  • 1. Start with θ(0).
  • 2. Estimate µ̃(θ) from the naive mean field algorithm.
  • 3. Calculate κ(θ) = ⟨θ, µ̃⟩ − κ∗(µ̃) and the log-likelihood l(θ, y). Also calculate ∇κ(θ) = Eθ[φ(Y)] and ∇l(θ, y) = φ(y) − Eθ[φ(Y)].
  • 4. Update θ by gradient ascent with a small step size γ:

        θ̃(n+1) = θ̃(n) + γ ∇l(θ̃(n), y).

  • 5. Iterate until θ̃ converges.
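
The loop above can be sketched end-to-end for the 2-star model on m = 4 actors, with mean-field moments standing in for Eθ[φ(Y)]. The step size, iteration counts, and target statistics (chosen so that they are attainable by the mean-field family) are illustrative assumptions, not values from the talk:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

m = 4
dyads = [(i, j) for i in range(m) for j in range(i + 1, m)]
# dependence-graph edges: pairs of dyads sharing an actor
D_edges = [(a, b) for a in range(len(dyads)) for b in range(a + 1, len(dyads))
           if set(dyads[a]) & set(dyads[b])]
nbrs = {s: [] for s in range(len(dyads))}
for a, b in D_edges:
    nbrs[a].append(b)
    nbrs[b].append(a)

def mean_field(eta1, eta2, n_sweep=50):
    """Step 2: naive mean field estimate of mu_tilde(eta) via update (6)."""
    mu = [0.5] * len(dyads)
    for _ in range(n_sweep):
        for s in range(len(dyads)):
            mu[s] = sigmoid(eta1 + eta2 * sum(mu[t] for t in nbrs[s]))
    return mu

def fit(phi_obs, gamma=0.05, n_iter=500):
    """Steps 3-5: gradient ascent eta <- eta + gamma*(phi(y) - E_eta[phi(Y)]),
    with the expected statistics approximated by mean-field moments."""
    eta1 = eta2 = 0.0
    for _ in range(n_iter):
        mu = mean_field(eta1, eta2)
        e_edges = sum(mu)                                 # approx E[#edges]
        e_stars = sum(mu[a] * mu[b] for a, b in D_edges)  # approx E[#2-stars]
        eta1 += gamma * (phi_obs[0] - e_edges)
        eta2 += gamma * (phi_obs[1] - e_stars)
    return eta1, eta2

eta1_hat, eta2_hat = fit((4.0, 16.0 / 3.0))
```

At convergence the fitted mean-field moments match the target statistics, which is the fixed point of the gradient-ascent update.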

SLIDE 24

Simulation study

Figure 3: A sample graph with 6 edges and 12 2-stars.

    2-star ERGM    η1       η2
    MLE           −1.69     0.39
    MCMC-MLE      −1.74     0.40
    MPLE          −7.54     2.18
    Var-MLE       −1.99     0.465

SLIDE 25

Figure 4: Convergence of Var-MLE (trace plots of η1 and η2 over 2000 iterations).

SLIDE 26

Discussion and Future work

Future work:

  • Better approximation of κ∗:

    – Structured mean field algorithm
    – Bethe entropy approximation
    – Clustered variational methods

  • Extension to general ERGM: clustering structure of the dependence graph; the constraint space
  • Continuous graphs: Gaussian random fields
  • Curved exponential families
  • Hybrid of MCMC and variational methods

SLIDE 27

Thanks for your attention!