  1. A start of Variational Methods for ERGM Ranran Wang, UW MURI-UCI April 24, 2009

  2. A start of Variational Methods for ERGM [1] Outline
     • Introduction to ERGM
     • Current methods of parameter estimation:
       – MCMC-MLE: Markov chain Monte Carlo maximum likelihood estimation
       – MPLE: maximum pseudo-likelihood estimation
     • Variational methods:
       – Exponential families and variational inference
       – Approximation of intractable families
       – Application to ERGM
       – Simulation study

  3. A start of Variational Methods for ERGM [2] Introduction to ERGM
     Network notations:
     • m actors; n = m(m - 1)/2 dyads
     • Sociomatrix (adjacency matrix) Y = \{ y_{i,j} \}_{i,j = 1, \dots, m}
     • Edge set \{ (i, j) : y_{i,j} = 1 \}
     • Undirected network: y_{i,j} = y_{j,i}

  4. A start of Variational Methods for ERGM [3] ERGM
     Exponential Family Random Graph Model (Frank and Strauss, 1986; Wasserman and Pattison, 1996; Handcock, Hunter, Butts, Goodreau and Morris, 2008):
       \log P(Y = y_{obs}; \eta) = \eta^T \phi(y_{obs}) - \kappa(\eta, \mathcal{Y}), \quad y \in \mathcal{Y},
     where
     • Y is the random adjacency matrix
     • \eta \in \Omega \subset R^q is the vector of model parameters
     • \phi(y) is a q-vector of statistics
     • \kappa(\eta, \mathcal{Y}) = \log \sum_{z \in \mathcal{Y}} \exp\{\eta^T \phi(z)\} is the normalizing factor, which is difficult to calculate
     • R package: statnet
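     As a quick illustration of the statnet interface mentioned above, a minimal sketch of fitting an ERGM in R; the Florentine marriage data and the edges + triangle specification are illustrative choices, not part of the talk:

        library(ergm)                 # part of the statnet suite
        data(florentine)              # loads flomarriage (and flobusiness)
        # A dyad-dependent specification, fit by MCMC-MLE:
        fit <- ergm(flomarriage ~ edges + triangle)
        summary(fit)                  # estimated eta with standard errors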

  5. A start of Variational Methods for ERGM [4] Current estimation approaches for ERGM
     MCMC-MLE (Geyer and Thompson, 1992; Snijders, 2002; Hunter, Handcock, Butts, Goodreau and Morris, 2008):
     1. Set an initial value \eta_0 for the parameter \eta.
     2. Generate an MCMC sample of size m from P_{\eta_0} by the Metropolis algorithm.
     3. Iterate to obtain a maximizer \tilde{\eta} of the approximate log-likelihood ratio:
          (\eta - \eta_0)^T \phi(y_{obs}) - \log \Big[ \frac{1}{m} \sum_{i=1}^m \exp\{ (\eta - \eta_0)^T \phi(Y_i) \} \Big]
     4. If the estimated variance of the approximate log-likelihood ratio is too large in comparison to the estimated log-likelihood at \tilde{\eta}, return to step 2 with \eta_0 = \tilde{\eta}.
     5. Return \tilde{\eta} as the MCMC-MLE.
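     A sketch of the objective in step 3, assuming phi_obs is the observed statistic vector \phi(y_{obs}) and phi_samp is an m x q matrix whose rows are \phi(Y_i) from the sample drawn at \eta_0 (both hypothetical inputs):

        # Approximate log-likelihood ratio, with a log-sum-exp guard
        approx_llr <- function(eta, eta0, phi_obs, phi_samp) {
          s <- drop(phi_samp %*% (eta - eta0))     # (eta - eta0)' phi(Y_i)
          sum((eta - eta0) * phi_obs) - (max(s) + log(mean(exp(s - max(s)))))
        }
        # Step 3 then maximizes this over eta, e.g. with optim():
        # optim(eta0, function(e) -approx_llr(e, eta0, phi_obs, phi_samp))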

  6. A start of Variational Methods for ERGM [5] MPLE (Besag, 1975; Strauss and Ikeda, 1990):
     Conditional formulation:
       logit[ P(Y_{ij} = 1 \mid Y^c_{ij} = y^c_{ij}) ] = \eta^T \delta(y^c_{ij}),
     where \delta(y^c_{ij}) = \phi(y^+_{ij}) - \phi(y^-_{ij}), the change in \phi(y) when y_{ij} changes from 0 to 1 while the rest of the network remains y^c_{ij}.
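     Since the conditional formulation is a logistic regression on the change statistics, MPLE reduces to a GLM fit. A sketch, assuming y is the 0/1 vector of dyad states and delta is the corresponding n x q change-statistic matrix (hypothetical inputs):

        # MPLE: logistic regression of y_ij on the change statistics
        mple_fit <- glm(y ~ delta - 1, family = binomial())
        coef(mple_fit)                       # eta-hat (MPLE)
        # statnet computes the same estimate via ergm(..., estimate = "MPLE")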

  7. A start of Variational Methods for ERGM [6] Comparison
     Simulation study: van Duijn, Gile and Handcock (2008)
     MCMC-MLE:
     • Slow-mixing
     • Highly dependent on initial values
     • Able to model various network characteristics together
     MPLE:
     • Deterministic model; computation is fast
     • Unstable
     • Dyadic-independent model; cannot capture higher-order network characteristics

  8. A start of Variational Methods for ERGM [7] Variational method
     Exponential families and variational representations
     Basics of the exponential family:
       \log p(x; \theta) = \langle \theta, \phi(x) \rangle - \kappa(\theta).
     • Sufficient statistics: \phi(x)
     • Log-partition function: \kappa(\theta) = \log \sum_{x \in \mathcal{X}} \exp \langle \theta, \phi(x) \rangle
     • Mean value parametrization: \mu := E[\phi(x)] \in R^q
     • Mean value space (convex hull):
       \mathcal{M} = \{ \mu \in R^q \mid \exists \, p(\cdot) \text{ s.t. } \sum_{x \in \mathcal{X}} \phi(x) p(x) = \mu \}

  9. A start of Variational Methods for ERGM [8]
     The log-partition function is smooth and convex in \theta. Suppose \theta = (\theta_\alpha, \theta_\beta, \dots) and \phi(x) = (\phi_\alpha(x), \phi_\beta(x), \dots):
       \frac{\partial \kappa(\theta)}{\partial \theta_\alpha} = E[\phi_\alpha(x)] := \sum_{x \in \mathcal{X}} \phi_\alpha(x) p(x; \theta),   (1)
       \frac{\partial^2 \kappa(\theta)}{\partial \theta_\alpha \partial \theta_\beta} = E[\phi_\alpha(x) \phi_\beta(x)] - E[\phi_\alpha(x)] E[\phi_\beta(x)].   (2)
     So \mu(\theta) can be re-expressed as \mu(\theta) = \frac{\partial \kappa(\theta)}{\partial \theta}, and it has gradient \frac{\partial^2 \kappa(\theta)}{\partial \theta^T \partial \theta}.
     (Barndorff-Nielsen, 1978; Handcock, 2003; Wainwright and Jordan, 2003)
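     Identity (1) can be checked by brute force on a toy model. A sketch for a 3-node Ising-type model, enumerating all 2^3 states; it needs the numDeriv package, and all names and parameter values are illustrative:

        phi <- function(x) c(x, x[1]*x[2], x[2]*x[3])        # statistics (q = 5)
        X <- as.matrix(expand.grid(0:1, 0:1, 0:1))           # the 8 states
        kappa <- function(theta)                             # log-partition by enumeration
          log(sum(exp(apply(X, 1, function(x) sum(theta * phi(x))))))
        theta <- c(0.2, -0.1, 0.3, 0.5, -0.4)
        p  <- exp(apply(X, 1, function(x) sum(theta * phi(x))) - kappa(theta))
        mu <- as.vector(apply(X, 1, phi) %*% p)              # E[phi(x)]
        all.equal(mu, numDeriv::grad(kappa, theta))          # TRUE: identity (1)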

  10. A start of Variational Methods for ERGM [9] Exp: Ising model on graph G(V, E)
      \log p(x; \theta) = \sum_{s \in V} \theta_s x_s + \sum_{(s,t) \in E} \theta_{st} x_s x_t - \kappa(\theta),   (3)
      where:
      • x_s, associated with s \in V, is a Bernoulli random variable;
      • components x_s and x_t are allowed to interact directly only if s and t are joined by an edge in the graph.
      The relevant mean parameters in this representation are as follows:
        \mu_s = E_\theta[x_s] = p(x_s = 1; \theta), \quad \mu_{st} = E_\theta[x_s x_t] = p(x_s = 1, x_t = 1; \theta).
      For each edge (s, t), the triplet \{ \mu_s, \mu_t, \mu_{st} \} uniquely determines a joint marginal p(x_s, x_t; \mu) as follows:
        p(x_s, x_t; \mu) = \begin{pmatrix} 1 + \mu_{st} - \mu_s - \mu_t & \mu_t - \mu_{st} \\ \mu_s - \mu_{st} & \mu_{st} \end{pmatrix}.

  11. A start of Variational Methods for ERGM [10]
      To ensure a valid joint marginal, we impose non-negativity constraints on all four entries:
        1 + \mu_{st} - \mu_s - \mu_t \geq 0, \quad \mu_{st} \geq 0, \quad \mu_s - \mu_{st} \geq 0, \quad \mu_t - \mu_{st} \geq 0.
      The inequalities above define \mathcal{M}.

  12. A start of Variational Methods for ERGM [11] Variational inference and mean value estimation
      For \mu ranging over ri \mathcal{M} (ri: relative interior), we have the variational representation
        \kappa(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - \kappa^*(\mu) \}.   (4)
      The lower bound behind (4):
        \kappa(\theta) = \log \sum_{x \in \mathcal{X}} p(x; \theta) \, \frac{\exp\{\langle \theta, \phi(x) \rangle\}}{p(x; \theta)}
                       \geq \sum_{x \in \mathcal{X}} p(x; \theta) \log \Big( \frac{\exp\{\langle \theta, \phi(x) \rangle\}}{p(x; \theta)} \Big)
                       = \sum_{x \in \mathcal{X}} \langle \theta, \phi(x) \rangle p(x; \theta) - \sum_{x \in \mathcal{X}} p(x; \theta) \log p(x; \theta)
                       = E \langle \theta, \phi(x) \rangle - E[\log p(x; \theta)]
                       = \langle \theta, \mu \rangle - \kappa^*(\mu).
      The inequality follows from Jensen's inequality, and the last equality follows from E[\phi(x)] = \mu and \kappa^*(\mu) = E[\log p(x; \theta(\mu))], the negative entropy of the distribution p(x; \theta).

  13. A start of Variational Methods for ERGM [12] Why the variational method?
      • The variational representation turns the problem of calculating intractable summations/integrals into an optimization problem (maximizing a lower bound of \kappa over \mathcal{M}).
      • The problem of computing mean parameters is solved simultaneously.
      Two main difficulties:
      • The constraint set \mathcal{M} of realizable mean parameters is difficult to characterize in an explicit manner.
      • \kappa^*(\mu) lacks an explicit form and needs proper approximation.

  14. A start of Variational Methods for ERGM [13] Mean value estimation
      • \mu is obtained by solving the optimization problem in (4).
      • However, the dual function \kappa^* lacks an explicit form in many cases.
      • We therefore restrict the choice of \mu to a tractable subset \mathcal{M}_t(H) of \mathcal{M}(G), where H is a tractable subgraph of G. The lower bound in (4) then becomes computable.
      • The solution of the optimization problem
          \sup_{\mu \in \mathcal{M}_t(H)} \{ \langle \mu, \theta \rangle - \kappa^*_H(\mu) \}
        specifies the optimal approximation \tilde{\mu}_t of \mu.
      • The optimal \tilde{\mu}_t in fact minimizes the Kullback-Leibler divergence from the tractable approximating distribution to the target distribution.

  15. A start of Variational Methods for ERGM [14] Exp: Ising model on a graph: approximation of \kappa^*
      Assume the tractable graph H_0 is fully disconnected. Then the mean value parameter set is
        \mathcal{M}_0(H_0) = \{ (\mu_s, \mu_{st}) \mid 0 \leq \mu_s \leq 1, \ \mu_{st} = \mu_s \mu_t \}.
      Here \mu_s = p(x_s = 1) and \mu_{st} = p(x_s = 1, x_t = 1) = \mu_s \mu_t, so the distribution on H_0 is fully factorizable. Deriving from the Bernoulli distribution,
        \kappa^*_{H_0}(\mu) = \sum_{s \in V} [ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) ].
      By (4), restricting to \mathcal{M}_0(H_0) gives the mean-field lower bound
        \kappa(\theta) \geq \max_{\{\mu_s\} \in [0,1]^n} \Big\{ \sum_{s \in V} \theta_s \mu_s + \sum_{(s,t) \in E} \theta_{st} \mu_s \mu_t - \sum_{s \in V} [ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) ] \Big\}.   (5)

  16. A start of Variational Methods for ERGM [15]
      After taking the gradient of (5) and setting it to zero, we have the following update for \mu:
        logit(\mu_s) \leftarrow \theta_s + \sum_{t \in N(s)} \theta_{st} \mu_t.   (6)
      Apply (6) iteratively (coordinate ascent) to each node until convergence is reached.
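      A sketch of update (6) in plain R on a toy graph; the 3-node triangle, the field theta_s = 0.1, and the common coupling theta_st = 0.5 are illustrative values:

         A <- matrix(c(0,1,1, 1,0,1, 1,1,0), 3, 3)   # adjacency of a triangle
         th_s <- rep(0.1, 3); th_st <- 0.5           # node and edge parameters
         mu <- rep(0.5, 3)                           # initialize mean parameters
         for (sweep in 1:200) {                      # coordinate-ascent sweeps
           for (s in 1:3) {                          # update (6) for each node
             mu[s] <- plogis(th_s[s] + th_st * sum(A[s, ] * mu))
           }
         }
         mu                                          # mean-field estimate of E[x_s]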

  17. A start of Variational Methods for ERGM [16] Applications to ERGM
      Dependence graph
      • G_Y is a graph with m actors and n = m(m - 1)/2 dyads.
      • Construct a dependence graph D_Y = G(V(D), E(D)) to describe the dependence structure of G_Y:
        – Each dyad (i, j), i < j, of G is a node of D.
        – Each node (ij) \in V(D) carries a binary variable y_{ij}.
        – An edge joins (ij) and (kl) in D exactly when (ij) and (kl), as dyads of G_Y, share an actor.
      • Frank and Strauss, 1986.
      Figure 1: A 4-actor graph G and its dependence graph D, whose nodes are the dyads 12, 13, 14, 23, 24, 34.
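      A sketch of the dependence-graph construction for the m = 4 case in Figure 1 (the variable names are illustrative):

         # Dependence graph of a 4-actor undirected graph: nodes are dyads,
         # edges join dyads that share an actor
         m <- 4
         dyads <- t(combn(m, 2))                  # the n = 6 dyads
         n <- nrow(dyads)
         Dadj <- matrix(0, n, n)
         for (a in 1:(n - 1)) {
           for (b in (a + 1):n) {
             if (length(intersect(dyads[a, ], dyads[b, ])) > 0)
               Dadj[a, b] <- Dadj[b, a] <- 1
           }
         }
         rownames(Dadj) <- colnames(Dadj) <- apply(dyads, 1, paste0, collapse = "")
         Dadj                                     # matches Figure 1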

  18. A start of Variational Methods for ERGM [17] Exp: Erdős-Rényi model
      For an undirected random graph Y = \{ Y_{ij} \}, all dyads are mutually independent, so the dependence graph D is fully disconnected. Each y_{ij}, (ij) \in V(D), is a Bernoulli random variable. The model can be written as
        \log P_\theta(Y = y) = \sum_{i<j} \theta_{ij} y_{ij} - \kappa(\theta, \mathcal{Y}), \quad y \in \mathcal{Y}.
      Calculating the entropy of the Bernoulli distribution, we have
        \kappa^*(\mu) = \sum_{i<j} [ \mu_{ij} \log(\mu_{ij}) + (1 - \mu_{ij}) \log(1 - \mu_{ij}) ],   (7)
      where \mu_{ij} = P(Y_{ij} = 1). Then
        \kappa(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - \kappa^*(\mu) \} = \sum_{i<j} \log(1 + \exp(\theta_{ij})),
      attained at \theta_{ij} = \log( \mu_{ij} / (1 - \mu_{ij}) ).
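      The closed form can be checked numerically: with independent dyads, the sup over \mathcal{M} separates across dyads. A sketch for three dyads with illustrative \theta_{ij} values:

         theta <- c(-1.2, 0.3, 0.8)                          # theta_ij, three dyads
         obj <- function(mu)                                 # <theta, mu> - kappa*(mu)
           sum(theta * mu) - sum(mu * log(mu) + (1 - mu) * log(1 - mu))
         opt <- optim(rep(0.5, 3), obj, method = "L-BFGS-B",
                      lower = 1e-8, upper = 1 - 1e-8,
                      control = list(fnscale = -1))          # maximize
         c(opt$value, sum(log(1 + exp(theta))))              # the two values agree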

  19. A start of Variational Methods for ERGM [18] 2-star ERGM model
      Analogous to the Ising model, on the dependence graph D = G(V(D), E(D)),
        \log P(Y = y; \theta) = \sum_{s \in V(D)} \theta_s y_s + \sum_{(s,t) \in E(D)} \theta_{st} y_s y_t - \kappa(\theta), \quad s = (ij) \in V(D).
      If \theta_s = \eta_1 for all s \in V(D) and \theta_{st} = \eta_2 for all (s,t) \in E(D),
        \log P(Y = y; \eta) = \eta_1 \sum_{i<j} y_{ij} + \eta_2 \sum_i \sum_{j<k; \, j,k \neq i} y_{ij} y_{ik} - \kappa(\eta),
      which corresponds to the canonical 2-star model.
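      In statnet terms, the canonical 2-star model corresponds to the edges and kstar(2) terms. A minimal sketch, again on the illustrative Florentine data rather than any dataset from the talk:

         library(ergm)
         data(florentine)
         # eta_1 multiplies the edge count, eta_2 the 2-star count
         fit2 <- ergm(flomarriage ~ edges + kstar(2))
         summary(fit2)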
