Bias-Adjusted Maximum Likelihood Estimation: Improving Estimation for Exponential-Family Random Graph Models (ERGMs)


SLIDE 1

Bias-Adjusted Maximum Likelihood Estimation

Improving Estimation for Exponential-Family Random Graph Models (ERGMs) Ruth M Hummel David R Hunter

Department of Statistics, Penn State University

MURI meeting, May 25, 2010

MURI meeting May 2010 Estimation for ERGMs

SLIDE 2

Motivation: Why model networks?

A statistical model for observed network data yobs allows us to:

  • Summarize: Give a parsimonious quantitative summary of the data and, ideally, how precisely we know this summary
  • Predict: Describe or simulate other networks that could have arisen from the same process

SLIDE 3

Motivation: The likelihood function and MLE

The ERG model class:

  Pθ(Y = y) = exp{θᵗg(y)} / κ(θ),   where   κ(θ) = Σ over all possible graphs z of exp{θᵗg(z)}.

θ is a parameter vector to be estimated. g(y) is a user-defined vector of graph statistics. The loglikelihood function is

  ℓ(θ) = θᵗg(yobs) − log κ(θ).

The MLE is the maximizer θ̂ of the likelihood.
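For intuition, ℓ(θ) can be computed exactly by brute force whenever the graph is tiny enough to enumerate. The following Python sketch is not from the slides; the 4-node network and the statistic vector g(y) = (edge count, triangle count) are illustrative choices only.

```python
import itertools
import math

n = 4
dyads = list(itertools.combinations(range(n), 2))  # 6 dyads, so 2^6 = 64 possible graphs


def stats(edges):
    """g(y) = (number of edges, number of triangles)."""
    es = set(edges)
    tri = sum(1 for a, b, c in itertools.combinations(range(n), 3)
              if (a, b) in es and (a, c) in es and (b, c) in es)
    return (len(edges), tri)


def log_kappa(theta):
    """log kappa(theta): log of the sum over ALL possible graphs z of exp{theta^t g(z)}."""
    terms = []
    for bits in itertools.product([0, 1], repeat=len(dyads)):
        edges = [d for d, b in zip(dyads, bits) if b]
        g = stats(edges)
        terms.append(theta[0] * g[0] + theta[1] * g[1])
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def loglik(theta, y_obs_edges):
    """l(theta) = theta^t g(y_obs) - log kappa(theta)."""
    g = stats(y_obs_edges)
    return theta[0] * g[0] + theta[1] * g[1] - log_kappa(theta)


y_obs = [(0, 1), (1, 2), (0, 2), (2, 3)]  # a triangle plus a pendant edge
print(loglik((-0.5, 0.3), y_obs))
```

At θ = 0 every graph is equally likely, so the loglikelihood is simply −log 64; the next slide shows why this enumeration is hopeless at realistic sizes.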

SLIDE 4

The likelihood is sometimes intractable

[Figure: an undirected, 34-node network; per-node labels omitted.]

For this undirected, 34-node network, computing ℓ(θ) directly requires summation of 7,547,924,849,643,082,704,483,109,161,976,537,781,833,842,440,832,880,856,752,412,600,491,248,324,784,297,704,172,253,450,355,317,535,082,936,750,061,527,689,799,541,169,259,849,585,265,122,868,502,865,392,087,298,790,653,952 terms: one for each of the 2^561 possible graphs, since a 34-node undirected network has (34 choose 2) = 561 dyads.
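The count is just the number of labeled undirected graphs on 34 nodes, 2 raised to the number of dyads. A quick sanity check (not from the slides):

```python
from math import comb

n_nodes = 34
n_dyads = comb(n_nodes, 2)   # 561 possible edges
n_graphs = 2 ** n_dyads      # one term in kappa(theta) per possible graph

print(n_dyads)               # 561
print(len(str(n_graphs)))    # 169 digits: far beyond any feasible enumeration
```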

SLIDE 5

The pseudolikelihood: A tractable alternative

Some algebra based on the ERGM gives, for all i ≠ j,

  log [ P(Yij = 1 | Yᶜij) / P(Yij = 0 | Yᶜij) ] = θᵗ[ g(Y⁺ij) − g(Y⁻ij) ],

where Y⁺ij and Y⁻ij denote the network Y with the (i, j) dyad set to 1 and to 0, respectively. The pseudolikelihood ignores the conditioning, assuming instead

  log [ P(Yij = 1) / P(Yij = 0) ] = θᵗ[ g(Y⁺ij) − g(Y⁻ij) ] ≡ θᵗδ(Y)ij

independently for all i ≠ j. Thus, the pseudolikelihood equals

  ∏(i≠j)  exp{ θᵗδ(yobs)ij (yobs)ij } / ( 1 + exp{ θᵗδ(yobs)ij } ).
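The product above is exactly a logistic-regression likelihood in which each dyad's response is (yobs)ij and its covariate vector is the change statistic δ(yobs)ij. A minimal sketch (not from the slides) for the edges-only model, where δ(Y)ij = 1 for every dyad and the 5-node network is hypothetical data:

```python
import itertools
import math

n = 5
y_obs = {(0, 1), (1, 2), (2, 3), (0, 2)}                  # 4 edges present
dyads = list(itertools.combinations(range(n), 2))         # 10 dyads in total


def log_pseudolik(theta):
    """Log of the pseudolikelihood product; delta(y)_ij = 1 for the edges-only model."""
    total = 0.0
    for d in dyads:
        y_ij = 1 if d in y_obs else 0
        total += theta * 1.0 * y_ij - math.log(1.0 + math.exp(theta * 1.0))
    return total


# Crude grid maximization keeps the sketch dependency-free.
theta_hat = max((t / 1000 for t in range(-5000, 5001)), key=log_pseudolik)

# For the edges-only model the MPLE is the log-odds of the observed density.
density = len(y_obs) / len(dyads)
print(theta_hat, math.log(density / (1 - density)))
```

With dependence terms in g(y) the two estimates no longer coincide with a simple closed form, which is where the MPLE-versus-MLE comparison on the following slides becomes interesting.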

SLIDE 6

Evidence of bias in MPLE compared to MLE

Van Duijn, Gile, and Handcock (2009, Social Networks) compare the MLE to the MPLE. They cite a small but compelling set of explorations of the MPLE suggesting that there may be large differences between the MPLE and the approximate MLE, sometimes even in cases where dependence is not thought to be a concern. They explore the bias in the MLE and MPLE relative to the “truth.” They introduce a bias-corrected version of the MPLE (the “MBLE”). A similar bias-correction is possible for the MLE, though it is a bit less straightforward.

SLIDE 7

Bias-correction via Firth

The bias-correction we employ (perhaps better described as a preemptive bias-mitigation than a correction) follows Firth (1993). The idea is to maximize a penalized likelihood, which induces a bias in the score function in order to reverse some of the anticipated bias in the maximizer. The penalized loglikelihood is

  ℓbc(θ) = ℓ(θ) + (1/2) log |I(θ)|,

where I(θ) is the Fisher information. The resulting maximizer is also the Bayesian maximum a posteriori estimator under a Jeffreys prior on the parameter.
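To see the penalty in action outside the ERGM setting, consider the simplest exponential family where everything is closed-form: y successes in n Bernoulli trials with canonical parameter θ = logit(p), so I(θ) = np(1 − p). The sketch below is not from the slides; the closed-form answer logit((y + 1/2)/(n + 1)) is a known property of Firth's method for this model.

```python
import math


def firth_logit(y, n, iters=100):
    """Newton's method on the Firth-penalized score for the binomial logit model.

    Penalized loglikelihood: l_bc(theta) = y*theta - n*log(1 + e^theta)
                                           + (1/2) * log(n * p * (1 - p)).
    """
    theta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + math.exp(-theta))
        score = (y - n * p) + (0.5 - p)       # l'(theta) plus the penalty's derivative
        info = (n + 1) * p * (1.0 - p)        # -l_bc''(theta)
        theta += score / info
    return theta


y, n = 3, 10
theta_bc = firth_logit(y, n)
theta_closed = math.log((y + 0.5) / (n + 1) / (1 - (y + 0.5) / (n + 1)))
print(theta_bc, theta_closed)
```

Note that the penalized estimate is pulled toward 0 relative to the raw MLE logit(3/10), and it remains finite even when y = 0 or y = n, where the ordinary MLE diverges.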

SLIDE 8

The intuition behind this modification for an exponential-family model is the following: since the score function U(η) can be written U(η) = ℓ′(η) = g(Y) − ∇ log κ(η), the sufficient statistic g(Y) only shifts U(η) vertically and does not affect its shape. For this reason, any anticipated bias in the MLE can be offset by shifting the score function by the amount bias × ∇U. (Here ∇U = −i(η), the negative Fisher information.) This adjustment is illustrated in the following figure, taken from Firth (1993):

Figure: Modification of the unbiased score function
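The shift argument can be written out in a few lines (a sketch; b(η) denotes the anticipated first-order bias of the MLE, with notation as above):

```latex
U^*(\eta) \;=\; U(\eta) + b(\eta)\,\nabla U(\eta) \;=\; U(\eta) - i(\eta)\,b(\eta).
% If \hat\eta solves U(\hat\eta) = 0, a Taylor expansion about \hat\eta gives, for the
% root \eta^* of the modified score U^*,
0 \;=\; U^*(\eta^*) \;\approx\; (\eta^* - \hat\eta)\,U'(\hat\eta) - i(\hat\eta)\,b(\hat\eta)
  \;\approx\; -(\eta^* - \hat\eta)\,i(\hat\eta) - i(\hat\eta)\,b(\hat\eta),
\qquad\text{so}\qquad \eta^* \;\approx\; \hat\eta - b(\hat\eta),
% which cancels the first-order bias E(\hat\eta) - \eta \approx b(\eta).
```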

SLIDE 9

Evidence of bias in MLE (and MPLE) compared to “truth”

Taken from van Duijn, et al. (2009), these boxplots show the bias of the MLE for selected parameters in two networks (“original” and “transitivity”) for the canonical parameter space. (The true parameter is shown as a horizontal line.) Note that the bias is greatest in the MLE.

SLIDE 10

Evidence of bias in MLE (and MPLE) compared to “truth”

Here we see that there is no bias in the MLE for selected parameters in two networks (“original” and “transitivity”) in the mean-value parameter space. (This holds by definition, since the mean-value MLE equals the observed statistic.)

SLIDE 11

Comparison on Lazega collaboration network

To compare our extended results with those reported for the MBLE and the ordinary MPLE and MLE in the van Duijn, et al. paper, we replicate their analysis of the corporate-lawyer collaboration data and add the bias-corrected MLE (pMLE).

SLIDE 12

Lazega collaboration network

The Lazega collaboration data record collaborations in the late 1980s between 36 New England lawyers, determined by their responses to the question “With which members of your firm have you spent time together on at least one case, have you been assigned to the same case, have they read or used your work product, or have you read or used their work product?” Additional member attributes collected include the attorneys’ gender, age, status (the firm comprises 36 partners and 35 associates; the collaboration network studied here is among the 36 partners), seniority, years with the firm, practice (litigation or corporate), office location (Boston, Hartford, or Providence), and law school attended (Yale or Harvard, University of Connecticut, or other).

SLIDE 13

Following van Duijn, et al., we simulate networks based on a “truth” for the following model:

  Model term                       “True” parameter value
  edges                            −6.506
  GWESP                             0.897
  seniority (nodal covariate)       0.853
  practice (nodal covariate)        0.410
  practice (homophily effect)       0.759
  gender (homophily effect)         0.702
  office (homophily effect)         1.145

SLIDE 14

Preliminary results:

Results based on very few simulations show no improvement over the ordinary MLE yet...

Figure: Distribution of the MLE, pMLE, MPLE, and MBLE estimates of the GWESP and nodal-practice canonical parameters; true parameter shown as a horizontal line. [Boxplots omitted.]

SLIDE 15

Preliminary results:

Here you can see that the number of sub-simulations used to calculate the mean-value parameters is clearly insufficient, since the mean of the uncorrected MLE should be unbiased in this parameterization...

Figure: Distribution of the MLE, pMLE, MPLE, and MBLE estimates of the GWESP and nodal-practice mean-value parameters; true parameter shown as a horizontal line. [Boxplots omitted.]

SLIDE 16

Current extensions:

  • increasing the number of simulations for the current network
  • applying the same analysis to the “increased transitivity” version of the collaboration network used in van Duijn, et al.
  • applying the same analysis to a larger biological network
  • applying the same analysis to a friendship network

SLIDE 17

A few words about Contrastive Divergence (CD)

Consider the idea of MCMC MLE: Suppose we fix η0. A bit of algebra shows that

  ℓ(η) − ℓ(η0) = (η − η0)ᵗg(yobs) − log Eη0[ exp{(η − η0)ᵗg(Y)} ].   (1)

The Law of Large Numbers suggests obtaining a sample of Y from the model using η0 as the parameter, then approximating the expectation by a sample mean. Q: How do we sample g(Y) using η0 as the parameter? A: Run MCMC infinitely long.
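The Monte Carlo step can be sketched on a toy model where exact i.i.d. sampling stands in for a long MCMC run and the exact answer is available for comparison. This is not from the slides: the edges-only model on a 10-node graph, and the values of η0, η, and g(yobs) below, are illustrative assumptions.

```python
import math
import random

random.seed(0)
m = 45                      # dyads in a 10-node undirected graph
eta0, eta = -1.0, -0.6      # fixed reference value and evaluation point
g_obs = 12                  # hypothetical observed edge count

# Under the edges-only model, edges are i.i.d. Bernoulli(p0), so g(Y) ~ Binomial(m, p0)
# and we can draw from the model exactly.
p0 = 1.0 / (1.0 + math.exp(-eta0))
draws = [sum(random.random() < p0 for _ in range(m)) for _ in range(50000)]

# LLN step: approximate log E_{eta0}[exp{(eta - eta0) g(Y)}] by a sample mean.
log_E = math.log(sum(math.exp((eta - eta0) * g) for g in draws) / len(draws))
approx = (eta - eta0) * g_obs - log_E

# Exact loglikelihood difference, available because this toy model factorizes.
exact = (eta - eta0) * g_obs - m * math.log(1 - p0 + p0 * math.exp(eta - eta0))
print(approx, exact)
```

For a real ERGM with dependence terms, the exact line is unavailable and the draws must come from an MCMC run; that is the question taken up next.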

SLIDE 18

A few words about Contrastive Divergence (CD)

Consider the idea of MCMC MLE: Suppose we fix η0. A bit of algebra shows that

  ℓ(η) − ℓ(η0) = (η − η0)ᵗg(yobs) − log Eη0[ exp{(η − η0)ᵗg(Y)} ].   (1)

The Law of Large Numbers suggests obtaining a sample of Y from the model using η0 as the parameter, then approximating the expectation by a sample mean. Q: How do we sample g(Y) using η0 as the parameter? A: Run MCMC infinitely long. But what if we only run MCMC for a single step (starting at yobs), for a randomly chosen Yij? For this Yij, we are sampling from the conditional distribution given (yobs)ᶜij.

SLIDE 19

A few words about Contrastive Divergence (CD)

To summarize: Running an infinitely long Markov chain leads to the loglikelihood. Running a 1-step Markov chain leads to the pseudolikelihood. Thus, if we alternately sample and then optimize the resulting “likelihood-like” function, we can view MLE and MPLE as the two ends of a spectrum, the “contrastive divergence” spectrum. (MLE is CD-∞ and MPLE is CD-1.)
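A single CD step can be sketched for the edges-only model (not from the slides; the 6-node network and the value of θ are hypothetical): starting from yobs, resample one randomly chosen dyad from its full conditional distribution given the rest of the graph.

```python
import itertools
import math
import random

random.seed(1)
n = 6
theta = -0.5                                  # edges-only model, hypothetical value
y_obs = {(0, 1), (1, 2), (2, 4), (3, 5)}      # hypothetical observed network
dyads = list(itertools.combinations(range(n), 2))


def cd_sample(y, k):
    """CD-k sampling: starting from y, resample k randomly chosen dyads from their
    full conditionals. k = 1 is the pseudolikelihood end of the spectrum; as
    k -> infinity the draw approaches an exact sample from the model (the MLE end)."""
    y = set(y)
    for _ in range(k):
        i, j = random.choice(dyads)
        # For the edges-only model delta(Y)_ij = 1, so
        # P(Y_ij = 1 | rest) = 1 / (1 + exp(-theta)).
        if random.random() < 1.0 / (1.0 + math.exp(-theta)):
            y.add((i, j))
        else:
            y.discard((i, j))
    return y


y1 = cd_sample(y_obs, k=1)
print(sorted(y1))
```

By construction a CD-1 draw can differ from yobs in at most one dyad, which is exactly why the 1-step chain reproduces the conditional (pseudolikelihood) distribution rather than the joint one.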

SLIDE 20

A few words about Contrastive Divergence (CD)

Considering CD-1. . . Q: Is it better to

  1 Repeatedly pick i ≠ j at random, or
  2 Cycle through all possible i ≠ j in some systematic fashion?

A: The latter. The reason boils down to the following well-known identity for any two random variables Y and Z:

  Var(Y) = Var[E(Y | Z)] + E[Var(Y | Z)].

Here, “Y” is the likelihood-like quantity based on the randomly sampled networks and “Z” is the selected pair i ≠ j.
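The identity, and the variance reduction from removing the randomness in Z, can be checked numerically. This sketch is not from the slides: Z is a uniformly chosen stratum index standing in for the selected dyad, and Y = E(Y | Z) + Gaussian noise with hypothetical stratum means.

```python
import random
import statistics

random.seed(2)
means = [0.0, 2.0, 10.0]        # E(Y | Z = z) for three equally likely strata


def draw_y(z):
    """One draw of Y given Z = z, with within-stratum variance 1."""
    return means[z] + random.gauss(0.0, 1.0)


def mean_random(n):
    """Average of n draws, picking Z at random each time (both variance terms present)."""
    return sum(draw_y(random.randrange(3)) for _ in range(n)) / n


def mean_cycled(n):
    """Average of n draws, cycling through Z systematically: the Var[E(Y|Z)]
    component is removed from the variance of the average."""
    return sum(draw_y(i % 3) for i in range(n)) / n


reps = 2000
var_of_mean_random = statistics.pvariance([mean_random(30) for _ in range(reps)])
var_of_mean_cycled = statistics.pvariance([mean_cycled(30) for _ in range(reps)])
print(var_of_mean_random, var_of_mean_cycled)
```

With these stratum means, Var[E(Y | Z)] is much larger than E[Var(Y | Z)] = 1, so cycling shrinks the variance of the average by roughly a factor of twenty; the same logic favors cycling through all dyads i ≠ j in CD.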
