The Fourth Erich L. Lehmann Symposium, May 9-12, 2011

Estimation in Mixed Models with Dirichlet Process Random Effects: Both Sides of the Story

George Casella and Chen Li
Department of Statistics, University of Florida
Introduction
◮ The Beginning: prior distributions in the social sciences
◮ Transition: after the data analysis, model properties
◮ Dirichlet Process Random Effects: likelihood, subclusters, precision parameter
◮ MCMC: parameter expansion, convergence, optimality
◮ Example: Scottish election, normal random effects
◮ Some Theory: why are the intervals shorter?
◮ Classical Mixed Models: OLS, BLUE
◮ Conclusions: and other remarks
———But First——— Here is the Big Picture
◮ Usual Random Effects Model
Y | ψ ∼ N(Xβ + ψ, σ²I), ψi ∼ N(0, τ²)
⊲ Subject-specific random effect
◮ Dirichlet Process Random Effects Model
Y | ψ ∼ N(Xβ + ψ, σ²I), ψi ∼ DP(m, N(0, τ²))
◮ Results in
⊲ Fewer assumptions
⊲ Better estimates
⊲ Shorter credible intervals
⊲ Straightforward classical estimation
How This All Started
The Use of Prior Distributions in the Social Sciences
Can more flexible priors help us recover latent hierarchical information?
◮ When do priors matter in social science research?
◮ How do we specify known prior information?
◮ Bayesian social scientists like uninformative priors
◮ Reviewers are often skeptical of informative priors
◮ Survey of Political Executives (Gill and Casella 2008, JASA)
⊲ Outcome variable: stress, a surrogate for self-perceived effectiveness and job satisfaction
⊲ Five-point scale from "not stressful at all" to "very stressful"
⊲ Ordered probit model
Survey of Political Executives: Some Coefficient Estimates

Variable                    Posterior Mean    95% HD Interval
Government Experience            0.120        [-0.086 :  0.141]
Republican                       0.076        [-0.031 :  0.087]
Committee Relationship          -0.181        [-0.302 : -0.168]
Confirmation Preparation        -0.316        [-0.598 : -0.286]
Hours/Week                       0.447        [ 0.351 :  0.457]
President Orientation           -0.338        [-0.621 : -0.309]

Cutpoints:
(None)/(Little)                 -1.488        [-1.958 : -1.598]
(Little)/(Some)                 -0.959        [-1.410 : -1.078]
(Some)/(Significant)            -0.325        [-0.786 :  0.454]
(Significant)/(Extreme)          0.844        [ 0.411 :  0.730]

◮ Intervals are very tight
◮ Most do not overlap zero
◮ Seems typical of Dirichlet Process random effects models (more later)
◮ Reasonable subject-matter interpretations
Transition: What Did We Learn?
Analyzing Social Science Data, Understanding the Methodology
◮ Dirichlet Process random effects models
⊲ Accepted by social scientists
⊲ Computationally feasible
⊲ Provide good estimates
◮ "Off the shelf" MCMC ⊲ can we do better?
◮ Precision parameter m ⊲ arbitrarily fixed
◮ Are answers insensitive to m?
◮ Next: better understanding of the MCMC and estimation of m
◮ Performance evaluations and wider applications
A Dirichlet Process Random Effects Model
Estimating the Dirichlet Process Parameters
◮ A general Dirichlet Process random effects model can be written

(Y1, . . . , Yn) ∼ f(y1, . . . , yn | θ, ψ1, . . . , ψn) = ∏i f(yi | θ, ψi)

⊲ ψ1, . . . , ψn iid from G ∼ DP
⊲ DP is the Dirichlet Process, with base measure φ0 and precision parameter m
⊲ The vector θ contains all model parameters
◮ Blackwell and MacQueen (1973) showed

ψi | ψ1, . . . , ψi−1 ∼ [m/(i − 1 + m)] φ0(ψi) + [1/(i − 1 + m)] Σ_{l=1}^{i−1} δ(ψl = ψi)

⊲ where δ denotes the Dirac delta function
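As a concrete illustration of the Blackwell-MacQueen urn above, here is a minimal simulation sketch, assuming a N(0, τ²) base measure φ0 (all names are illustrative, not from the talk):

```python
import numpy as np

def polya_urn(n, m, tau, seed=0):
    """Draw psi_1,...,psi_n from the Blackwell-MacQueen urn:
    with probability m/(i-1+m) take a fresh draw from the base
    measure N(0, tau^2); otherwise copy an earlier psi (a tie)."""
    rng = np.random.default_rng(seed)
    psi = []
    for i in range(1, n + 1):
        if rng.uniform() < m / (i - 1 + m):
            psi.append(rng.normal(0.0, tau))          # new value from phi_0
        else:
            psi.append(psi[rng.integers(len(psi))])   # point mass on an earlier value
    return np.array(psi)

psi = polya_urn(n=10, m=1.0, tau=1.0)
print(np.unique(psi).size, "distinct values among", psi.size)  # ties form subclusters
```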
Some Distributional Structure
◮ Freedman (1963), Ferguson (1973, 1974), and Antoniak (1974)
⊲ Dirichlet process prior for a nonparametric G
⊲ A random probability measure: a distribution on the space of distributions
◮ Notation
⊲ G0, a base distribution (a finite non-null measure)
⊲ m > 0, a precision parameter (a finite positive scalar)
⊲ m governs the spread of distributions around G0
⊲ Prior specification: G ∼ DP(m, G0) ∈ P
◮ For any finite partition {B1, . . . , BK} of the parameter space,

(G(B1), . . . , G(BK)) ∼ Dirichlet(mG0(B1), . . . , mG0(BK))
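The finite-partition property is easy to check by simulation; a small sketch, assuming a three-cell partition with illustrative base probabilities G0(Bk):

```python
import numpy as np

m, G0 = 2.0, np.array([0.2, 0.3, 0.5])       # assumed precision and base probabilities
rng = np.random.default_rng(1)
draws = rng.dirichlet(m * G0, size=100_000)  # (G(B1), G(B2), G(B3)) ~ D(m*G0(B1), ...)
print(draws.mean(axis=0))                    # approximately G0, since E[G(Bk)] = G0(Bk)
```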
A Mixed Dirichlet Process Random Effects Model
Likelihood Function
◮ The likelihood function integrates over the random effects:

L(θ | y) = ∫ f(y1, . . . , yn | θ, ψ1, . . . , ψn) π(ψ1, . . . , ψn) dψ1 · · · dψn

◮ From Lo (1984, Annals) Lemma 2 and Liu (1996, Annals),

L(θ | y) = [Γ(m)/Γ(m + n)] Σ_{k=1}^{n} m^k Σ_{C:|C|=k} ∏_{j=1}^{k} Γ(nj) ∫ f(y(j) | θ, ψj) φ0(ψj) dψj

⊲ The partition C defines the subclusters
⊲ y(j) is the vector of yi's in subcluster j
⊲ ψj is the common parameter for that subcluster
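For very small n the sum over partitions can be evaluated by brute force. A sketch, assuming the covariate-free normal case yi | ψi ∼ N(ψi, σ²) with base measure φ0 = N(0, τ²), so each subcluster integral is a closed-form multivariate normal density:

```python
import numpy as np
from math import lgamma, exp
from scipy.stats import multivariate_normal

def set_partitions(s):
    """Enumerate all partitions of the tuple s into nonempty blocks."""
    if len(s) == 1:
        yield [list(s)]
        return
    first, rest = s[0], s[1:]
    for p in set_partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p

def dp_likelihood(y, m, sigma2, tau2):
    """L(theta|y) = Gamma(m)/Gamma(m+n) * sum_C m^k prod_j Gamma(n_j) * integral_j;
    integrating psi_j out of subcluster S gives a N(0, sigma2*I + tau2*J) density."""
    n, total = len(y), 0.0
    for C in set_partitions(tuple(range(n))):
        log_term = len(C) * np.log(m)
        for S in C:
            cov = sigma2 * np.eye(len(S)) + tau2 * np.ones((len(S), len(S)))
            log_term += lgamma(len(S)) + multivariate_normal.logpdf(y[S], cov=cov)
        total += exp(log_term)
    return exp(lgamma(m) - lgamma(m + n)) * total

y = np.array([0.1, -0.4, 2.3, 2.0])
print(dp_likelihood(y, m=1.0, sigma2=1.0, tau2=1.0))
```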
A Mixed Dirichlet Process Random Effects Model
Matrix Representation of Partitions
◮ Start with the model Y | ψ ∼ N(Xβ + ψ, σ²I), where ψi ∼ DP(m, N(0, τ²)), i = 1, . . . , n
◮ With likelihood function

L(θ | y) = [Γ(m)/Γ(m + n)] Σ_{k=1}^{n} m^k Σ_{C:|C|=k} ∏_{j=1}^{k} Γ(nj) ∫ f(y(j) | θ, ψj) φ0(ψj) dψj

◮ Associate a binary matrix A (n × k) with a partition C:

C = {S1, S2, S3} = {{3, 4, 6}, {1, 2}, {5}}  ↔  A =
    0 1 0
    0 1 0
    1 0 0
    1 0 0
    0 0 1
    1 0 0
A Mixed Dirichlet Process Random Effects Model
Matrix Representation of Partitions
◮ ψ = Aη, η ∼ Nk(0, τ²I), so that

Y | A, η ∼ N(Xβ + Aη, σ²I), η ∼ Nk(0, τ²I)

⊲ Rows: ai is a 1 × k vector of zeros except for a 1 in the column of its subcluster
⊲ Columns: the column sums of A count the observations in each subcluster
⊲ Variables: ψi ∈ Sj ⇒ ψi = ηj (constant within subclusters)
⊲ Monte Carlo: we only need to generate k normal random variables
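A short sketch of this matrix representation (illustrative code, using the partition from the previous slide):

```python
import numpy as np

def partition_matrix(C, n):
    """Binary n x k matrix A with A[i, j] = 1 iff observation i+1 lies in subcluster S_j."""
    A = np.zeros((n, len(C)), dtype=int)
    for j, S in enumerate(C):
        for i in S:
            A[i - 1, j] = 1                # the slide uses 1-based observation labels
    return A

C = [{3, 4, 6}, {1, 2}, {5}]               # C = {S1, S2, S3} from the slide
A = partition_matrix(C, n=6)
print(A)                                   # reproduces the displayed 6 x 3 matrix

rng = np.random.default_rng(2)
tau = 1.0
eta = rng.normal(0.0, tau, size=A.shape[1])   # only k = 3 normal draws needed
psi = A @ eta                                 # psi_i is constant within each subcluster
```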
MCMC Sampling Scheme
Posterior Distribution
◮ The joint posterior distribution is

π(θ, A | y) = m^k f(y | θ, A) π(θ) / ∫_Θ Σ_A m^k f(y | θ, A) π(θ) dθ

◮ Model parameters θ → sampling is straightforward
◮ Dirichlet Process parameters: A (the subclusters) and m (the precision parameter)
MCMC Sampling Scheme
Model Parameters and Dirichlet Process Parameters
◮ For t = 1, . . . , T, at iteration t:
Model parameters
◮ Starting from (θ(t), A(t)), draw θ(t+1) ∼ π(θ | A(t), y)
Dirichlet Process parameters
◮ Given θ(t+1), draw A(t+1) via

q(t+1) ∼ Dirichlet(n1(t) + 1, . . . , nk(t) + 1, 1, . . . , 1)   (a vector of length n)

A(t+1) ∝ m^k f(y | θ(t+1), A) · (n; n1, . . . , nn) · ∏_{j=1}^{n} [qj(t+1)]^{nj}

◮ where (n; n1, . . . , nn) is the multinomial coefficient, nj ≥ 0, and n1 + · · · + nn = n
MCMC Sampling Scheme
Convergence of the Dirichlet Process
◮ Neal (2000) describes 8 algorithms; all use "stick-breaking" conditionals

Our chain:
P(aj = 1 | A−j) ∝ [nj/(n − 1 + m)] qj,  j = 1, . . . , k;   [m/(n − 1 + m)] qk+1,  j = k + 1, . . . , n

Stick-breaking chain:
P(aj = 1 | A−j) ∝ nj/(n − 1 + m),  j = 1, . . . , k;   m/(n − 1 + m),  j = k + 1

◮ Ours is a parameter expansion
◮ Parameter expansion dominates: Var h(Y) is smaller for any square-integrable function h
(Liu and Wu 1999; van Dyk and Meng 2001; Hobert and Marchev 2008; Mira and Geyer 1999; Mira 2001)
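A schematic comparison of the two conditionals, under simplifying assumptions of my own (one observation being reassigned, and q truncated to its first k + 1 relevant components rather than the full length-n vector):

```python
import numpy as np

def reassignment_probs(counts, m, q=None):
    """Conditional probabilities for one observation's subcluster indicator:
    proportional to n_j/(n-1+m) for an existing subcluster and m/(n-1+m)
    for a new one; the parameter-expanded chain reweights these by q."""
    n = counts.sum() + 1                    # counts exclude the observation being moved
    w = np.append(counts, m) / (n - 1 + m)
    if q is not None:
        w = w * q                           # parameter expansion step
    return w / w.sum()

counts = np.array([3, 2, 1])                # current subcluster sizes n_1, ..., n_k
rng = np.random.default_rng(3)
q = rng.dirichlet(np.append(counts + 1.0, 1.0))   # q ~ Dirichlet(n_j + 1, ..., 1)
print(reassignment_probs(counts, m=1.0))          # stick-breaking chain
print(reassignment_probs(counts, m=1.0, q=q))     # parameter-expanded chain
```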
Scottish Election Data - History
1997: Scottish voters overwhelmingly (74.3%) approved the creation of the first Scottish parliament, and gave strong support (63.5%) to granting this parliament taxation powers.
Our interest:
◮ Who subsequently voted Conservative in Scotland?
The data:
◮ British General Election Study of 880 Scottish nationals
◮ Outcome: party choice (Conservative or not) in the UK general election
◮ Independent variables: political and social measures
◮ Probit model
Scottish Election Data - Dirichlet Process Credible Intervals
[Figure: 90% credible intervals for the coefficients Politics, ReadPap, PtyThink, IDString, TaxLess, DeathPen, Lords, ScengBen, ScoPref1, RSex, Rage, RSocCla2, Tenure1, PresB, IndPar]

Probability of voting Conservative increases with:
⊲ Interest in politics (Politics)
⊲ Reading newspapers (ReadPap)
⊲ Support for fewer taxes (TaxLess)
⊲ Support for returning the death penalty (DeathPen)
◮ Some other surprising results .....
Scottish Election Data - Credible Interval Comparison
[Figure: 90% credible intervals for the same coefficients; Dirichlet Process intervals in black, normal random effects intervals in blue]

Dirichlet Process vs. normal random effects: the Dirichlet Process intervals are uniformly shorter
Investigating the Intervals
Why are they shorter? Kyung et al. (2009, Statistics and Probability Letters)
◮ Simpler model
◮ Posterior variance domination
◮ Linear mixed model: Yij = μ + ψi + εij
◮ where ψ = Aη and

Y | μ, η, σ², A ∼ N(μ1 + Aη, σ²I)
η | σ² ∼ Nk(0, cσ²Ik)
μ | σ² ∼ N(0, vσ²)
σ² ∼ IG(a, b)

⊲ and the hyperparameters are assumed known
Investigating the Intervals
Why are they shorter?
◮ Consider the marginal posterior distribution of the variance, π(σ² | Y, A)
◮ We can show that the posterior mean of σ² from the Dirichlet Process model is smaller than the posterior mean from the normal model
⊲ for all y not containing a within-subcluster contrast
◮ Implications
⊲ The set of y containing a within-subcluster contrast has measure zero
⊲ So the dominance occurs almost surely
And Now for Something Completely Different
Gauss-Markov Theorem
◮ Start with the classic linear mixed model Y = Xβ + Zψ + ε
⊲ ψ ∼ DP(m, N(0, τ²))
⊲ ε ∼ N(0, σ²I)
◮ Conditional on A, ψ = Aη with η ∼ N(0, τ²I), and Y = Xβ + ZAη + ε
◮ With mean EY = E[E(Y | A)] = Xβ
◮ And variance V = Var(Y) = E[Var(Y | A)] + Var[E(Y | A)] = E[Var(Y | A)]
Gauss-Markov Theorem
First Application
◮ Straightforward application of the theorem
⊲ Zyskind and Martin (1969); Harville (1976)
◮ BLUE: β̂ = (X′V⁻¹X)⁻¹X′V⁻¹Y
◮ BLUP: ψ̂ = CV⁻¹(Y − Xβ̂)
⊲ C = Cov(Y, ψ)
⊲ V = Var(Y)
◮ Neat theory, but
⊲ What is C?
⊲ What is V?
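Given V and C, both estimators are one-liners; a numpy sketch with assumed toy matrices. Here C denotes Cov(ψ, Y), the transpose of the slide's Cov(Y, ψ), so the dimensions line up, and V fixes a single A (the normal random effects case) just to exercise the formulas:

```python
import numpy as np

def blue_blup(X, Y, V, C):
    """beta_hat = (X'V^-1 X)^-1 X'V^-1 Y  and  psi_hat = C V^-1 (Y - X beta_hat),
    with C = Cov(psi, Y) and V = Var(Y) assumed known."""
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
    psi = C @ Vinv @ (Y - X @ beta)
    return beta, psi

# Toy oneway layout: 2 subjects, 3 replicates each, sigma^2 = 1, tau^2 = 0.5
B = np.kron(np.eye(2), np.ones((3, 1)))
V = np.eye(6) + 0.5 * (B @ B.T)          # Var(Y) = sigma^2 I + tau^2 BB'
C = 0.5 * B.T                            # Cov(psi, Y) = tau^2 B'
X = np.ones((6, 1))
Y = np.array([0.9, 1.1, 1.0, -0.8, -1.2, -1.0])
print(blue_blup(X, Y, V, C))
```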
Using the Gauss-Markov Theorem
Calculating the Variance
◮ V = Var(Y) = E[Var(Y | A)], where

V = σ²In + E[τ²ZAA′Z′] = σ²In + τ² Σ_A P(A) ZAA′Z′

⊲ with P(A) = π(r1, r2, . . . , rk) = [Γ(m)/Γ(m + r)] m^k ∏_{j=1}^{k} Γ(rj)
⊲ r1, r2, . . . , rk are the column sums of A
◮ The sum is over all possible A matrices
⊲ Lots of terms in the sum
⊲ But we can do it (almost, in a special case)
Calculating the Variance
A Special Case
◮ We can handle the model

Yij = xi′β + ψi + εij,  1 ≤ i ≤ r,  1 ≤ j ≤ t,

⊲ which is the previous model with Z = B, where B is the n × r block-diagonal matrix

B = [ 1t  0  · · ·  0
      0  1t  · · ·  0
      ...
      0   0  · · · 1t ]

◮ Resulting in, for i ≠ i′,

d = Cor(Yi,j, Yi′,j′) = τ² Σ_A P(A) ai · ai′

⊲ where ai · ai′ is the inner product of rows i and i′ of A
Covariance Matrix
A Special Case
◮ For the model Y = Xβ + Bψ + ε
◮ The covariance matrix is

V = [ σ²I + τ²J    dJ           · · ·   dJ
      dJ           σ²I + τ²J    · · ·   dJ
      ...
      dJ           dJ           · · ·   σ²I + τ²J ]

where I is the t × t identity matrix and J is the t × t matrix of ones,
◮ And

d = Cor(Yi,j, Yi′,j′) = τ² Σ_{i=1}^{r−1} i m Γ(m + r − 1 − i) Γ(i) / Γ(m + r)
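The d formula is directly computable; a sketch on the log scale for numerical stability (the function name is mine):

```python
from math import lgamma, exp, log

def dp_corr(m, r, tau2=1.0):
    """d = tau^2 * sum_{i=1}^{r-1} i*m*Gamma(m+r-1-i)*Gamma(i) / Gamma(m+r)."""
    return tau2 * sum(
        exp(log(i) + log(m) + lgamma(m + r - 1 - i) + lgamma(i) - lgamma(m + r))
        for i in range(1, r)
    )

for m in (0.1, 1.0, 10.0, 100.0):
    print(m, round(dp_corr(m, r=6), 4))   # d falls as m grows: more subclusters, less correlation
```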
Examining the Covariance
Dirichlet Precision Parameter
[Figure: correlation d plotted against the precision parameter m]
◮ The precision parameter m is related to the correlation in the observations
◮ This relationship was not previously known
◮ m ↓ yields fewer subclusters ⊲ increased correlation
◮ m ↑ yields more subclusters ⊲ decreased correlation
Alternatively: OLS (Least Squares)
◮ For the model Y = Xβ + Bψ + ε
◮ The OLS estimator of β is β̂ = (X′X)⁻¹X′Y
◮ When is OLS = BLUE?
⊲ This is "fun with matrix algebra"
⊲ A relationship among X, B, and V
⊲ Zyskind (1967); Puntanen and Styan (1989): HV = VH, where H = X(X′X)⁻X′
⊲ Alternative eigenvector/eigenvalue conditions
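The Zyskind condition HV = VH can be verified numerically for the balanced oneway layout; a sketch with assumed values of σ², τ², and d:

```python
import numpy as np

r, t = 3, 4                               # balanced oneway layout, n = r*t
sigma2, tau2, d = 1.0, 1.0, 0.25          # d would come from the formula above
I_t, J = np.eye(t), np.ones((t, t))
V = np.kron(np.eye(r), sigma2 * I_t + tau2 * J) \
    + d * np.kron(np.ones((r, r)) - np.eye(r), J)   # dJ in the off-diagonal blocks
X = np.ones((r * t, 1))                   # oneway mean model, X = 1
H = X @ np.linalg.pinv(X.T @ X) @ X.T     # H = X (X'X)^- X'
print(np.allclose(H @ V, V @ H))          # True, so OLS = BLUE here
```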
OLS = BLUE: Some Conditions
◮ For the model Y = Xβ + Bψ + ε
◮ OLS = BLUE for
⊲ Balanced ANOVA models
⊲ Some slight extensions
◮ In particular, for the oneway random effects model Y = 1μ + Bψ + ε, we have

β̂ = (X′X)⁻¹X′Y = (X′V⁻¹X)⁻¹X′V⁻¹Y = Ȳ
Distribution of the BLUE Ȳ
Oneway Model
◮ Here we look at Y = 1μ + Bψ + ε
⊲ Some results generalize (in the paper)
◮ The BLUE Ȳ has density

fm(ȳ) = Σ_A f(ȳ | A) P(A)

⊲ f(ȳ | A) = N(1μ, σ²I + τ²BAA′B′)
⊲ P(A) = π(r1, r2, . . . , rk) = [Γ(m)/Γ(m + r)] m^k ∏_{j=1}^{k} Γ(rj)
⊲ m is the precision parameter
Properties of fm(ȳ)
Oneway Model
◮ Unimodal
◮ m → 0: Ȳ ∼ N(μ, σ²/n + τ²) ⊲ one cluster
◮ m → ∞: Ȳ ∼ N(μ, (σ² + τ²t)/n) ⊲ n clusters ⊲ the classical oneway model
◮ F0(ȳ) (fattest tails) < Fm(ȳ) < F∞(ȳ) (thinnest tails)
Distribution of the BLUE Ȳ
Example Cutoff Points
◮ 95% confidence bounds
◮ Yij = μ + ψi + εij, 1 ≤ i ≤ 6, 1 ≤ j ≤ 6, σ² = τ² = 1

m        0      .1     .5     1      2      5      20     ∞
bound    1.987  1.917  1.706  1.566  1.355  1.145  0.952  0.864

◮ Conservative confidence bounds
◮ Can also estimate σ² and τ²
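The two endpoint bounds can be checked against the limiting distributions from the previous slide; a worked check, assuming two-sided 95% normal bounds with r = t = 6 and σ² = τ² = 1:

```python
from math import sqrt

n, t, z = 36, 6, 1.96                   # n = r*t observations, 95% normal quantile
print(round(z * sqrt(1 / n + 1), 3))    # m -> 0:  sd = sqrt(sigma^2/n + tau^2)      -> 1.987
print(round(z * sqrt((1 + t) / n), 3))  # m -> oo: sd = sqrt((sigma^2 + t*tau^2)/n)  -> 0.864
```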
Conclusions
Modelling the Random Effects
Why is the Dirichlet Process a better model for random effects?
◮ "Noninformative"
◮ Richer model for random effects
⊲ Normality is unverifiable
⊲ The Dirichlet captures extra variation
◮ Shorter credible intervals
⊲ More precise inference for fixed effects
Conclusions
Estimation and MCMC
Improvements to the estimation procedure and the MCMC
◮ Matrix representation
⊲ Allows simplification
◮ Better precision parameter estimation
◮ Improved Gibbs sampler
⊲ Exploits properties of the multinomial
⊲ Better mixing
⊲ Better Monte Carlo variances
Beyond the linear model
◮ Logistic, loglinear
⊲ Can use a Dirichlet error model
⊲ Retains estimation properties
Conclusions
Classical Approach
Point estimation
◮ Covariance matrix
⊲ Calculable
⊲ Gives an interpretation of the precision parameter
◮ Estimates
⊲ OLS and BLUE are reasonable
Confidence intervals
◮ Next
⊲ Variance comparisons?
⊲ Coverage of Bayes intervals?
Thank You for Your Attention casella@ufl.edu