finite mixture models
play

Finite mixture models Dr. Jarad Niemi STAT 615 - Iowa State - PowerPoint PPT Presentation

Finite mixture models Dr. Jarad Niemi STAT 615 - Iowa State University November 28, 2017 Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 1 / 27 Multinomial distribution Categorical distribution Let Z Cat ( H, p )


  1. Finite mixture models Dr. Jarad Niemi STAT 615 - Iowa State University November 28, 2017 Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 1 / 27

  2. Multinomial distribution Categorical distribution Let Z ∼ Cat ( H, p ) represent a categorical distribution with P ( Z = h ) = p h for h = 1 , . . . , H and � H h =1 p h = 1 . Example: discrete choice model Suppose we have a set of H categories and we label these 1 , . . . , H . Independently, consumers choose one of the H categories with the same ind probability. Then a reasonable model is Z i ∼ Cat ( H, p ) . Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 2 / 27

  3. Multinomial distribution Multinomial distribution If we count the number of times the consumer chose each category, i.e. n � Y h = I( Z i = h ) , i =1 then the result is the multinomial distribution, i.e. Y ∼ Mult ( n, p ) . The multinomial distribution has probability mass function H n ! � p y h p ( y ; n, p ) = h y 1 ! · · · y H ! h =1 which has E [ Y i ] = np i , V [ Y i ] = np i (1 − p i ) , and Cov [ Y i , Y j ] = − np i p j for ( i � = j ) . A special case is H = 2 which is the binomial distribution. Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 3 / 27

  4. Dirichlet distribution Dirichlet distribution The Dirichlet distribution (named after Peter Gustav Lejeune Dirichlet), i.e. P ∼ Dir ( a ) , is a probability distribution for a probability vector of length H . The probability density function for the Dirichlet distribution is H p ( P ; a ) = Γ( a 1 + · · · + a H ) p a h − 1 � h Γ( a 1 ) · · · Γ( a H ) h =1 where p h ≥ 0 , � H h =1 p h = 1 , and a h > 0 . Letting a 0 = � H h =1 a h , then some moments are E [ p h ] = a h a 0 , V [ p h ] = a h ( a 0 − a h ) 0 ( a 0 +1) , a 2 a h a k Cov ( p h , p k ) = − 0 ( a 0 +1) , and a 2 mode ( p h ) = a h − 1 a 0 − H for a h > 1 . A special case is H = 2 which is the beta distribution. Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 4 / 27

  5. Dirichlet distribution Conjugate prior for multinomial distribution Conjugate prior for multinomial distribution The Dirichlet distribution is the natural conjugate prior for the multinomial distribution. If Y ∼ Mult ( n, π ) and π ∼ Dir ( a ) then π | y ∼ Dir ( a + y ) . Some possible default priors are a = 1 which is the uniform density over π , a = 1 / 2 is Jeffreys prior for the multinomial, and a = 0 , an improper prior that is uniform on log( π h ) . The resulting posterior is proper if y h > 0 for all h . Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 5 / 27

  6. Finite mixtures Complicated distributions 10.0 10.0 10.0 10.0 3000 7.5 7.5 7.5 7.5 count 5.0 5.0 2000 5.0 5.0 φ φ φ φ 2.5 2.5 2.5 2.5 1000 0.0 0.0 0.0 0.0 0 −2.5 −2.5 −2.5 −2.5 0 4 8 −1 0 1 −3 −2 −1 0 1 −1 0 1 2 3 φ −2 −1 0 1 2 3 γ ψ α δ 20000 3 3 3 2 2 2 15000 1 1 count 1 α α α 10000 0 0 0 5000 −1 −1 −1 −2 −2 −2 0 −1 0 1 −3 −2 −1 0 1 −2 0 2 −1 0 1 2 3 γ ψ α δ 20000 3 3 15000 2 2 count 10000 δ 1 δ 1 0 0 5000 −1 −1 0 −1 0 1 −3 −2 −1 0 1 −1 0 1 2 3 γ ψ δ 20000 1 15000 count γ 0 10000 5000 −1 0 −1 0 1 −3 −2 −1 0 1 γ ψ 6000 4000 count 2000 0 −3 −2 −1 0 1 ψ Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 6 / 27

  7. Finite mixtures Finite mixtures Let’s focus on modeling the univariate distribution for φ 0.20 0.15 density 0.10 0.05 0.00 0 4 8 phi Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 7 / 27

  8. Finite mixtures Finite mixture Finite mixture A model for the marginal distribution for Y i = φ i is H ind � π h N ( µ h , σ 2 Y i ∼ h ) h =1 where � H h =1 π h = 1 . Alternatively, we can introduce a latent variable ζ i = h if observation i came from group h . Then ind ∼ N ( µ z , σ 2 Y i | ζ i = z z ) ind ζ i ∼ Cat ( H, π ) where ζ ∼ Cat ( H, π ) is a categorical random variable with P ( ζ = h ) = π h for h = 1 , . . . , H and π = ( π 1 , . . . , π H ) . Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 8 / 27

  9. Finite mixtures Finite mixture A possible prior Let’s assume π ∼ Dir ( a ) ind µ h | σ 2 ∼ N ( m h , v 2 h σ 2 h ) h ind σ 2 ∼ IG ( c h , d h ) h and π is independent of µ = ( µ 1 , . . . , µ H ) and σ 2 = ( σ 2 1 , . . . , σ 2 H ) . Commonly, we have m h = m , v h = v , c h = c , and d h = d . If the data have been standardized (scaled and centered), a reasonable default prior is m = 0 , v = 1 , c = 2 , d = 4 , (BDA3 pg 535) and a is 1 /H (BDA3 pg 536). Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 9 / 27

  10. Finite mixtures Finite mixture MCMC The steps of a Gibbs sampler with stationary distribution p ( π, µ, σ 2 , ζ | y ) ∝ p ( y | ζ, µ, σ 2 ) p ( ζ | π ) p ( µ | σ 2 ) p ( σ 2 ) p ( π ) has steps 1. For i = 1 , . . . , n , sample ζ i from its full conditional (as the ζ are conditionally independent across i ): P ( ζ i = h | . . . ) ∝ π h N ( y i ; µ h , σ 2 h ) Jointly sample π and µ, σ 2 as they are conditionally independent. 2. Sample π ∼ Dir ( a + Z ) where n = ( Z 1 , . . . , Z H ) and Z h = � n a. i =1 I( ζ i = h ) . For h = 1 , . . . , H , sample µ h , σ 2 b. h from their full conditional (as these are conditionally independent across h ): ind ind σ 2 ∼ IG ( c ′ h , d ′ µ h | σ 2 ∼ N ( m ′ h , v ′ 2 h σ 2 h ) h ) h h where v ′ 2 = (1 /v 2 h + Z h ) − 1 h m ′ = v ′ 2 h ( m h /v 2 h + Z h y h ) h c ′ = c d + Z h / 2 h � � � � i : ζi = h ( y i − y h ) 2 + Zh d ′ = d h + 1 ( y h − m h ) 2 � h 2 1+ Zh/v 2 h 1 � y h = i : ζi = h y i Zh Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 10 / 27

  11. Finite mixtures JAGS library("rjags") jags_model = " model { for (i in 1:n) { y[i] ~ dnorm(mu[zeta[i]], tau[zeta[i]]) zeta[i] ~ dcat(pi[]) } for (i in 1:H) { mu[i] ~ dnorm(0,1e-5) tau[i] ~ dgamma(1,1) sigma[i] <- 1/sqrt(tau[i]) } pi ~ ddirich(a) } " tmp = hat[sample(nrow(hat), 1000),] dat = list(n=nrow(tmp), H=3, y=tmp$phi, a=rep(1,3)) jm = jags.model(textConnection(jags_model), data = dat, n.chains = 3) r = coda.samples(jm, c('mu','sigma','pi'), 1e3) Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 11 / 27

  12. Finite mixtures JAGS Convergence diagnostics gelman.diag(r) ## Error in chol.default(W): the leading minor of order 6 is not positive definite gelman.diag(r, multivariate=FALSE) ## Potential scale reduction factors: ## ## Point est. Upper C.I. ## mu[1] 21.76 55.54 ## mu[2] 41.55 90.41 ## mu[3] 91.54 222.09 ## pi[1] 11.79 29.52 ## pi[2] 3.46 6.36 ## pi[3] 18.63 40.16 ## sigma[1] 13.70 35.60 ## sigma[2] 4.26 8.87 ## sigma[3] 13.93 37.09 Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 12 / 27

  13. Finite mixtures JAGS Convergence diagnostics (2) plot(r, density=FALSE) Trace of mu[1] Trace of mu[2] Trace of mu[3] 6 6 1 3 2 −2 −2 0 0 200 400 600 800 0 200 400 600 800 0 200 400 600 800 Iterations Iterations Iterations Trace of pi[1] Trace of pi[2] Trace of pi[3] 0.4 0.4 0.4 0.1 0.1 0.1 0 200 400 600 800 0 200 400 600 800 0 200 400 600 800 Iterations Iterations Iterations Trace of sigma[1] Trace of sigma[2] Trace of sigma[3] 2.5 2.0 2.0 1.0 0.0 0.0 0 200 400 600 800 0 200 400 600 800 0 200 400 600 800 Iterations Iterations Iterations Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 13 / 27

  14. Finite mixtures JAGS Prior distributions The parameters of the model are unidentified due to label-switching, i.e. H H ind h ) d � π h N ( µ h , σ 2 � π h ′ N ( µ h ′ , σ 2 Y i ∼ = h ′ ) h ′ =1 h =1 for some permutation h ′ . One way to resolve this issue is to enforce identifiability in the prior. For example, in one-dimension, we can order the component means: µ 1 < µ 2 < · · · < µ H . To ensure the posterior is proper Maintain proper prior for π Ensure proper prior for ratios of variances (perhaps by ensuring prior is proper for variances themselves) Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 14 / 27

  15. Finite mixtures JAGS Two conditionally conjugate prior options Option 1: H � N ( µ h ; m h , v 2 ) IG ( σ 2 Dir ( π ; a )I( µ 1 < · · · < µ H ) h ; c h , d h ) h h =1 Option 2: 1 � N ( µ h ; m h , v 2 h σ 2 h ) IG ( σ 2 Dir ( π ; a )I( µ 1 < · · · < µ H ) h ; c h , d h ) h =0 Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 15 / 27

  16. Finite mixtures JAGS library("rjags") jags_model = " model { for (i in 1:n) { y[i] ~ dnorm(mu[zeta[i]], tau[zeta[i]]) zeta[i] ~ dcat(pi[]) } for (i in 1:H) { mu0[i] ~ dnorm(0,1e-5) tau[i] ~ dgamma(1,1) sigma[i] <- 1/sqrt(tau[i]) } mu[1:H] <- sort(mu0) pi ~ ddirich(a) } " jm = jags.model(textConnection(jags_model), data = dat, n.chains = 3) r = coda.samples(jm, c('mu','sigma','pi'), 1e3) Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 16 / 27

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend