1. Stat 5101 Lecture Slides: Deck 8, Dirichlet Distribution. Charles J. Geyer, School of Statistics, University of Minnesota. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).

2. The Dirichlet Distribution

The Dirichlet distribution is to the beta distribution as the multinomial distribution is to the binomial distribution. We get it by the same process that we got to the beta distribution (slides 128–137, deck 3), only multivariate. Recall the basic theorem about gamma and beta (same slides referenced above).
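
The gamma-to-Dirichlet construction developed in the following slides can be made concrete in a few lines. Below is a minimal illustrative sketch (not from the slides): it samples Dirichlet vectors by normalizing independent gamma draws, with an arbitrary parameter vector `alpha` and sample size.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])   # illustrative shape parameters

# Draw independent X_i ~ Gam(alpha_i, lambda); the rate lambda cancels
# in the ratio, so we may take lambda = 1 without loss of generality.
x = rng.gamma(shape=alpha, scale=1.0, size=(10_000, len(alpha)))
y = x / x.sum(axis=1, keepdims=True)  # Y_i = X_i / (X_1 + ... + X_d)

print(y.sum(axis=1)[:3])   # each row sums to 1 (up to rounding)
print(y.mean(axis=0))      # ~ alpha / alpha.sum() = [0.2, 0.3, 0.5]
```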

3. The Dirichlet Distribution (cont.)

Theorem 1. Suppose $X$ and $Y$ are independent gamma random variables
$$X \sim \operatorname{Gam}(\alpha_1, \lambda), \qquad Y \sim \operatorname{Gam}(\alpha_2, \lambda).$$
Then
$$U = X + Y, \qquad V = X / (X + Y)$$
are independent random variables and
$$U \sim \operatorname{Gam}(\alpha_1 + \alpha_2, \lambda), \qquad V \sim \operatorname{Beta}(\alpha_1, \alpha_2).$$
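
Theorem 1 is easy to spot-check by simulation. A sketch, with arbitrary parameter choices: $U$ and $V$ should match the asserted gamma and beta distributions, and should be uncorrelated as a necessary consequence of independence.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a1, a2, lam = 2.0, 3.0, 1.5        # arbitrary alpha_1, alpha_2, lambda
x = rng.gamma(a1, 1 / lam, 100_000)  # numpy uses scale = 1 / rate
y = rng.gamma(a2, 1 / lam, 100_000)
u, v = x + y, x / (x + y)

# Kolmogorov-Smirnov tests against the asserted distributions;
# the p-values should not be small.
print(stats.kstest(u, stats.gamma(a1 + a2, scale=1 / lam).cdf).pvalue)
print(stats.kstest(v, stats.beta(a1, a2).cdf).pvalue)
print(np.corrcoef(u, v)[0, 1])     # ~ 0, consistent with independence
```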

4. The Dirichlet Distribution (cont.)

Corollary 1. Suppose $X_1, X_2, \ldots$ are independent gamma random variables with the same rate parameter,
$$X_i \sim \operatorname{Gam}(\alpha_i, \lambda).$$
Then the random variables
$$\frac{X_1}{X_1 + X_2} \sim \operatorname{Beta}(\alpha_1, \alpha_2)$$
$$\frac{X_1 + X_2}{X_1 + X_2 + X_3} \sim \operatorname{Beta}(\alpha_1 + \alpha_2, \alpha_3)$$
$$\vdots$$
$$\frac{X_1 + \cdots + X_{d-1}}{X_1 + \cdots + X_d} \sim \operatorname{Beta}(\alpha_1 + \cdots + \alpha_{d-1}, \alpha_d)$$
are independent and have the asserted distributions.
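
The successive ratios in Corollary 1 can also be checked by simulation. The sketch below (arbitrary parameters, not from the slides) forms each partial-sum ratio and compares it to the asserted beta distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha = np.array([1.0, 2.0, 3.0, 4.0])          # alpha_1, ..., alpha_d
x = rng.gamma(alpha, 1.0, size=(100_000, 4))
s = np.cumsum(x, axis=1)                        # partial sums

# (X_1+...+X_k)/(X_1+...+X_{k+1}) ~ Beta(a_1+...+a_k, a_{k+1})
for k in range(1, 4):
    ratio = s[:, k - 1] / s[:, k]
    b = stats.beta(alpha[:k].sum(), alpha[k])
    print(k, stats.kstest(ratio, b.cdf).pvalue)  # none should be small
```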

5. The Dirichlet Distribution (cont.)

From the first assertion of the theorem we know
$$X_1 + \cdots + X_{k-1} \sim \operatorname{Gam}(\alpha_1 + \cdots + \alpha_{k-1}, \lambda)$$
and is independent of $X_k$. Thus the second assertion of the theorem says
$$\frac{X_1 + \cdots + X_{k-1}}{X_1 + \cdots + X_k} \sim \operatorname{Beta}(\alpha_1 + \cdots + \alpha_{k-1}, \alpha_k) \qquad (*)$$
and $(*)$ is independent of $X_1 + \cdots + X_k$. That proves the corollary.

6. The Dirichlet Distribution (cont.)

Theorem 2. Suppose $X_1, X_2, \ldots$ are as in the corollary. Then the random variables
$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
satisfy
$$\sum_{i=1}^d Y_i = 1, \qquad \text{almost surely},$$
and the joint density of $Y_2, \ldots, Y_d$ is
$$f(y_2, \ldots, y_d) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \, (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \prod_{i=2}^d y_i^{\alpha_i - 1}.$$
The Dirichlet distribution with parameter vector $(\alpha_1, \ldots, \alpha_d)$ is the distribution of the random vector $(Y_1, \ldots, Y_d)$.
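
This density agrees with the standard Dirichlet density as implemented, for instance, in scipy.stats.dirichlet; the sketch below evaluates both at an arbitrary interior point. Note that scipy parameterizes by the full vector $(y_1, \ldots, y_d)$ with $y_1 = 1 - y_2 - \cdots - y_d$.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

alpha = np.array([2.0, 3.0, 5.0])   # illustrative parameters
y_rest = np.array([0.3, 0.5])       # (y_2, ..., y_d), arbitrary point
y1 = 1.0 - y_rest.sum()

# log f(y_2, ..., y_d) from Theorem 2
log_f = (gammaln(alpha.sum()) - gammaln(alpha).sum()
         + (alpha[0] - 1) * np.log(y1)
         + ((alpha[1:] - 1) * np.log(y_rest)).sum())

print(np.exp(log_f))
print(stats.dirichlet(alpha).pdf(np.concatenate([[y1], y_rest])))  # same
```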

7. The Dirichlet Distribution (cont.)

Let the random variables in Corollary 1 be denoted $W_2, \ldots, W_d$, so these are independent and
$$W_i = \frac{X_1 + \cdots + X_{i-1}}{X_1 + \cdots + X_i} \sim \operatorname{Beta}(\alpha_1 + \cdots + \alpha_{i-1}, \alpha_i).$$
Then
$$Y_i = (1 - W_i) \prod_{j=i+1}^d W_j, \qquad i = 2, \ldots, d,$$
where in the case $i = d$ we use the convention that the product is empty and equal to one.
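
The identity $Y_i = (1 - W_i) \prod_{j > i} W_j$ is deterministic given the $X$'s, so it can be verified on a single draw. A sketch with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = np.array([1.0, 2.0, 3.0, 4.0])
x = rng.gamma(alpha, 1.0)        # one realization, d = 4
s = np.cumsum(x)
d = len(alpha)

w = s[:-1] / s[1:]               # W_2, ..., W_d
y = x / s[-1]                    # Y_1, ..., Y_d

# Y_i = (1 - W_i) * prod_{j=i+1}^{d} W_j; empty product = 1 when i = d
for i in range(2, d + 1):
    prod = np.prod(w[i - 1:])    # W_{i+1} * ... * W_d
    print(np.isclose(y[i - 1], (1 - w[i - 2]) * prod))  # all True
```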

8. The Dirichlet Distribution (cont.)

$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}, \qquad W_i = \frac{X_1 + \cdots + X_{i-1}}{X_1 + \cdots + X_i}$$
The inverse transformation is
$$W_i = \frac{Y_1 + \cdots + Y_{i-1}}{Y_1 + \cdots + Y_i} = \frac{1 - Y_i - \cdots - Y_d}{1 - Y_{i+1} - \cdots - Y_d}, \qquad i = 2, \ldots, d,$$
where in the case $i = d$ we use the convention that the sum in the denominator of the fraction on the right is empty and equal to zero, so the denominator itself is equal to one.
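
The equality of the two expressions for $W_i$ is easy to confirm numerically; a sketch, with an arbitrary probability vector:

```python
import numpy as np

y = np.array([0.1, 0.2, 0.3, 0.4])   # (Y_1, ..., Y_4), sums to 1
c = np.cumsum(y)                     # Y_1 + ... + Y_i

# W_i = (Y_1 + ... + Y_{i-1}) / (Y_1 + ... + Y_i), i = 2, ..., d
w = c[:-1] / c[1:]

# equivalently (1 - Y_i - ... - Y_d) / (1 - Y_{i+1} - ... - Y_d):
tail = 1.0 - c                       # Y_{i+1} + ... + Y_d at index i-1
w_alt = (1.0 - y[1:] - tail[1:]) / (1.0 - tail[1:])
print(np.allclose(w, w_alt))         # True
```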

9. The Dirichlet Distribution (cont.)

$$w_i = \frac{1 - y_i - \cdots - y_d}{1 - y_{i+1} - \cdots - y_d}$$
This transformation has components of the Jacobian matrix
$$\frac{\partial w_i}{\partial y_i} = -\frac{1}{1 - y_{i+1} - \cdots - y_d}$$
$$\frac{\partial w_i}{\partial y_j} = 0, \qquad j < i$$
$$\frac{\partial w_i}{\partial y_j} = -\frac{1}{1 - y_{i+1} - \cdots - y_d} + \frac{1 - y_i - \cdots - y_d}{(1 - y_{i+1} - \cdots - y_d)^2}, \qquad j > i$$

10. The Dirichlet Distribution (cont.)

Since this Jacobian matrix is triangular, the determinant is the product of the diagonal elements
$$\lvert \det \nabla h(y_2, \ldots, y_d) \rvert = \prod_{i=2}^{d-1} \frac{1}{1 - y_{i+1} - \cdots - y_d}$$
(the $i = d$ diagonal element has absolute value one, by the empty-sum convention, so the product stops at $d - 1$).
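
The determinant formula can be sanity-checked with finite differences. A sketch; the dimension and test point are arbitrary choices.

```python
import numpy as np

def w_of_y(y):
    """Map (y_2, ..., y_d) to (w_2, ..., w_d)."""
    w, tail = np.empty(len(y)), 0.0        # tail = y_{i+1} + ... + y_d
    for k in range(len(y) - 1, -1, -1):    # k = 0 corresponds to i = 2
        w[k] = (1.0 - y[k] - tail) / (1.0 - tail)
        tail += y[k]
    return w

y, eps = np.array([0.2, 0.3, 0.1]), 1e-6   # (y_2, y_3, y_4), d = 4
J = np.empty((3, 3))
for j in range(3):                         # central differences
    dy = np.zeros(3); dy[j] = eps
    J[:, j] = (w_of_y(y + dy) - w_of_y(y - dy)) / (2 * eps)

# product over i = 2, ..., d-1 of 1 / (1 - y_{i+1} - ... - y_d)
tails = np.array([y[1] + y[2], y[2]])      # tail sums for i = 2, 3
print(abs(np.linalg.det(J)), 1.0 / np.prod(1.0 - tails))   # agree
```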

11. The Dirichlet Distribution (cont.)

The joint density of $W_2, \ldots, W_d$ is
$$\prod_{i=2}^d \frac{\Gamma(\alpha_1 + \cdots + \alpha_i)}{\Gamma(\alpha_1 + \cdots + \alpha_{i-1}) \, \Gamma(\alpha_i)} \, w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1}$$
$$= \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^d w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1}$$

12. The Dirichlet Distribution (cont.)

PDF of the $W$'s:
$$\frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^d w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1}$$
Jacobian:
$$\prod_{i=2}^{d-1} \frac{1}{1 - y_{i+1} - \cdots - y_d}$$
Transformation:
$$w_i = \frac{1 - y_i - \cdots - y_d}{1 - y_{i+1} - \cdots - y_d}$$
The PDF of $Y_2, \ldots, Y_d$ is
$$\frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^d \frac{(1 - y_i - \cdots - y_d)^{\alpha_1 + \cdots + \alpha_{i-1} - 1} \, y_i^{\alpha_i - 1}}{(1 - y_{i+1} - \cdots - y_d)^{\alpha_1 + \cdots + \alpha_i - 1}}$$
$$= \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \, (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \prod_{i=2}^d y_i^{\alpha_i - 1}$$
(the fractions telescope).

13. Univariate Marginals

Write $I = \{1, \ldots, d\}$. By definition
$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
has the beta distribution with parameters $\alpha_i$ and $\sum_{j \in I, \, j \neq i} \alpha_j$ by Theorem 1, because
$$X_i \sim \operatorname{Gam}(\alpha_i, \lambda)$$
$$\sum_{\substack{j \in I \\ j \neq i}} X_j \sim \operatorname{Gam}\Biggl(\, \sum_{\substack{j \in I \\ j \neq i}} \alpha_j, \, \lambda \Biggr)$$
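
A simulation check of the univariate marginal (a sketch with arbitrary parameters): each coordinate of a Dirichlet draw should follow the stated beta distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
alpha = np.array([2.0, 3.0, 5.0])
y = rng.dirichlet(alpha, size=100_000)

i = 1   # check the second coordinate; any coordinate works
marginal = stats.beta(alpha[i], alpha.sum() - alpha[i])
print(stats.kstest(y[:, i], marginal.cdf).pvalue)  # should not be small
```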

14. Multivariate Marginals

Multivariate marginals are "almost" Dirichlet. As was the case with the multinomial, if we collapse categories, we get a Dirichlet. Let $\mathcal{A}$ be a partition of $I$, and define
$$Z_A = \sum_{i \in A} Y_i, \qquad A \in \mathcal{A},$$
$$\beta_A = \sum_{i \in A} \alpha_i, \qquad A \in \mathcal{A}.$$
Then the random vector having components $Z_A$ has the Dirichlet distribution with parameters $\beta_A$.
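
The collapsing property can likewise be checked by simulation. A sketch with an arbitrary partition and parameters: merging the first two components of a four-component Dirichlet gives a two-block Dirichlet, whose first coordinate is beta with the summed parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
alpha = np.array([1.0, 2.0, 3.0, 4.0])
y = rng.dirichlet(alpha, size=100_000)

# Partition {1,2,3,4} into A = {1,2}, B = {3,4}; then
# (Z_A, Z_B) ~ Dirichlet(beta_A, beta_B), i.e. Z_A ~ Beta(beta_A, beta_B)
z_a = y[:, 0] + y[:, 1]
beta_a, beta_b = alpha[:2].sum(), alpha[2:].sum()
print(stats.kstest(z_a, stats.beta(beta_a, beta_b).cdf).pvalue)
```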

15. Conditionals

$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
$$= \frac{X_i}{X_1 + \cdots + X_k} \cdot \frac{X_1 + \cdots + X_k}{X_1 + \cdots + X_d}$$
$$= \frac{X_i}{X_1 + \cdots + X_k} \cdot (Y_1 + \cdots + Y_k)$$
$$= \frac{X_i}{X_1 + \cdots + X_k} \cdot (1 - Y_{k+1} - \cdots - Y_d)$$

16. Conditionals (cont.)

$$Y_i = \frac{X_i}{X_1 + \cdots + X_k} \cdot (1 - Y_{k+1} - \cdots - Y_d)$$
When we condition on $Y_{k+1}, \ldots, Y_d$, the second factor above is a constant and the first factor is a component of another Dirichlet random vector having components
$$Z_i = \frac{X_i}{X_1 + \cdots + X_k}, \qquad i = 1, \ldots, k.$$
So conditionals of the Dirichlet are constant times Dirichlet.
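
The structure can be illustrated by simulation: the rescaled vector $Z$ above is $\operatorname{Dirichlet}(\alpha_1, \ldots, \alpha_k)$ and independent of the conditioning coordinates. The sketch below (arbitrary parameters) checks the marginal of $Z_1$ and probes independence via correlation only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
alpha = np.array([1.0, 2.0, 3.0, 4.0])
y = rng.dirichlet(alpha, size=100_000)

k = 2   # condition on Y_3, Y_4
z = y[:, :k] / y[:, :k].sum(axis=1, keepdims=True)

# Z_1 ~ Beta(alpha_1, alpha_2) (a 2-component Dirichlet) ...
print(stats.kstest(z[:, 0], stats.beta(alpha[0], alpha[1]).cdf).pvalue)
# ... and Z is uncorrelated with the conditioning coordinates
print(np.corrcoef(z[:, 0], y[:, 2])[0, 1])   # ~ 0
```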

17. Moments

From the marginals being beta, we have
$$E(Y_i) = \frac{\alpha_i}{\alpha_1 + \cdots + \alpha_d}$$
$$\operatorname{var}(Y_i) = \frac{\alpha_i \sum_{j \in I, \, j \neq i} \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2 \, (\alpha_1 + \cdots + \alpha_d + 1)}$$
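
A quick check of the mean and variance formulas against simulation (a sketch; the parameters are arbitrary). Writing $s = \alpha_1 + \cdots + \alpha_d$, the variance formula is $\alpha_i (s - \alpha_i) / (s^2 (s + 1))$.

```python
import numpy as np

rng = np.random.default_rng(7)
alpha = np.array([2.0, 3.0, 5.0])
s = alpha.sum()
y = rng.dirichlet(alpha, size=200_000)

print(y.mean(axis=0), alpha / s)                        # agree closely
print(y.var(axis=0), alpha * (s - alpha) / (s**2 * (s + 1)))
```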

18. Moments (cont.)

From the PDF we get the "theorem associated with the Dirichlet distribution"
$$\int \cdots \int \Biggl[\, \prod_{i=2}^d y_i^{\alpha_i - 1} \Biggr] (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \, dy_2 \cdots dy_d = \frac{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)}{\Gamma(\alpha_1 + \cdots + \alpha_d)}$$
so
$$E(Y_1 Y_2) = \frac{\Gamma(\alpha_1 + 1) \, \Gamma(\alpha_2 + 1) \, \Gamma(\alpha_3) \cdots \Gamma(\alpha_d)}{\Gamma(\alpha_1 + \cdots + \alpha_d + 2)} \cdot \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)}$$
$$= \frac{\alpha_1 \alpha_2}{(\alpha_1 + \cdots + \alpha_d + 1)(\alpha_1 + \cdots + \alpha_d)}$$
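
A quick numerical check of this product moment (a sketch; parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
alpha = np.array([2.0, 3.0, 5.0])
s = alpha.sum()
y = rng.dirichlet(alpha, size=200_000)

emp = (y[:, 0] * y[:, 1]).mean()          # empirical E(Y_1 Y_2)
thy = alpha[0] * alpha[1] / ((s + 1) * s)
print(emp, thy)                           # agree to ~3 decimal places
```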

19. Moments (cont.)

The result on the preceding slide holds when 1 and 2 are replaced by $i$ and $j$ for $i \neq j$, and
$$\operatorname{cov}(Y_i, Y_j) = E(Y_i Y_j) - E(Y_i) \, E(Y_j)$$
$$= \frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d + 1)(\alpha_1 + \cdots + \alpha_d)} - \frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2}$$
$$= \frac{\alpha_i \alpha_j}{\alpha_1 + \cdots + \alpha_d} \left( \frac{1}{\alpha_1 + \cdots + \alpha_d + 1} - \frac{1}{\alpha_1 + \cdots + \alpha_d} \right)$$
$$= -\frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2 \, (\alpha_1 + \cdots + \alpha_d + 1)}$$
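
And the covariance formula, checked the same way (a sketch continuing the moment checks above; parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
alpha = np.array([2.0, 3.0, 5.0])
s = alpha.sum()
y = rng.dirichlet(alpha, size=200_000)

emp = np.cov(y[:, 0], y[:, 1])[0, 1]               # empirical cov(Y_1, Y_2)
thy = -alpha[0] * alpha[1] / (s**2 * (s + 1))
print(emp, thy)                                    # agree closely
```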
