

SLIDE 1

Stat 5101 Lecture Slides: Deck 8 Dirichlet Distribution

Charles J. Geyer
School of Statistics
University of Minnesota

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).

1

SLIDE 2

The Dirichlet Distribution

The Dirichlet distribution is to the beta distribution as the multinomial distribution is to the binomial distribution. We get it by the same process by which we got the beta distribution (slides 128–137, deck 3), only multivariate. Recall the basic theorem about gamma and beta (same slides referenced above).

2

SLIDE 3

The Dirichlet Distribution (cont.)

Theorem 1. Suppose $X$ and $Y$ are independent gamma random variables
$$X \sim \operatorname{Gam}(\alpha_1, \lambda)$$
$$Y \sim \operatorname{Gam}(\alpha_2, \lambda)$$
then
$$U = X + Y$$
$$V = \frac{X}{X + Y}$$
are independent random variables and
$$U \sim \operatorname{Gam}(\alpha_1 + \alpha_2, \lambda)$$
$$V \sim \operatorname{Beta}(\alpha_1, \alpha_2)$$

3
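
Theorem 1 is easy to check by simulation. Here is a minimal sketch (not part of the original slides), assuming NumPy and SciPy are available; the values of alpha1, alpha2, and lam are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha1, alpha2, lam = 2.5, 4.0, 3.0   # arbitrary illustrative values
n = 100_000

# X ~ Gam(alpha1, lambda), Y ~ Gam(alpha2, lambda); NumPy uses scale = 1 / rate
x = rng.gamma(shape=alpha1, scale=1 / lam, size=n)
y = rng.gamma(shape=alpha2, scale=1 / lam, size=n)

u = x + y          # should be Gam(alpha1 + alpha2, lambda)
v = x / (x + y)    # should be Beta(alpha1, alpha2)

# Compare sampled V against Beta(alpha1, alpha2)
print(stats.kstest(v, stats.beta(alpha1, alpha2).cdf))
# Compare sampled U against Gam(alpha1 + alpha2, lambda)
print(stats.kstest(u, stats.gamma(alpha1 + alpha2, scale=1 / lam).cdf))
# Independence of U and V shows up as near-zero correlation
print(np.corrcoef(u, v)[0, 1])
```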

SLIDE 4

The Dirichlet Distribution (cont.)

Corollary 1. Suppose $X_1$, $X_2$, $\ldots$, $X_d$ are independent gamma random variables with the same rate parameter
$$X_i \sim \operatorname{Gam}(\alpha_i, \lambda)$$
then the following random variables
$$\frac{X_1}{X_1 + X_2} \sim \operatorname{Beta}(\alpha_1, \alpha_2)$$
$$\frac{X_1 + X_2}{X_1 + X_2 + X_3} \sim \operatorname{Beta}(\alpha_1 + \alpha_2, \alpha_3)$$
$$\vdots$$
$$\frac{X_1 + \cdots + X_{d-1}}{X_1 + \cdots + X_d} \sim \operatorname{Beta}(\alpha_1 + \cdots + \alpha_{d-1}, \alpha_d)$$
are independent and have the asserted distributions.

4
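
A similar sketch (again an illustration, not from the slides) checks Corollary 1: the successive ratios should follow the asserted beta distributions and be roughly uncorrelated, which is consistent with independence. The parameter vector is an arbitrary choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alphas = np.array([1.5, 2.0, 3.0, 0.7])   # arbitrary shapes alpha_1, ..., alpha_d
lam, n = 2.0, 100_000
d = len(alphas)

# Independent X_i ~ Gam(alpha_i, lambda), one column per i
x = rng.gamma(shape=alphas, scale=1 / lam, size=(n, d))
csum = np.cumsum(x, axis=1)               # column k - 1 holds X_1 + ... + X_k

# W_k = (X_1 + ... + X_{k-1}) / (X_1 + ... + X_k), k = 2, ..., d
w = csum[:, :-1] / csum[:, 1:]

# Each W_k should be Beta(alpha_1 + ... + alpha_{k-1}, alpha_k) ...
for k in range(2, d + 1):
    b = stats.beta(alphas[: k - 1].sum(), alphas[k - 1])
    print(k, stats.kstest(w[:, k - 2], b.cdf).pvalue)

# ... and the W_k should be independent (near-zero off-diagonal correlations)
print(np.corrcoef(w, rowvar=False))
```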

SLIDE 5

The Dirichlet Distribution (cont.)

From the first assertion of the theorem we know
$$X_1 + \cdots + X_{k-1} \sim \operatorname{Gam}(\alpha_1 + \cdots + \alpha_{k-1}, \lambda)$$
and is independent of $X_k$. Thus the second assertion of the theorem says
$$\frac{X_1 + \cdots + X_{k-1}}{X_1 + \cdots + X_k} \sim \operatorname{Beta}(\alpha_1 + \cdots + \alpha_{k-1}, \alpha_k) \qquad (*)$$
and $(*)$ is independent of $X_1 + \cdots + X_k$. That proves the corollary.

5

SLIDE 6

The Dirichlet Distribution (cont.)

Theorem 2. Suppose $X_1$, $X_2$, $\ldots$, $X_d$ are as in the Corollary. Then the random variables
$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
satisfy
$$\sum_{i=1}^{d} Y_i = 1, \qquad \text{almost surely},$$
and the joint density of $Y_2$, $\ldots$, $Y_d$ is
$$f(y_2, \ldots, y_d) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \prod_{i=2}^{d} y_i^{\alpha_i - 1}$$
The Dirichlet distribution with parameter vector $(\alpha_1, \ldots, \alpha_d)$ is the distribution of the random vector $(Y_1, \ldots, Y_d)$.

6
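
Theorem 2 says a Dirichlet random vector can be simulated by normalizing independent gammas (the rate parameter cancels). A quick sketch, assuming NumPy; numpy.random.Generator.dirichlet and the parameter values are used only as an arbitrary reference for comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
alphas = np.array([2.0, 3.0, 4.0])        # arbitrary parameter vector
lam, n = 1.7, 200_000

# Theorem 2 construction: normalize independent gammas
x = rng.gamma(shape=alphas, scale=1 / lam, size=(n, len(alphas)))
y = x / x.sum(axis=1, keepdims=True)      # rows sum to one, almost surely

# Reference sampler for the same Dirichlet distribution
z = rng.dirichlet(alphas, size=n)

print(np.allclose(y.sum(axis=1), 1.0))    # components sum to one
print(y.mean(axis=0), z.mean(axis=0))     # means should agree
print(y.var(axis=0), z.var(axis=0))       # variances should agree
```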

SLIDE 7

The Dirichlet Distribution (cont.)

Let the random variables in Corollary 1 be denoted $W_2$, $\ldots$, $W_d$, so these are independent and
$$W_i = \frac{X_1 + \cdots + X_{i-1}}{X_1 + \cdots + X_i} \sim \operatorname{Beta}(\alpha_1 + \cdots + \alpha_{i-1}, \alpha_i)$$
Then
$$Y_i = (1 - W_i) \prod_{j=i+1}^{d} W_j, \qquad i = 2, \ldots, d,$$
where in the case $i = d$ we use the convention that the product is empty and equal to one.

7

SLIDE 8

The Dirichlet Distribution (cont.)

$$Y_i = \frac{X_i}{X_1 + \cdots + X_d} \qquad\qquad W_i = \frac{X_1 + \cdots + X_{i-1}}{X_1 + \cdots + X_i}$$
The inverse transformation is
$$W_i = \frac{Y_1 + \cdots + Y_{i-1}}{Y_1 + \cdots + Y_i} = \frac{1 - Y_i - \cdots - Y_d}{1 - Y_{i+1} - \cdots - Y_d}, \qquad i = 2, \ldots, d,$$
where in the case $i = d$ we use the convention that the sum in the denominator of the fraction on the right is empty and equal to zero, so the denominator itself is equal to one.

8
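
The product representation on the preceding slide and the inverse transformation above can both be verified numerically. The following sketch (an illustration with arbitrary parameters, not part of the slides) checks the two identities, up to floating-point rounding, on simulated gammas.

```python
import numpy as np

rng = np.random.default_rng(2)
alphas = np.array([1.2, 2.0, 0.8, 3.5])   # arbitrary shapes
d, n = len(alphas), 50_000

x = rng.gamma(shape=alphas, size=(n, d))  # rate lambda = 1 (it cancels anyway)
csum = np.cumsum(x, axis=1)
y = x / csum[:, [-1]]                     # Y_i = X_i / (X_1 + ... + X_d)
w = csum[:, :-1] / csum[:, 1:]            # W_i, i = 2, ..., d (columns 0, ..., d - 2)

# Slide 7: Y_i = (1 - W_i) * prod_{j > i} W_j  (empty product = 1 when i = d)
for i in range(2, d + 1):
    tail = w[:, i - 1:].prod(axis=1)      # product of W_{i+1}, ..., W_d
    print(i, np.allclose(y[:, i - 1], (1 - w[:, i - 2]) * tail))

# Slide 8: W_i = (1 - Y_i - ... - Y_d) / (1 - Y_{i+1} - ... - Y_d)
for i in range(2, d + 1):
    num = 1 - y[:, i - 1:].sum(axis=1)
    den = 1 - y[:, i:].sum(axis=1)        # empty sum = 0 when i = d
    print(i, np.allclose(w[:, i - 2], num / den))
```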

SLIDE 9

The Dirichlet Distribution (cont.)

$$w_i = \frac{1 - y_i - \cdots - y_d}{1 - y_{i+1} - \cdots - y_d}$$
This transformation has components of the Jacobian matrix
$$\frac{\partial w_i}{\partial y_i} = -\frac{1}{1 - y_{i+1} - \cdots - y_d}$$
$$\frac{\partial w_i}{\partial y_j} = 0, \qquad j < i$$
$$\frac{\partial w_i}{\partial y_j} = -\frac{1}{1 - y_{i+1} - \cdots - y_d} + \frac{1 - y_i - \cdots - y_d}{(1 - y_{i+1} - \cdots - y_d)^2}, \qquad j > i$$

9

SLIDE 10

The Dirichlet Distribution (cont.)

Since this Jacobian matrix is triangular, the determinant is the product of the diagonal elements
$$\lvert \det \nabla h(y_2, \ldots, y_d) \rvert = \prod_{i=2}^{d-1} \frac{1}{1 - y_{i+1} - \cdots - y_d}.$$
(The $i = d$ diagonal element is $-1$, so it contributes a factor of one in absolute value.)

10
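
As a sanity check on the Jacobian calculation, a finite-difference Jacobian of the transformation at one interior point can be compared with the closed-form determinant. A sketch, not from the slides, assuming NumPy; the point y0 and the dimension d = 4 are arbitrary.

```python
import numpy as np

# Transformation h: (y_2, ..., y_d) -> (w_2, ..., w_d) from the slides
def h(y):
    y = np.asarray(y, dtype=float)        # y = (y_2, ..., y_d)
    d = len(y) + 1
    w = np.empty(d - 1)
    for i in range(2, d + 1):
        num = 1.0 - y[i - 2:].sum()       # 1 - y_i - ... - y_d
        den = 1.0 - y[i - 1:].sum()       # 1 - y_{i+1} - ... - y_d (equals 1 when i = d)
        w[i - 2] = num / den
    return w

y0 = np.array([0.1, 0.2, 0.25])           # arbitrary interior point, d = 4
eps = 1e-6

# Finite-difference Jacobian matrix of h at y0 (column j is the partial w.r.t. y_j)
J = np.column_stack([(h(y0 + eps * e) - h(y0 - eps * e)) / (2 * eps)
                     for e in np.eye(len(y0))])

# Closed form: product over i = 2, ..., d - 1 of 1 / (1 - y_{i+1} - ... - y_d)
d = len(y0) + 1
closed = np.prod([1.0 / (1.0 - y0[i - 1:].sum()) for i in range(2, d)])
print(abs(np.linalg.det(J)), closed)      # should agree to several digits
```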

SLIDE 11

The Dirichlet Distribution (cont.)

The joint density of $W_2$, $\ldots$, $W_d$ is
$$\prod_{i=2}^{d} \frac{\Gamma(\alpha_1 + \cdots + \alpha_i)}{\Gamma(\alpha_1 + \cdots + \alpha_{i-1}) \Gamma(\alpha_i)} w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1} = \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^{d} w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1}$$
because the ratio of gamma functions telescopes.

11

SLIDE 12

The Dirichlet Distribution (cont.)

PDF of the $W$'s:
$$\frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^{d} w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1}$$
Jacobian:
$$\prod_{i=2}^{d-1} \frac{1}{1 - y_{i+1} - \cdots - y_d}$$
Transformation:
$$w_i = \frac{1 - y_i - \cdots - y_d}{1 - y_{i+1} - \cdots - y_d}$$
The PDF of $Y_2$, $\ldots$, $Y_d$ is
$$\frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^{d} \frac{(1 - y_i - \cdots - y_d)^{\alpha_1 + \cdots + \alpha_{i-1} - 1} \, y_i^{\alpha_i - 1}}{(1 - y_{i+1} - \cdots - y_d)^{\alpha_1 + \cdots + \alpha_i - 1}} = \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \prod_{i=2}^{d} y_i^{\alpha_i - 1}$$

12
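
The density just derived can be coded directly and compared with SciPy's Dirichlet density, which is parameterized by the full vector (y_1, ..., y_d) on the simplex rather than by (y_2, ..., y_d). A sketch, not from the slides; the function name dirichlet_logpdf and the test point are arbitrary.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import dirichlet

def dirichlet_logpdf(y_rest, alphas):
    """Log of the slide-12 density of (Y_2, ..., Y_d); here y_1 = 1 - sum(y_rest)."""
    alphas = np.asarray(alphas, dtype=float)
    y_rest = np.asarray(y_rest, dtype=float)
    y1 = 1.0 - y_rest.sum()
    logconst = gammaln(alphas.sum()) - gammaln(alphas).sum()
    return logconst + (alphas[0] - 1) * np.log(y1) + ((alphas[1:] - 1) * np.log(y_rest)).sum()

alphas = [2.0, 3.0, 1.5]                         # arbitrary parameters
y_rest = np.array([0.3, 0.2])                    # (y_2, y_3); so y_1 = 0.5
full = np.concatenate([[1 - y_rest.sum()], y_rest])

print(dirichlet_logpdf(y_rest, alphas))          # density from the slides
print(dirichlet.logpdf(full, alphas))            # SciPy's parameterization, same value
```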

SLIDE 13

Univariate Marginals

Write $I = \{1, \ldots, d\}$. By definition
$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
has the beta distribution with parameters $\alpha_i$ and $\sum_{j \in I,\, j \neq i} \alpha_j$ by Theorem 1, because
$$X_i \sim \operatorname{Gam}(\alpha_i, \lambda)$$
$$\sum_{\substack{j \in I \\ j \neq i}} X_j \sim \operatorname{Gam}\Biggl(\sum_{\substack{j \in I \\ j \neq i}} \alpha_j, \lambda\Biggr)$$

13
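
A quick Monte Carlo check of the univariate marginal, assuming NumPy and SciPy; the parameter vector and the component index are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alphas = np.array([1.0, 2.0, 0.5, 4.0])    # arbitrary parameters
n, i = 100_000, 2                          # check the marginal of Y_3 (0-based index 2)

y = rng.dirichlet(alphas, size=n)

# Slide 13: Y_i ~ Beta(alpha_i, sum of the other alphas)
marginal = stats.beta(alphas[i], alphas.sum() - alphas[i])
print(stats.kstest(y[:, i], marginal.cdf))
print(y[:, i].mean(), marginal.mean())     # empirical vs theoretical mean
```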

SLIDE 14

Multivariate Marginals

Multivariate marginals are "almost" Dirichlet. As was the case with the multinomial, if we collapse categories, we get a Dirichlet.

Let $\mathcal{A}$ be a partition of $I$, and define
$$Z_A = \sum_{i \in A} Y_i, \qquad A \in \mathcal{A},$$
$$\beta_A = \sum_{i \in A} \alpha_i, \qquad A \in \mathcal{A}.$$
Then the random vector having components $Z_A$ has the Dirichlet distribution with parameters $\beta_A$.

14
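
A sketch of the collapsing-categories property, not part of the slides: sum the components of Dirichlet samples within each block of a partition and compare moments with a Dirichlet whose parameters are the summed alphas. The particular partition and parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
alphas = np.array([1.0, 2.0, 3.0, 0.5, 1.5])   # arbitrary parameters
n = 200_000

y = rng.dirichlet(alphas, size=n)

# Partition of I = {1, ..., 5} into blocks {1,2}, {3}, {4,5} (0-based indices below)
blocks = [[0, 1], [2], [3, 4]]
z = np.column_stack([y[:, b].sum(axis=1) for b in blocks])   # Z_A = sum of Y_i, i in A
betas = np.array([alphas[b].sum() for b in blocks])          # beta_A = sum of alpha_i, i in A

# Z should be Dirichlet(beta); compare moments against a direct Dirichlet sample
zref = rng.dirichlet(betas, size=n)
print(z.mean(axis=0), zref.mean(axis=0))
print(z.var(axis=0), zref.var(axis=0))
```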

SLIDE 15

Conditionals

$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
$$Y_i = \frac{X_i}{X_1 + \cdots + X_k} \cdot \frac{X_1 + \cdots + X_k}{X_1 + \cdots + X_d} = \frac{X_i}{X_1 + \cdots + X_k} \cdot (Y_1 + \cdots + Y_k) = \frac{X_i}{X_1 + \cdots + X_k} \cdot (1 - Y_{k+1} - \cdots - Y_d)$$

15

SLIDE 16

Conditionals (cont.)

$$Y_i = \frac{X_i}{X_1 + \cdots + X_k} \cdot (1 - Y_{k+1} - \cdots - Y_d)$$
When we condition on $Y_{k+1}, \ldots, Y_d$, the second factor above is a constant and the first factor is a component of another Dirichlet random vector having components
$$Z_i = \frac{X_i}{X_1 + \cdots + X_k}, \qquad i = 1, \ldots, k$$
So conditionals of the Dirichlet are constant times Dirichlet.

16
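
The conditioning argument can be illustrated by simulation: rescaling the first k components by 1 - Y_{k+1} - ... - Y_d should give a Dirichlet(alpha_1, ..., alpha_k) vector that is independent of the conditioning variables. A sketch with arbitrary parameters, not from the slides, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(5)
alphas = np.array([1.0, 2.0, 3.0, 0.7, 1.3])     # arbitrary parameters
d, k, n = len(alphas), 3, 200_000

x = rng.gamma(shape=alphas, size=(n, d))
y = x / x.sum(axis=1, keepdims=True)             # Dirichlet(alpha_1, ..., alpha_d)

# Rescale the first k components by 1 - Y_{k+1} - ... - Y_d = Y_1 + ... + Y_k
z = y[:, :k] / y[:, :k].sum(axis=1, keepdims=True)

# Z should be Dirichlet(alpha_1, ..., alpha_k) ...
zref = rng.dirichlet(alphas[:k], size=n)
print(z.mean(axis=0), zref.mean(axis=0))

# ... and independent of (Y_{k+1}, ..., Y_d): near-zero cross-correlations
cross = np.corrcoef(np.hstack([z[:, :-1], y[:, k:]]), rowvar=False)[: k - 1, k - 1:]
print(cross)
```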

SLIDE 17

Moments

From the marginals being beta, we have
$$E(Y_i) = \frac{\alpha_i}{\alpha_1 + \cdots + \alpha_d}$$
$$\operatorname{var}(Y_i) = \frac{\alpha_i \sum_{j \in I,\, j \neq i} \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2 (\alpha_1 + \cdots + \alpha_d + 1)}$$

17
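
A short check of the mean and variance formulas against simulated Dirichlet samples (arbitrary parameters, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(6)
alphas = np.array([2.0, 1.0, 3.0, 0.5])   # arbitrary parameters
s = alphas.sum()
n = 300_000

y = rng.dirichlet(alphas, size=n)

# Slide 17 formulas for the marginal mean and variance
mean_theory = alphas / s
var_theory = alphas * (s - alphas) / (s**2 * (s + 1))

print(y.mean(axis=0), mean_theory)
print(y.var(axis=0), var_theory)
```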

SLIDE 18

Moments (cont.)

From the PDF we get the "theorem associated with the Dirichlet distribution"
$$\int \cdots \int (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \Biggl( \prod_{i=2}^{d} y_i^{\alpha_i - 1} \Biggr) \, dy_2 \cdots dy_d = \frac{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)}{\Gamma(\alpha_1 + \cdots + \alpha_d)}$$
so
$$E(Y_1 Y_2) = \frac{\Gamma(\alpha_1 + 1)\Gamma(\alpha_2 + 1)\Gamma(\alpha_3) \cdots \Gamma(\alpha_d)}{\Gamma(\alpha_1 + \cdots + \alpha_d + 2)} \cdot \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} = \frac{\alpha_1 \alpha_2}{(\alpha_1 + \cdots + \alpha_d + 1)(\alpha_1 + \cdots + \alpha_d)}$$

18
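
For d = 3 the "theorem associated with the Dirichlet distribution" is a two-dimensional integral over the simplex and can be checked by numerical quadrature. A sketch, not part of the slides, assuming SciPy; the parameter values are arbitrary.

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.special import gamma

alphas = np.array([2.0, 3.0, 1.5])        # arbitrary; d = 3 keeps the integral 2-D
a1, a2, a3 = alphas

# Integrand (1 - y2 - y3)^(a1-1) * y2^(a2-1) * y3^(a3-1) over the simplex y2 + y3 <= 1
integrand = lambda y3, y2: (1 - y2 - y3) ** (a1 - 1) * y2 ** (a2 - 1) * y3 ** (a3 - 1)
value, err = dblquad(integrand, 0, 1, lambda y2: 0.0, lambda y2: 1 - y2)

print(value)                               # numerical integral
print(gamma(a1) * gamma(a2) * gamma(a3) / gamma(a1 + a2 + a3))   # slide-18 identity
```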

SLIDE 19

Moments (cont.)

The result on the preceding slide holds when 1 and 2 are replaced by $i$ and $j$ for $i \neq j$, and
$$\begin{aligned}
\operatorname{cov}(Y_i, Y_j) &= E(Y_i Y_j) - E(Y_i) E(Y_j) \\
&= \frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d + 1)(\alpha_1 + \cdots + \alpha_d)} - \frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2} \\
&= \frac{\alpha_i \alpha_j}{\alpha_1 + \cdots + \alpha_d} \Biggl( \frac{1}{\alpha_1 + \cdots + \alpha_d + 1} - \frac{1}{\alpha_1 + \cdots + \alpha_d} \Biggr) \\
&= - \frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2 (\alpha_1 + \cdots + \alpha_d + 1)}
\end{aligned}$$

19
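
Finally, a Monte Carlo check of the covariance formula (arbitrary parameters, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(7)
alphas = np.array([2.0, 1.0, 3.0, 0.5])   # arbitrary parameters
s = alphas.sum()
n = 500_000

y = rng.dirichlet(alphas, size=n)

# Slide 19: cov(Y_i, Y_j) = -alpha_i * alpha_j / (s^2 * (s + 1)) for i != j
cov_theory = -np.outer(alphas, alphas) / (s**2 * (s + 1))
cov_empirical = np.cov(y, rowvar=False)

i, j = 0, 2
print(cov_empirical[i, j], cov_theory[i, j])
```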