  1. Wishart Distribution Max Turgeon STAT 7200–Multivariate Statistics

  2. Objectives
• Understand the distribution of covariance matrices
• Understand the distribution of the MLEs for the multivariate normal distribution
• Understand the distribution of functionals of covariance matrices
• Visualize covariance matrices and their distribution

  3. Before we begin… i
• In this section, we will discuss random matrices, rather than a single random vector.
• Therefore, we will talk about distributions, derivatives and integrals over sets of matrices.
• It can be useful to identify the space M_{n,p}(ℝ) of n × p matrices with ℝ^{np}.
• We can define the function vec : M_{n,p}(ℝ) → ℝ^{np} that takes a matrix M and maps it to the np-dimensional vector given by concatenating the columns of M, e.g.

  vec([1 3; 2 4]) = (1, 2, 3, 4),

where the semicolon separates the rows of the matrix.
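A quick illustration in R (our own, not on the original slide): since R stores matrices in column-major order, vec is simply as.vector:

M <- matrix(c(1, 2, 3, 4), nrow = 2)  # the 2 x 2 matrix with columns (1, 2) and (3, 4)
as.vector(M)                          # vec(M) = 1 2 3 4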

  4. Before we begin… ii
• Another important observation: structural constraints (e.g. symmetry, positive definiteness) reduce the number of “free” entries in a matrix, and therefore the dimension of the corresponding subspace.
• E.g. if A is a symmetric p × p matrix, there are only ½ p(p + 1) independent entries: the entries on the diagonal, and the off-diagonal entries above the diagonal (or below).

  5. Wishart distribution i
• Let S be a random, positive semidefinite matrix of dimension p × p.
• We say S follows a standard Wishart distribution W_p(m) if we can write

  S = ∑_{i=1}^m Z_i Z_i^T,  Z_i ~ N_p(0, I_p) independent.

• We say S follows a Wishart distribution W_p(m, Σ) with scale matrix Σ if we can write

  S = ∑_{i=1}^m Y_i Y_i^T,  Y_i ~ N_p(0, Σ) independent.
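A minimal R sketch of this definition (our own illustration; the Σ below is an arbitrary positive definite choice). Base R's rWishart draws from the same distribution directly:

set.seed(1)
p <- 3; m <- 5
Sigma <- matrix(c(2, 1, 0, 1, 2, 1, 0, 1, 2), ncol = p)  # arbitrary positive definite scale
R <- chol(Sigma)                                         # Sigma = R^T R
Y <- matrix(rnorm(m * p), ncol = p) %*% R                # rows Y_i ~ N_p(0, Sigma)
S <- crossprod(Y)                                        # S = sum of Y_i Y_i^T ~ W_p(m, Sigma)
S_direct <- rWishart(1, df = m, Sigma = Sigma)[, , 1]    # same distribution, drawn directly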

  6. Wishart distribution ii
• We say S follows a non-central Wishart distribution W_p(m, Σ; Δ) with scale matrix Σ and non-centrality parameter Δ if we can write

  S = ∑_{i=1}^m Y_i Y_i^T,  Y_i ~ N_p(μ_i, Σ) independent,  Δ = ∑_{i=1}^m μ_i μ_i^T.
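A minimal construction sketch in R (our own illustration, taking Σ = I_p and a common mean μ for every row, so that Δ = m μμ^T):

set.seed(1)
p <- 2; m <- 4
mu <- c(1, -1)                                        # common mean vector
Y <- matrix(rnorm(m * p), ncol = p) +                 # rows Y_i ~ N_p(mu, I_p)
  matrix(mu, m, p, byrow = TRUE)
S <- crossprod(Y)                                     # a draw from W_p(m, I_p; m * mu mu^T)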

  7. Example i
• Let S ~ W_p(m) be Wishart distributed, with scale matrix Σ = I_p.
• We can therefore write S = ∑_{i=1}^m Z_i Z_i^T, with Z_i ~ N_p(0, I_p).

  8. Example ii
• Using the properties of the trace, we have

  tr(S) = tr( ∑_{i=1}^m Z_i Z_i^T )
        = ∑_{i=1}^m tr( Z_i Z_i^T )
        = ∑_{i=1}^m tr( Z_i^T Z_i )
        = ∑_{i=1}^m Z_i^T Z_i.

• Recall that Z_i^T Z_i ~ χ²(p).

  9. Example iii
• Therefore tr(S) is the sum of m independent copies of a χ²(p) variable, and so we have tr(S) ~ χ²(mp).

B <- 1000
n <- 10; p <- 4
traces <- replicate(B, {
  Z <- matrix(rnorm(n * p), ncol = p)  # n rows Z_i ~ N_p(0, I_p)
  W <- crossprod(Z)                    # W = Z^T Z ~ W_p(n)
  sum(diag(W))                         # tr(W)
})

  10. Example iv

hist(traces, 50, freq = FALSE)
lines(density(rchisq(B, df = n * p)))

  11. Example v
[Figure: histogram of the simulated traces (density scale), with the density of χ²(np) draws overlaid.]

  12. Non-singular Wishart distribution i
• As defined above, there is no guarantee that a Wishart variate is invertible.
• To show: if S ~ W_p(m, Σ) with Σ positive definite, then S is invertible almost surely whenever m ≥ p.
Lemma: Let Z be an n × n random matrix whose entries Z_ij are iid N(0, 1). Then P(det Z = 0) = 0.
Proof: We will prove this by induction on n. If n = 1, the result holds since N(0, 1) is absolutely continuous. Now let n > 1 and assume the result holds for n − 1. Write

  13. Non-singular Wishart distribution ii

  Z = [ Z_11  Z_12 ]
      [ Z_21  Z_22 ],

where Z_22 is (n − 1) × (n − 1) and Z_11 is a scalar. Note that by the induction hypothesis, we have det Z_22 ≠ 0 almost surely. Now, by the Schur determinant formula, we have

  det Z = det Z_22 · det( Z_11 − Z_12 Z_22^{-1} Z_21 )
        = (det Z_22)( Z_11 − Z_12 Z_22^{-1} Z_21 ).

  14. Non-singular Wishart distribution iii
We now have

  P(|Z| = 0) = P(|Z| = 0, |Z_22| ≠ 0) + P(|Z| = 0, |Z_22| = 0)
             = P(|Z| = 0, |Z_22| ≠ 0)
             = P(Z_11 = Z_12 Z_22^{-1} Z_21, |Z_22| ≠ 0)
             = E( P(Z_11 = Z_12 Z_22^{-1} Z_21, |Z_22| ≠ 0 | Z_12, Z_22, Z_21) )
             = E(0)
             = 0,

  15. Non-singular Wishart distribution iv
where we used the laws of total probability (Line 1) and total expectation (Line 4). Therefore, the result follows by induction.
We are now ready to prove the main result: let S ~ W_p(m, Σ) with det Σ ≠ 0, and write S = ∑_{i=1}^m Y_i Y_i^T, with Y_i ~ N_p(0, Σ). If we let Y be the m × p matrix whose i-th row is Y_i, then

  S = ∑_{i=1}^m Y_i Y_i^T = Y^T Y.

  16. Non-singular Wishart distribution v
Now note that

  rank(S) = rank(Y^T Y) = rank(Y).

Furthermore, if we write Σ = LL^T using the Cholesky decomposition, then we can write

  Z = Y (L^{-1})^T,

where the rows Z_i of Z are N_p(0, I_p) and rank(Z) = rank(Y). Finally, we have

  17. Non-singular Wishart distribution vi

  rank(S) = rank(Z) ≥ rank(Z_1, …, Z_p) = p  (a.s.),

where Z_1, …, Z_p denote the first p rows of Z, and the last equality follows from our Lemma, since these rows form a p × p matrix with iid N(0, 1) entries. Since rank(S) = p almost surely, S is invertible almost surely.
Definition: If S ~ W_p(m, Σ) with Σ positive definite and m ≥ p, we say that S follows a nonsingular Wishart distribution. Otherwise, we say it follows a singular Wishart distribution.
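A small sanity check in R (our own illustration, not from the slides): with m < p summands, the construction is rank-deficient, hence singular:

set.seed(1)
p <- 4; m <- 2                         # fewer summands than the dimension
Z <- matrix(rnorm(m * p), ncol = p)    # rows Z_i ~ N_p(0, I_p)
S <- crossprod(Z)                      # S ~ W_p(2), a singular Wishart draw
qr(S)$rank                             # 2: rank(S) <= m < p
det(S)                                 # numerically zero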

  18. Additional properties i
Let S ~ W_p(m, Σ).
• We have E(S) = mΣ.
• If B is a q × p matrix, we have BSB^T ~ W_q(m, BΣB^T).
• If T ~ W_p(n, Σ) is independent of S, then S + T ~ W_p(m + n, Σ).
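A quick Monte Carlo check of the first two properties (an editorial sketch; the equicorrelation scale matrix below is an arbitrary choice):

set.seed(1)
p <- 3; m <- 8; B_draws <- 10000
Sigma <- matrix(0.5, p, p); diag(Sigma) <- 1         # arbitrary positive definite scale
S_draws <- rWishart(B_draws, df = m, Sigma = Sigma)  # p x p x B_draws array
apply(S_draws, 1:2, mean)                            # approximately m * Sigma
B <- matrix(c(1, -1, 0), nrow = 1)                   # a 1 x p matrix, so B S B^T is W_1
bSb <- apply(S_draws, 3, function(S) drop(B %*% S %*% t(B)))
mean(bSb) / drop(B %*% Sigma %*% t(B))               # E of a W_1(m, .) is m times its scale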

  19. Additional properties ii
Now assume we can partition S and Σ as

  S = [ S_11  S_12 ]      Σ = [ Σ_11  Σ_12 ]
      [ S_21  S_22 ],         [ Σ_21  Σ_22 ],

with S_ii and Σ_ii of dimension p_i × p_i. We then have:
• S_ii ~ W_{p_i}(m, Σ_ii);
• If Σ_12 = 0, then S_11 and S_22 are independent.
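To see the first property empirically (our own sketch, with Σ = I_p): by the earlier example, tr S_11 with S_11 ~ W_{p_1}(m, I_{p_1}) should match a χ²(m·p_1) variable:

set.seed(1)
p1 <- 2; p <- 4; m <- 6; B <- 5000
tr_S11 <- apply(rWishart(B, df = m, Sigma = diag(p)), 3,
                function(S) sum(diag(S[1:p1, 1:p1])))  # tr(S_11)
qqplot(tr_S11, rchisq(B, df = m * p1)); abline(0, 1)   # points should hug the line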

  20. Characteristic function i
• The definition of the characteristic function can be extended to random matrices.
• Let S be a p × p random matrix. The characteristic function of S evaluated at a p × p symmetric matrix T is defined as

  φ_S(T) = E( exp( i tr(TS) ) ).

• We will show that if S ~ W_p(m, Σ), then φ_S(T) = |I_p − 2iΣT|^{−m/2}.
• First, we will use the Cholesky decomposition Σ = LL^T.

  21. Characteristic function ii
• Next, we can write

  S = L ( ∑_{j=1}^m Z_j Z_j^T ) L^T,

where Z_j ~ N_p(0, I_p).
• Now, fix a symmetric matrix T. The matrix L^T T L is also symmetric, and therefore we can compute its spectral decomposition:

  L^T T L = U Λ U^T,

where Λ = diag(λ_1, …, λ_p) is diagonal and U U^T = I_p.

  22. Characteristic function iii
• We can now write:

  23. Characteristic function iv

  tr(TS) = tr( T L ( ∑_{j=1}^m Z_j Z_j^T ) L^T )
         = tr( U Λ U^T ∑_{j=1}^m Z_j Z_j^T )          (cyclic property of the trace, and L^T T L = UΛU^T)
         = tr( Λ U^T ( ∑_{j=1}^m Z_j Z_j^T ) U )
         = tr( Λ ∑_{j=1}^m (U^T Z_j)(U^T Z_j)^T ).

  24. Characteristic function v
• Two key observations:
  • U^T Z_j ~ N_p(0, I_p);
  • tr( Λ Z_j Z_j^T ) = ∑_{k=1}^p λ_k Z_jk² (applied below with U^T Z_j in place of Z_j, which is valid by the first observation).
• Putting all this together, we get

  E( exp( i tr(TS) ) ) = E( exp( i ∑_{j=1}^m ∑_{k=1}^p λ_k Z_jk² ) )
                       = ∏_{j=1}^m ∏_{k=1}^p E( exp( iλ_k Z_jk² ) ),

where the factorization uses the independence of the Z_jk.

  25. Characteristic function vi
• But Z_jk² ~ χ²(1), and so we have

  φ_S(T) = ∏_{j=1}^m ∏_{k=1}^p φ_{χ²(1)}(λ_k).

• Recall that φ_{χ²(1)}(t) = (1 − 2it)^{−1/2}, and therefore we have

  φ_S(T) = ∏_{j=1}^m ∏_{k=1}^p (1 − 2iλ_k)^{−1/2}.

  26. Characteristic function vii
• Since ∏_{k=1}^p (1 − 2iλ_k)^{−1/2} = |I_p − 2iΛ|^{−1/2}, we then have

  φ_S(T) = ∏_{j=1}^m |I_p − 2iΛ|^{−1/2}
         = |I_p − 2iΛ|^{−m/2}
         = |I_p − 2iUΛU^T|^{−m/2}
         = |I_p − 2iL^T TL|^{−m/2}
         = |I_p − 2iΣT|^{−m/2},

where the last equality follows from Sylvester's determinant identity |I + AB| = |I + BA|, giving |I_p − 2iL^T TL| = |I_p − 2iTLL^T| = |I_p − 2iTΣ| = |I_p − 2iΣT|.
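As an editorial sanity check (not part of the slides), one can compare a Monte Carlo estimate of E(exp(i tr(TS))) against the closed form; the Σ and T below are arbitrary choices:

set.seed(1)
p <- 2; m <- 4; B <- 20000
Sigma <- matrix(c(1, 0.5, 0.5, 1), p, p)       # arbitrary positive definite scale
Tm <- matrix(c(0.3, 0.1, 0.1, 0.2), p, p)      # arbitrary symmetric argument
phi_hat <- mean(replicate(B, {
  Y <- matrix(rnorm(m * p), ncol = p) %*% chol(Sigma)
  exp(1i * sum(diag(Tm %*% crossprod(Y))))     # exp(i tr(T S))
}))
# closed form |I_p - 2i Sigma T|^{-m/2}, via the product of eigenvalues
phi_theory <- prod(eigen(diag(p) - 2i * Sigma %*% Tm, only.values = TRUE)$values)^(-m / 2)
phi_hat; phi_theory                            # should roughly agree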

  27. Density of Wishart distribution
• Let S ~ W_p(m, Σ) with Σ positive definite and m ≥ p. The density of S is given by

  f(S) = |S|^{(m−p−1)/2} exp( −½ tr(Σ^{−1} S) ) / ( 2^{pm/2} Γ_p(m/2) |Σ|^{m/2} ),

where

  Γ_p(u) = π^{p(p−1)/4} ∏_{i=0}^{p−1} Γ(u − i/2),  u > ½(p − 1).

• Proof: Compute the characteristic function using the expression for the density and check that we obtain the same result as before (Exercise).
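A direct transcription of this density into R (an illustrative sketch; the function names lmvgamma and ldwishart are our own), computed on the log scale for numerical stability:

# log of the multivariate gamma function Gamma_p(u)
lmvgamma <- function(u, p) {
  p * (p - 1) / 4 * log(pi) + sum(lgamma(u - (0:(p - 1)) / 2))
}

# log-density of W_p(m, Sigma) at a positive definite matrix S
ldwishart <- function(S, m, Sigma) {
  p <- ncol(S)
  ldetS <- as.numeric(determinant(S)$modulus)          # log|S|
  ldetSigma <- as.numeric(determinant(Sigma)$modulus)  # log|Sigma|
  -0.5 * sum(diag(solve(Sigma, S))) +                  # -tr(Sigma^{-1} S) / 2
    0.5 * (m - p - 1) * ldetS -
    0.5 * p * m * log(2) - lmvgamma(m / 2, p) - 0.5 * m * ldetSigma
}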

  28. Sampling distribution of the sample covariance
• We are now ready to prove the results we stated a few lectures ago.
• Recall again the univariate case:
  • (n − 1)s²/σ² ~ χ²(n − 1);
  • X̄ and s² are independent.
• In the multivariate case, we want to prove:
  • (n − 1)S_n ~ W_p(n − 1, Σ);
  • Ȳ and S_n are independent.
• We will show this using the multivariate Cochran theorem.
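An empirical illustration of the first claim (our own sketch, taking Σ = I_p for simplicity): traces of (n − 1)S_n across simulated samples should match traces of W_p(n − 1, Σ) draws:

set.seed(1)
n <- 20; p <- 3; B <- 5000
tr_cov <- replicate(B, {
  Y <- matrix(rnorm(n * p), ncol = p)   # a sample of n rows from N_p(0, I_p)
  (n - 1) * sum(diag(cov(Y)))           # tr((n - 1) S_n)
})
tr_wis <- apply(rWishart(B, df = n - 1, Sigma = diag(p)), 3,
                function(W) sum(diag(W)))
qqplot(tr_cov, tr_wis); abline(0, 1)    # points should hug the line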

  29. Cochran theorem
Let Y_1, …, Y_n be a random sample with Y_i ~ N_p(0, Σ), and write Y for the n × p matrix whose i-th row is Y_i. Let A, B be n × n symmetric matrices, and let C be a q × n matrix of rank q. Then:
1. Y^T A Y ~ W_p(m, Σ) if and only if A² = A and tr A = m.
2. Y^T A Y and Y^T B Y are independent if and only if AB = 0.
3. Y^T A Y and C Y are independent if and only if CA = 0.

  30. Application i
• Let C = (1/n) 1^T, where 1 is the n-dimensional vector of ones.
• Let A = I_n − (1/n) 1 1^T.
• Then we have C Y = Ȳ^T and Y^T A Y = (n − 1) S_n.
• We need to check the conditions of Cochran’s theorem (verified numerically below):
  • A² = A;
  • CA = 0;
  • tr A = n − 1.
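A short numerical verification of the three conditions (our own illustration, for n = 10):

n <- 10
ones <- rep(1, n)
A <- diag(n) - tcrossprod(ones) / n      # I_n - (1/n) 1 1^T, the centering matrix
C <- matrix(ones, nrow = 1) / n          # (1/n) 1^T
all.equal(A %*% A, A)                    # A is idempotent: A^2 = A
max(abs(C %*% A))                        # CA = 0, up to rounding
sum(diag(A))                             # tr A = n - 1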
