Where do Multivariate Normal Samples Come from? Paul E. Johnson 1 2 - PowerPoint PPT Presentation

MVN 1 / 28 Where do Multivariate Normal Samples Come from? Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2017

MVN 2 / 28 What I learned on my Winter Vacation These notes accompany the longer essay That essay uses some L YX/L A T EX tricks that were new to me, such as the bold upright symbol for matrices in math. If you want to know how that’s done, I’ll share the L YX document to you. The next thing I want to learn is how to make transpose symbols look nicer

MVN 3 / 28 Overview: The 5 step process for manufacturing MVN The paper says a 5 step algorithm receives the user’s requested mean vector µ and a variance-covariance matrix Σ . 1 Calculate the eigen decomposition of Σ . 2 Check that Σ is positive definite by inspecting the eigenvalues. 1 If an eigenvalue is intolerably negative, terminate with an error message. 2 Tolerably negative eigenvalues are reset to 0. 3 Create a scaling matrix, S . The two programs differ in this stage. R uses the eigen decomposition while Stata uses Cholesky roots. 4 Create a candidate n × p matrix of random vectors by drawing from N (0 , 1) . 5 Apply y = µ + Sx to rescale the candidate random draws.

MVN 4 / 28 MVN x ∼ MV N ( µ , Σ ) is a draw from the multivariate normal distribution, MV N ( µ , Σ ) σ 2         x 1 µ 1 σ 12 σ 1 p 1 σ 2 x 2 µ 2 σ 12 σ 2 p 2         x =  ∼ MV N ( µ , Σ ) = MV N  ,  . . .       ...   . .         . .      σ 2 x p µ p σ 1 p σ 2 p p (1) The output from one draw is a vector of values, not a single number. (1 . 1 , 2 . 4 , − 3 , 2 . 2 . . . ) T (2) A data set might be a collection of those draws, each one representing a person’s answers on a survey.

MVN 5 / 28 Draw a sample from N ( µ, σ 2 ) A univariate“standard”normal draw x i ∼ N (0 , 1) Most programs can draw that. I could teach you where that comes from, lets don’t worry about it now. If you want a draw from N (5 , 100) , you calculate y i = 5 + 10 × x i (3) In words: multiply by the square root of the variance and add the desired expected value. for N ( µ, σ 2 ) y = µ + σ · x. (4) Little Puzzle. Suppose you want draws that appear as if they are N (0 , 9) . Is it correct to write √ x i = 0 + 9 u i ? (5) hint: The square root of 9 is not unique

MVN 6 / 28 Generalize that to MV N ( µ , Σ ) The process is going to be the same, except it is scaled up to draw a vector of candidate values filled with N (0 , 1) draws. y = µ + Sx . (6) p candidate values N (0 , 1) values are in x , multiply them by something, add something         y 1 µ 1 s 11 s 12 s 1 p x 1 y 2 µ 2 s 21 s 22 s 2 p x 2          =  +  .  .   .     .  ... . . .         . . .      y p µ p s p 1 s p 2 s pp x p The big puzzle is where do you get that matrix full of s ’s so that the resulting random draw has variance Σ . And what properties it should have.

MVN 7 / 28 Generalize that to MV N ( µ , Σ ) ... E [ y ] = E [ µ + Sx ] = µ + S E [ x ] . (7) And the variance matrix of y is V ar [ y ] = S V ar ( x ) S T . (8) Because V ar ( x ) = I and because S T S is symmetric, V ar [ y ] = SS T = S T S . (9) So if our goal is to generate y ∼ N ( µ, Σ ) , then we need Σ = SS T = S T S . In a sense, S is thus one kind of“square root”of a matrix.

MVN 8 / 28 If we are given a user’s requested , can we use that to create the matrix we need? Not always. There may be no matrix“square root” . User might supply Σ that’s not actually a covariance matrix. What do we know about Σ ? Symmetric, σ ij = σ ji Positive Definite (PD) or Positive Semidefinite (PSD). This is either esoteric and difficult to understand or simple and obvious. Here is my simple and obvious explanation. Here’s the formula for the MVN: 1 2 ( x − µ ) T Σ − 1 ( x − µ ) . − 1 f ( x ) = (2 π ) p/ 2 | Σ | 1 / 2 e (10)

MVN 9 / 28 If we are given a user’s requested , can we use that to create the matrix we need? ... The basic idea of the normal is that as a point moves away from the mean, as x − µ grows greater, we are less and less likely to observe that point. For this to be true, we need this to be positive. Otherwise, probability would grow as x − µ grows. ( x − µ ) T Σ − 1 ( x − µ ) > 0 . (11) This seems obvious to me. Suppose Σ = I , so this reduces to the (Pythagorean) distance between x and µ ( x − µ ) T ( x − µ ) = ( x 1 − µ 1 ) 2 + ( x 2 − µ 2 ) 2 + . . . + ( x p − µ p ) 2 (12) That’s a sum of squared things, it has to be 0 or greater. The Σ − 1 matrix puts weights on the deviations, but it should not be able to turn the distance between x and µ into a negative number. The closest two things can be is the same location. There’s a theorem that says if Σ is positive definite, then Σ − 1 is also PD

MVN 10 / 28 Check if a matrix is PD: Eigen Decomposition We insist that Σ is symmetric. We need a way to find out if it is PD. This turns out to be tricky because of numerical rounding error. In pencil and paper math, there is a theorem says that a matrix is PD if and only if all of its eigenvalues are positive. Recognizing that eigenvalues are approximated in computers, R and Stata use the same approach. A tolerance value is used as follows λ j < − tol | λ 1 | . (13) If λ j is so far negative that it is below − tol | λ 1 | , then we conclude the matrix is not PD.

MVN 11 / 28 Background info on the eigenvalues and eigenvectors The eigenvalues ( λ j ) and eigenvectors v j of Σ are paired in this weird way. = (14) Σv j λ j v j The vector v j that can be thought of as a rescaled thing coming out of Σ . Er, the scaling effect of Σ can be equaled by a simple multiplicative rescaling by λ j . Eigenvector matrix: collect the eigenvectors, side by side:   v 11 v 1 p . .   v 21 . v 2 p V = (15)       v p 1 v pp

MVN 12 / 28 Background info on the eigenvalues and eigenvectors ... Interesting property. The Eigenvalues are“orthogonal”to one another. v T 1 · v 2 = 0 (16) We usually scale eigenvectors these so that v T i · v i = 1 , Which implies this interesting property   1 0 0 0 0 0 1 0 0 0     V T V = I = 0 0 1 0 0 (17)     ...   0 0 0 0   0 0 0 0 1 And thus the inverse of the orthogonal matrix V is very easy to calculate: it is the transpose V T = V − 1 (18)

MVN 13 / 28 Background info on the eigenvalues and eigenvectors ... Anyway, given the vector of eigenvalues ( λ 1 , λ 2 , . . . , λ p ) , in pencil and paper math, λ j > 0 ∀ j . Numerically, we say if λ j is just a little bit negative, we will act as if that is roundoff error, and it is re-assigned as 0.

MVN 14 / 28 If all eigenvalues are greater than 0, Cholesky root Cholesky works for SQUARE, PD matrices only.  0 0 0 0    r 11 r 11 r 12 r 13 . . . r 1 p 0 0 0 r 12 r 22 r 22 r 23 r 2 p         0 0 r 13 r 23 r 33 r 33 Σ = R T × R = .         ... ...     0     r 1 p r 1 p r 3 p r pp r pp (19) If you require the diagonal elements are positive, this is a UNIQUE triangular square root We usually see people using the lower triangular part of the Cholesky as the scaling matrix S = R T (20) Other square roots exist, as we see next Recall Cholesky is not guaranteed to exist if Σ is not strictly PD. SPD is not sufficient.

MVN 15 / 28 If an eigenvalue is equal to 0, it is not PD, must use Eigen decomposition = V diag ( λ ) ΣV ΣVV T V diag ( λ ) V T = V diag ( λ ) V T . Σ = (21) The function diag places a vector’s values along the diagonal:   0 0 0 λ 1 0 λ 2 0 0   diag ( λ ) = (22)  .  ...    0 0 0  0 0 0 λ p

MVN 16 / 28 If an eigenvalue is equal to 0, it is not PD, must use Eigen decomposition ... Since λ j ≥ 0 , a real-valued square root of each eigenvalue exists, and we can write this as a product using diag ( λ ) 1 / 2 : √ λ 1 √ λ 1  0 0 0   0 0 0  √ λ 2 √ λ 2 0 0 0 0 0 0     diag ( λ ) =  = diag     ... ...     0 0 0 0 0 0    � � 0 0 0 0 0 0 λ p λ p (23) This allows us to revise (21) into a format that helps us to see that we have square root of Σ : V diag ( λ ) 1 / 2 diag ( λ ) 1 / 2 V T . Σ = V diag ( λ ) 1 / 2 ( V diag ( λ ) 1 / 2 ) T (24)

MVN 17 / 28 If an eigenvalue is equal to 0, it is not PD, must use Eigen decomposition ... The scaling matrix will be S = V diag ( λ ) 1 / 2 (25) because SS T = S T S = Σ .

Where do Multivariate Normal Samples Come from? Paul E. Johnson 1 2 - PowerPoint PPT Presentation

MVN 1 / 28 Where do Multivariate Normal Samples Come from? Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2017 MVN 2 / 28 What I learned on my Winter Vacation

Outline Multivariate Data 1 Multivariate Parametric Methods Multivariate Normal Distribution 2

Come, Come Whoever You Are Come, Come, Whoever You Are Though youve broken your vows a

Linear regression How to measure the accuracy of linear regression models Linear Regression

MLES & Multivariate Normal Theory STA721 Linear Models Duke University Merlise Clyde

Lecture 12 Gaussian Process Models 10/16/2018 1 Multivariate Normal Multivariate Normal

Multivariate normal distribution Surajit Ray Reader, University of Glasgow DataCamp

Multivariate Normal Distribution Max Turgeon STAT 4690Applied Multivariate Analysis Building

Multivariate t-distributions Surajit Ray Reader, University of Glasgow DataCamp Multivariate

Reading multivariate data Surajit Ray Reader, University of Glasgow DataCamp Multivariate

Samples Advertising of samples and handing out samples Advertising Education and Assurance

-Samples [AB98] Hyp: domain S is a smooth curve or surface. S 1 -Samples [AB98] Hyp:

Business Statistics CONTENTS Comparing two samples Comparing two unrelated samples Comparing

Normal A Spectrum of Engineering Design Normal Radical A Spectrum of Engineering Design Normal

Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical

Lecture 12 Gaussian Process Models Colin Rundel 02/27/2017 1 Multivariate Normal 2

Multivariate Ordination Analyses: Principal Component Analysis Dilys Vela Tatiana Boza Tatiana

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product

= = = 2 Further Var( X ) Var( ) Y a a a X =

Spiked Eigenvalues of High Dimensional Separable Sample Covariance Matrices Guangming Pan,

PLUGIN CLASSIFIERS: NAIVE BAYES, LDA, PLUGIN CLASSIFIERS: NAIVE BAYES, LDA, LOGISTIC REGRESSION

compsci 514: algorithms for data science Cameron Musco University of Massachusetts Amherst.

Nested logit models Michel Bierlaire michel.bierlaire@epfl.ch Transport and Mobility Laboratory

The Kalman Filter An Algorithm for Dealing with Uncertainty Steven Janke May 2011 Steven Janke

Data Classification Linear Classifier II Latent Differential Analysis Mean Classification

Where do Multivariate Normal Samples Come from? Paul E. Johnson 1 2 - PowerPoint PPT Presentation

MVN 1 / 28 Where do Multivariate Normal Samples Come from? Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2017 MVN 2 / 28 What I learned on my Winter Vacation

Outline Multivariate Data 1 Multivariate Parametric Methods Multivariate Normal Distribution 2

Come, Come Whoever You Are Come, Come, Whoever You Are Though youve broken your vows a

Linear regression How to measure the accuracy of linear regression models Linear Regression

MLES &amp; Multivariate Normal Theory STA721 Linear Models Duke University Merlise Clyde

Lecture 12 Gaussian Process Models 10/16/2018 1 Multivariate Normal Multivariate Normal

Multivariate normal distribution Surajit Ray Reader, University of Glasgow DataCamp

Multivariate Normal Distribution Max Turgeon STAT 4690Applied Multivariate Analysis Building

Multivariate t-distributions Surajit Ray Reader, University of Glasgow DataCamp Multivariate

Reading multivariate data Surajit Ray Reader, University of Glasgow DataCamp Multivariate

Samples Advertising of samples and handing out samples Advertising Education and Assurance

-Samples [AB98] Hyp: domain S is a smooth curve or surface. S 1 -Samples [AB98] Hyp:

Business Statistics CONTENTS Comparing two samples Comparing two unrelated samples Comparing

Normal A Spectrum of Engineering Design Normal Radical A Spectrum of Engineering Design Normal

Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical

Lecture 12 Gaussian Process Models Colin Rundel 02/27/2017 1 Multivariate Normal 2

Multivariate Ordination Analyses: Principal Component Analysis Dilys Vela Tatiana Boza Tatiana

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product

= = = 2 Further Var( X ) Var( ) Y a a a X =

Spiked Eigenvalues of High Dimensional Separable Sample Covariance Matrices Guangming Pan,

PLUGIN CLASSIFIERS: NAIVE BAYES, LDA, PLUGIN CLASSIFIERS: NAIVE BAYES, LDA, LOGISTIC REGRESSION

compsci 514: algorithms for data science Cameron Musco University of Massachusetts Amherst.

Nested logit models Michel Bierlaire michel.bierlaire@epfl.ch Transport and Mobility Laboratory

The Kalman Filter An Algorithm for Dealing with Uncertainty Steven Janke May 2011 Steven Janke

Data Classification Linear Classifier II Latent Differential Analysis Mean Classification

MLES & Multivariate Normal Theory STA721 Linear Models Duke University Merlise Clyde