
Preamble to ‘The Humble Gaussian Distribution’. David MacKay

Gaussian Quiz

[Belief network H1: y2 is the parent of both y1 and y3.]

1. Assuming that the variables y1, y2, y3 in this belief network have a joint Gaussian distribution, which of the following matrices could be the covariance matrix?
\[
A = \begin{pmatrix} 9 & 3 & 1 \\ 3 & 9 & 3 \\ 1 & 3 & 9 \end{pmatrix}
\quad
B = \begin{pmatrix} 8 & -3 & 1 \\ -3 & 9 & -3 \\ 1 & -3 & 8 \end{pmatrix}
\quad
C = \begin{pmatrix} 9 & 3 & 0 \\ 3 & 9 & 3 \\ 0 & 3 & 9 \end{pmatrix}
\quad
D = \begin{pmatrix} 9 & -3 & 0 \\ -3 & 10 & -3 \\ 0 & -3 & 9 \end{pmatrix}
\]

2. Which of the matrices could be the inverse covariance matrix?

[Belief network H2: y1 and y3 are both parents of y2.]

3. Which of the matrices could be the covariance matrix of the second graphical model?

4. Which of the matrices could be the inverse covariance matrix of the second graphical model?

5. Let three variables y1, y2, y3 have covariance matrix K(3) and inverse covariance matrix K(3)^{-1}:
\[
K_{(3)} = \begin{pmatrix} 1 & .5 & 0 \\ .5 & 1 & .5 \\ 0 & .5 & 1 \end{pmatrix},
\qquad
K_{(3)}^{-1} = \begin{pmatrix} 1.5 & -1 & .5 \\ -1 & 2 & -1 \\ .5 & -1 & 1.5 \end{pmatrix}.
\]
Now focus on the variables y1 and y2. Which statements about their covariance matrix K(2) and inverse covariance matrix K(2)^{-1} are true?
\[
\text{(A)}\quad K_{(2)} = \begin{pmatrix} 1 & .5 \\ .5 & 1 \end{pmatrix},
\qquad
\text{(B)}\quad K_{(2)}^{-1} = \begin{pmatrix} 1.5 & -1 \\ -1 & 2 \end{pmatrix}.
\]


The Humble Gaussian Distribution

David J.C. MacKay
Cavendish Laboratory, Cambridge CB3 0HE, United Kingdom
June 11, 2006 – Draft 1.0

Abstract: These are elementary notes on Gaussian distributions, aimed at people who are about to learn about Gaussian processes. I emphasize the following points: what happens to a covariance matrix and inverse covariance matrix when we omit a variable; what it means to have zeros in a covariance matrix; what it means to have zeros in an inverse covariance matrix; how probabilistic models expressed in terms of ‘energies’ relate to Gaussians; and why eigenvectors and eigenvalues don’t have any fundamental status.

1 Introduction

Let’s chat about a Gaussian distribution with zero mean, such as
\[
P(\mathbf{y}) = \frac{1}{Z}\, e^{-\frac{1}{2}\mathbf{y}^{\mathsf T} A \mathbf{y}}, \tag{1}
\]
where A = K^{-1} is the inverse of the covariance matrix K, and Z = [det 2πK]^{1/2}. I’m going to emphasize dimensions throughout this note, because I think dimension-consciousness enhances understanding.¹ I’ll write
\[
K = \begin{pmatrix} K_{11} & K_{12} & K_{13} \\ K_{12} & K_{22} & K_{23} \\ K_{13} & K_{23} & K_{33} \end{pmatrix}. \tag{4}
\]

¹It’s conventional to write the diagonal elements of K as σ_i² and the off-diagonal elements as σ_{ij}. For example
\[
K = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{pmatrix}. \tag{2}
\]
This is a confusing convention, since it implies that σ_{ij} has different dimensions from σ_i, even if all axes i, j have the same dimensions! Another way of writing an off-diagonal coefficient is
\[
K_{ij} = \rho_{ij}\,\sigma_i\,\sigma_j, \tag{3}
\]
where ρ_{ij} is the correlation coefficient between i and j. This is a better notation since it’s dimensionally consistent in the way it uses the letter σ. But I will stick with the notation K_{ij}.



The definition of the covariance matrix is
\[
K_{ij} = \langle y_i y_j \rangle, \tag{5}
\]
so the dimensions of the element K_{ij} are (dimensions of y_i) times (dimensions of y_j).
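To see definition (5) in action, here is a small numpy sketch (my addition, not part of the original notes): draw zero-mean samples with a chosen covariance and check that averaging the products y_i y_j recovers K. The choice of matrix (A from the quiz) and the sample size are illustrative.

```python
import numpy as np

# Definition (5): K_ij = <y_i y_j> for zero-mean y.
# Draw many samples with covariance K, then average the outer products.
rng = np.random.default_rng(0)
K = np.array([[9.0, 3.0, 1.0],
              [3.0, 9.0, 3.0],
              [1.0, 3.0, 9.0]])        # matrix A from the quiz
Y = rng.multivariate_normal(mean=np.zeros(3), cov=K, size=200_000)
K_hat = Y.T @ Y / len(Y)               # empirical <y_i y_j>
print(np.round(K_hat, 1))              # close to K
```

With a couple of hundred thousand samples the empirical matrix matches K to a decimal place or so.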

1.1 Examples

Let’s work through a few graphical models.

[Belief networks: Example 1 (H1), with y2 the parent of y1 and y3; Example 2 (H2), with y1 and y3 the parents of y2.]

1.1.1 Example 1

Maybe y2 is the temperature outside some buildings (or rather, the deviation of the outside temperature from its mean), y1 is the temperature deviation inside building 1, and y3 is the temperature inside building 3. This graphical model says that if you know the outside temperature y2, then y1 and y3 are independent. Let’s consider this generative model:
\begin{align}
y_2 &= \nu_2 \tag{6}\\
y_1 &= w_1 y_2 + \nu_1 \tag{7}\\
y_3 &= w_3 y_2 + \nu_3, \tag{8}
\end{align}
where {ν_i} are independent normal variables with variances {σ_i²}.

Then we can write down the entries in the covariance matrix, starting with the diagonal entries:
\begin{align}
K_{11} &= \langle y_1 y_1\rangle = \langle (w_1\nu_2+\nu_1)(w_1\nu_2+\nu_1)\rangle
= w_1^2\langle\nu_2^2\rangle + 2w_1\langle\nu_1\nu_2\rangle + \langle\nu_1^2\rangle
= w_1^2\sigma_2^2 + \sigma_1^2 \tag{9}\\
K_{22} &= \sigma_2^2 \tag{10}\\
K_{33} &= w_3^2\sigma_2^2 + \sigma_3^2 \tag{11}
\end{align}
So we can fill in this much:
\[
K = \begin{pmatrix}K_{11}&K_{12}&K_{13}\\K_{12}&K_{22}&K_{23}\\K_{13}&K_{23}&K_{33}\end{pmatrix}
= \begin{pmatrix}w_1^2\sigma_2^2+\sigma_1^2 & \cdot & \cdot\\ \cdot & \sigma_2^2 & \cdot\\ \cdot & \cdot & w_3^2\sigma_2^2+\sigma_3^2\end{pmatrix} \tag{12}
\]
The off-diagonal terms are
\[
K_{12} = \langle y_1 y_2\rangle = \langle (w_1\nu_2+\nu_1)\,\nu_2\rangle = w_1\sigma_2^2 \tag{13}
\]
(and similarly for K_{23}), and
\[
K_{13} = \langle y_1 y_3\rangle = \langle (w_1\nu_2+\nu_1)(w_3\nu_2+\nu_3)\rangle = w_1 w_3 \sigma_2^2. \tag{14}
\]
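The algebra in (9)–(14) can be checked by simulating the generative model directly. The weights and standard deviations below are arbitrary illustrative values, not values from the text.

```python
import numpy as np

# Monte Carlo check of (9)-(14): simulate
#   y2 = nu2,  y1 = w1*y2 + nu1,  y3 = w3*y2 + nu3
# and compare the empirical covariance to the derived formulas.
rng = np.random.default_rng(1)
w1, w3 = 2.0, -1.0
s1, s2, s3 = 1.0, 1.5, 0.5          # sigma_1, sigma_2, sigma_3
n = 500_000
y2 = rng.normal(0, s2, n)
y1 = w1 * y2 + rng.normal(0, s1, n)
y3 = w3 * y2 + rng.normal(0, s3, n)
Y = np.stack([y1, y2, y3])
K_hat = Y @ Y.T / n
K_theory = np.array([
    [w1**2 * s2**2 + s1**2, w1 * s2**2, w1 * w3 * s2**2],
    [w1 * s2**2,            s2**2,      w3 * s2**2],
    [w1 * w3 * s2**2,       w3 * s2**2, w3**2 * s2**2 + s3**2],
])
print(np.round(K_hat - K_theory, 2))   # every entry close to 0
```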


So the covariance matrix is:
\[
K = \begin{pmatrix}K_{11}&K_{12}&K_{13}\\K_{12}&K_{22}&K_{23}\\K_{13}&K_{23}&K_{33}\end{pmatrix}
= \begin{pmatrix}
w_1^2\sigma_2^2+\sigma_1^2 & w_1\sigma_2^2 & w_1 w_3\sigma_2^2\\
w_1\sigma_2^2 & \sigma_2^2 & w_3\sigma_2^2\\
w_1 w_3\sigma_2^2 & w_3\sigma_2^2 & w_3^2\sigma_2^2+\sigma_3^2
\end{pmatrix} \tag{15}
\]
(the elements below the diagonal are filled in by symmetry).

Now let’s think about the inverse covariance matrix. One way to get to it is to write down the joint distribution:
\begin{align}
P(y_1,y_2,y_3 \mid H_1) &= P(y_2)\,P(y_1\mid y_2)\,P(y_3\mid y_2) \tag{16}\\
&= \frac{1}{Z_2}\exp\!\left(-\frac{y_2^2}{2\sigma_2^2}\right)
\frac{1}{Z_1}\exp\!\left(-\frac{(y_1-w_1y_2)^2}{2\sigma_1^2}\right)
\frac{1}{Z_3}\exp\!\left(-\frac{(y_3-w_3y_2)^2}{2\sigma_3^2}\right) \tag{17}
\end{align}

We can now collect all the terms in y_i y_j:
\begin{align}
P(y_1,y_2,y_3)
&= \frac{1}{Z'}\exp\!\left(-\frac{y_2^2}{2\sigma_2^2} - \frac{(y_1-w_1y_2)^2}{2\sigma_1^2} - \frac{(y_3-w_3y_2)^2}{2\sigma_3^2}\right)\\
&= \frac{1}{Z'}\exp\!\left(-y_2^2\!\left[\frac{1}{2\sigma_2^2}+\frac{w_1^2}{2\sigma_1^2}+\frac{w_3^2}{2\sigma_3^2}\right]
- y_1^2\,\frac{1}{2\sigma_1^2} + 2y_1y_2\,\frac{w_1}{2\sigma_1^2}
- y_3^2\,\frac{1}{2\sigma_3^2} + 2y_3y_2\,\frac{w_3}{2\sigma_3^2}\right)\\
&= \frac{1}{Z'}\exp\!\left(-\frac{1}{2}
\begin{pmatrix} y_1 & y_2 & y_3\end{pmatrix}
\begin{pmatrix}
\frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0\\[2pt]
-\frac{w_1}{\sigma_1^2} & \frac{1}{\sigma_2^2}+\frac{w_1^2}{\sigma_1^2}+\frac{w_3^2}{\sigma_3^2} & -\frac{w_3}{\sigma_3^2}\\[2pt]
0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2}
\end{pmatrix}
\begin{pmatrix} y_1\\ y_2\\ y_3\end{pmatrix}\right)
\end{align}

So the inverse covariance matrix is
\[
K^{-1} = \begin{pmatrix}
\frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0\\[2pt]
-\frac{w_1}{\sigma_1^2} & \frac{1}{\sigma_2^2}+\frac{w_1^2}{\sigma_1^2}+\frac{w_3^2}{\sigma_3^2} & -\frac{w_3}{\sigma_3^2}\\[2pt]
0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2}
\end{pmatrix}.
\]
The first thing I’d like you to notice here is the zeroes: [K^{-1}]_{13} = 0. The meaning of a zero at location (i, j) in an inverse covariance matrix is: conditional on all the other variables, the two variables y_i and y_j are independent.

Next, notice that whereas y1 and y2 are positively correlated (assuming w1 > 0), the coefficient [K^{-1}]_{12} is negative. It’s common for a covariance matrix K whose elements are all non-negative to have an inverse that includes some negative elements. So positive off-diagonal terms in the covariance matrix always describe positive correlation; but the off-diagonal terms in the inverse covariance matrix can’t be interpreted that way. The sign of an element (i, j) in the inverse covariance matrix does not tell you about the correlation between those two variables. For example, remember: there is a zero at [K^{-1}]_{13}, but that doesn’t mean that y1 and y3 are uncorrelated. Thanks to their common parent y2, they are correlated, with covariance w1 w3 σ2².

The off-diagonal entry [K^{-1}]_{ij} in an inverse covariance matrix indicates how y_i and y_j are correlated if we condition on all the other variables apart from those two: if [K^{-1}]_{ij} < 0, they are positively correlated, conditioned on the others; if [K^{-1}]_{ij} > 0, they are negatively correlated.
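These observations are easy to confirm numerically. A numpy sketch, using the illustrative values w1 = w3 = 1 and σ1 = σ2 = σ3 = 1 (my choices, not from the text):

```python
import numpy as np

# Example 1: build K from (15), invert it, and inspect the patterns.
w1 = w3 = 1.0
s1 = s2 = s3 = 1.0
K = np.array([
    [w1**2*s2**2 + s1**2, w1*s2**2, w1*w3*s2**2],
    [w1*s2**2,            s2**2,    w3*s2**2],
    [w1*w3*s2**2,         w3*s2**2, w3**2*s2**2 + s3**2],
])
A = np.linalg.inv(K)
print(np.round(A, 6))
# K[0,2] = w1*w3*s2^2 > 0 : y1 and y3 ARE correlated,
# yet A[0,2] = 0          : independent GIVEN y2,
# and A[0,1] < 0 even though K[0,1] > 0.
```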


The inverse covariance matrix is great for reading out properties of conditional distributions in which we condition on all the variables except one. For example, look at [K^{-1}]_{11} = 1/σ1²: if we know y2 and y3, then the probability distribution of y1 is Gaussian with variance 1/[K^{-1}]_{11}. That one was easy. Look at
\[
[K^{-1}]_{22} = \frac{1}{\sigma_2^2}+\frac{w_1^2}{\sigma_1^2}+\frac{w_3^2}{\sigma_3^2}:
\]
if we know y1 and y3, then the probability distribution of y2 is Gaussian with variance
\[
\frac{1}{[K^{-1}]_{22}} = \frac{1}{\dfrac{1}{\sigma_2^2}+\dfrac{w_1^2}{\sigma_1^2}+\dfrac{w_3^2}{\sigma_3^2}}. \tag{18}
\]
That’s not so obvious, but it’s familiar if you’ve applied Bayes’ theorem to Gaussians: when we do inference of a parent like y2 given its children, the inverse variances of the prior and of the likelihoods add. Here, the parent variable’s inverse variance (also known as its precision) is the sum of the precision contributed by the prior, 1/σ2², the precision contributed by the measurement of y1, w1²/σ1², and the precision contributed by the measurement of y3, w3²/σ3².
The off-diagonal entries in K^{-1} tell us how the mean of [the conditional distribution of one variable given the others] depends on [the others]. Let’s take variable y3 conditioned on the other two, for example:
\[
P(y_3 \mid y_1, y_2, H_1) \propto P(y_1,y_2,y_3\mid H_1) \propto
\frac{1}{Z'}\exp\!\left(-\frac{1}{2}
\begin{pmatrix} y_1 & y_2 & y_3\end{pmatrix}
\begin{pmatrix}
\frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0\\[2pt]
-\frac{w_1}{\sigma_1^2} & \frac{1}{\sigma_2^2}+\frac{w_1^2}{\sigma_1^2}+\frac{w_3^2}{\sigma_3^2} & -\frac{w_3}{\sigma_3^2}\\[2pt]
0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2}
\end{pmatrix}
\begin{pmatrix} y_1\\ y_2\\ y_3\end{pmatrix}\right)
\]
In the exponent, the terms involving only y1 and y2 are fixed, known, and uninteresting (the original highlights them in blue); everything multiplying the interesting term y3 is what matters (green in the original). The uninteresting multipliers in the central matrix aren’t achieving anything: we can just ignore them (and redefine the constant of proportionality). Keeping only the entries in the third row and column:
\[
P(y_3\mid y_1,y_2,H_1) \propto \exp\!\left(-\frac{1}{2}
\begin{pmatrix} y_1 & y_2 & y_3\end{pmatrix}
\begin{pmatrix}
0 & 0 & 0\\[2pt]
0 & 0 & -\frac{w_3}{\sigma_3^2}\\[2pt]
0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2}
\end{pmatrix}
\begin{pmatrix} y_1\\ y_2\\ y_3\end{pmatrix}\right)
\]


\[
P(y_3 \mid y_1, y_2, H_1) \propto \exp\!\left(-\frac{1}{2}\,\frac{1}{\sigma_3^2}\,[y_3]^2 - [y_3]\left[0\times y_1 - \frac{w_3}{\sigma_3^2}\times y_2\right]\right)
\]
We obtain the mean by completing the square:²
\[
P(y_3 \mid y_1, y_2, H_1) \propto \exp\!\left(-\frac{1}{2}\,\frac{1}{\sigma_3^2}\left[y_3 - \frac{0\times y_1 + \frac{w_3}{\sigma_3^2}\times y_2}{\frac{1}{\sigma_3^2}}\right]^2\right)
\]
In this case, this all collapses down, of course, to
\[
P(y_3 \mid y_1, y_2, H_1) \propto \exp\!\left(-\frac{1}{2}\,\frac{1}{\sigma_3^2}\,[y_3 - w_3 y_2]^2\right), \tag{19}
\]
as defined in the original generative model (8). In general, the off-diagonal coefficients of K^{-1} tell us the sensitivity of [the mean of the conditional distribution] to the other variables:
\[
\mu_{y_3\mid y_1,y_2} = -\frac{[K^{-1}]_{13}\,y_1 + [K^{-1}]_{23}\,y_2}{[K^{-1}]_{33}}. \tag{20}
\]
So the conditional mean of y3 is a linear function of the known variables, and the off-diagonal entries in K^{-1} tell us the coefficients in that linear function.
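Formula (20) can be sanity-checked against the generative model: for Example 1 the conditional mean must collapse to w3 y2 and the conditional variance to σ3². A sketch with illustrative values of my own:

```python
import numpy as np

# Check (19)/(20): conditional mean and variance of y3 given y1, y2,
# read off from K^-1, against the generative-model answer.
w1, w3 = 1.5, -0.8
s1, s2, s3 = 1.0, 2.0, 0.5
K = np.array([
    [w1**2*s2**2 + s1**2, w1*s2**2, w1*w3*s2**2],
    [w1*s2**2,            s2**2,    w3*s2**2],
    [w1*w3*s2**2,         w3*s2**2, w3**2*s2**2 + s3**2],
])
A = np.linalg.inv(K)
y1, y2 = 0.7, -1.2                       # arbitrary observed values
mu3 = -(A[2, 0]*y1 + A[2, 1]*y2) / A[2, 2]   # formula (20)
var3 = 1 / A[2, 2]
print(mu3, w3 * y2)     # equal: the y1 coefficient is zero
print(var3, s3**2)      # equal
```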

[Belief networks: Example 1 (H1), with y2 the parent of y1 and y3; Example 2 (H2), with y1 and y3 the parents of y2.]

1.2 Example 2

Here’s another example, where two parents have one child. For example, the price of electricity y2 from a power station might depend on the price of gas, y1, and the price of carbon emission rights, y3.

²‘Completing the square’ is \(\frac{1}{2}ay^2 - by = \frac{1}{2}a(y - b/a)^2 + \text{constant}\).


\[
y_2 = w_1 y_1 + w_3 y_3 + \nu_2 \tag{21}
\]
Note that the units in which gas price, electricity price, and carbon price are measured are all different (pounds per cubic metre, pennies per kWh, and euros per tonne, for example), so y1, y2, and y3 have different dimensions from each other. Most people who do data modelling treat their data as ‘just numbers’, but I think it is a useful discipline to keep track of dimensions and to carry out only dimensionally valid operations. [Dimensionally valid operations satisfy the two rules of dimensions: (1) only add, subtract, and compare quantities that have like dimensions; (2) the arguments of functions like exp, log, and sin must be dimensionless. Rule 2 is really just a special case of rule 1, since exp(x) = 1 + x + x²/2 + …, so to satisfy rule 1 the dimensions of x must be the same as the dimensions of 1.]

What is the covariance matrix? Here we assume that the parent variables y1 and y3 are uncorrelated. The covariance matrix is
\[
K = \begin{pmatrix}K_{11}&K_{12}&K_{13}\\K_{12}&K_{22}&K_{23}\\K_{13}&K_{23}&K_{33}\end{pmatrix}
= \begin{pmatrix}\sigma_1^2 & w_1\sigma_1^2 & 0\\ w_1\sigma_1^2 & \sigma_2^2+w_1^2\sigma_1^2+w_3^2\sigma_3^2 & w_3\sigma_3^2\\ 0 & w_3\sigma_3^2 & \sigma_3^2\end{pmatrix} \tag{22}
\]
Notice the zero covariance between the uncorrelated variables (1, 3). What do you think the (1, 3) entry in the inverse covariance matrix will be? Let’s work it out in the same way as before. The joint distribution is
\begin{align}
P(y_1,y_2,y_3\mid H_2) &= P(y_1)\,P(y_3)\,P(y_2\mid y_1,y_3) \tag{23}\\
&= \frac{1}{Z_1}\exp\!\left(-\frac{y_1^2}{2\sigma_1^2}\right)
\frac{1}{Z_3}\exp\!\left(-\frac{y_3^2}{2\sigma_3^2}\right)
\frac{1}{Z_2}\exp\!\left(-\frac{(y_2-w_1y_1-w_3y_3)^2}{2\sigma_2^2}\right) \tag{24}
\end{align}

We collect all the terms in y_i y_j:
\begin{align}
P(y_1,y_2,y_3)
&= \frac{1}{Z'}\exp\!\left(-\frac{y_1^2}{2\sigma_1^2} - \frac{y_3^2}{2\sigma_3^2} - \frac{(y_2-w_1y_1-w_3y_3)^2}{2\sigma_2^2}\right)\\
&= \frac{1}{Z'}\exp\!\left(-y_1^2\!\left[\frac{1}{2\sigma_1^2}+\frac{w_1^2}{2\sigma_2^2}\right]
- y_2^2\,\frac{1}{2\sigma_2^2} + 2y_1y_2\,\frac{w_1}{2\sigma_2^2}
- y_3^2\!\left[\frac{1}{2\sigma_3^2}+\frac{w_3^2}{2\sigma_2^2}\right]
+ 2y_3y_2\,\frac{w_3}{2\sigma_2^2} - 2y_3y_1\,\frac{w_1w_3}{2\sigma_2^2}\right)\\
&= \frac{1}{Z'}\exp\!\left(-\frac{1}{2}
\begin{pmatrix} y_1 & y_2 & y_3\end{pmatrix}
\begin{pmatrix}
\frac{1}{\sigma_1^2}+\frac{w_1^2}{\sigma_2^2} & -\frac{w_1}{\sigma_2^2} & +\frac{w_1w_3}{\sigma_2^2}\\[2pt]
-\frac{w_1}{\sigma_2^2} & \frac{1}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2}\\[2pt]
+\frac{w_1w_3}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2} & \frac{1}{\sigma_3^2}+\frac{w_3^2}{\sigma_2^2}
\end{pmatrix}
\begin{pmatrix} y_1\\ y_2\\ y_3\end{pmatrix}\right)
\end{align}

So the inverse covariance matrix is
\[
K^{-1} = \begin{pmatrix}
\frac{1}{\sigma_1^2}+\frac{w_1^2}{\sigma_2^2} & -\frac{w_1}{\sigma_2^2} & +\frac{w_1w_3}{\sigma_2^2}\\[2pt]
-\frac{w_1}{\sigma_2^2} & \frac{1}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2}\\[2pt]
+\frac{w_1w_3}{\sigma_2^2} & -\frac{w_3}{\sigma_2^2} & \frac{1}{\sigma_3^2}+\frac{w_3^2}{\sigma_2^2}
\end{pmatrix} \tag{25}
\]


Notice (assuming w1 > 0 and w3 > 0) that the off-diagonal term connecting a parent and a child, [K^{-1}]_{12}, is negative, and the off-diagonal term connecting the two parents, [K^{-1}]_{13}, is positive. This positive term indicates that, conditional on all the other variables (i.e., y2), the two parents y1 and y3 are anticorrelated. That’s ‘explaining away’. Once you know the price of electricity was average, for example, you can deduce that if gas was more expensive than normal, carbon probably was less expensive than normal.
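Here is a quick numerical confirmation of this sign pattern, using covariance (22) with illustrative positive weights (my choices):

```python
import numpy as np

# Explaining away, numerically: build K for Example 2 from (22),
# invert it, and check the signs of the off-diagonal entries.
w1, w3 = 1.0, 2.0
s1, s2, s3 = 1.0, 1.0, 1.0
K = np.array([
    [s1**2,    w1*s1**2,                          0.0],
    [w1*s1**2, s2**2 + w1**2*s1**2 + w3**2*s3**2, w3*s3**2],
    [0.0,      w3*s3**2,                          s3**2],
])
A = np.linalg.inv(K)
print(np.round(A, 6))
# A[0,2] = w1*w3/s2^2 > 0 : given y2, the parents are anticorrelated,
# while A[0,1] < 0 (parent-child) and K[0,2] = 0 (uncorrelated parents).
```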

2 Omission of one variable

Consider example 1.

[Belief network H1 (Example 1): y2 is the parent of y1 and y3.]

The covariance matrix of all three variables is:
\[
K = \begin{pmatrix}K_{11}&K_{12}&K_{13}\\K_{12}&K_{22}&K_{23}\\K_{13}&K_{23}&K_{33}\end{pmatrix}
= \begin{pmatrix}
w_1^2\sigma_2^2+\sigma_1^2 & w_1\sigma_2^2 & w_1 w_3\sigma_2^2\\
w_1\sigma_2^2 & \sigma_2^2 & w_3\sigma_2^2\\
w_1 w_3\sigma_2^2 & w_3\sigma_2^2 & w_3^2\sigma_2^2+\sigma_3^2
\end{pmatrix} \tag{26}
\]
If we decide we want to talk about the joint distribution of just y1 and y2, the covariance matrix is simply the sub-matrix:
\[
K_2 = \begin{pmatrix}K_{11}&K_{12}\\K_{12}&K_{22}\end{pmatrix}
= \begin{pmatrix}w_1^2\sigma_2^2+\sigma_1^2 & w_1\sigma_2^2\\ w_1\sigma_2^2 & \sigma_2^2\end{pmatrix} \tag{27}
\]

This follows from the definition of the covariance,
\[
K_{ij} = \langle y_i y_j \rangle. \tag{28}
\]
The inverse covariance matrix, on the other hand, does not change in such a simple way. The 3 × 3 inverse covariance matrix was:
\[
K^{-1} = \begin{pmatrix}
\frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2} & 0\\[2pt]
-\frac{w_1}{\sigma_1^2} & \frac{1}{\sigma_2^2}+\frac{w_1^2}{\sigma_1^2}+\frac{w_3^2}{\sigma_3^2} & -\frac{w_3}{\sigma_3^2}\\[2pt]
0 & -\frac{w_3}{\sigma_3^2} & \frac{1}{\sigma_3^2}
\end{pmatrix}
\]
When we work out the 2 × 2 inverse covariance matrix, all the terms that originated from the child y3 (highlighted in blue in the original) are lost. So we have
\[
K_2^{-1} = \begin{pmatrix}
\frac{1}{\sigma_1^2} & -\frac{w_1}{\sigma_1^2}\\[2pt]
-\frac{w_1}{\sigma_1^2} & \frac{1}{\sigma_2^2}+\frac{w_1^2}{\sigma_1^2}
\end{pmatrix}
\]


Specifically, notice that [K_2^{-1}]_{22} is different from the (2, 2) entry of the three-by-three K^{-1}.

We conclude: leaving out a variable leaves K unchanged, but changes K^{-1}. This conclusion is important for understanding the answer to the question, ‘When working with Gaussian processes, why not parameterize the inverse covariance instead of the covariance function?’ The answer is: you can’t write down the inverse covariance associated with two points! The inverse covariance depends capriciously on what the other variables are.
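This conclusion is easy to demonstrate with the matrices from quiz question 5:

```python
import numpy as np

# Marginalizing out y3: the covariance of (y1, y2) is just the
# top-left submatrix of K, but inv(K)[:2,:2] is NOT inv(K[:2,:2]).
K3 = np.array([[1.0, 0.5, 0.0],
               [0.5, 1.0, 0.5],
               [0.0, 0.5, 1.0]])
K2 = K3[:2, :2]               # marginal covariance: just drop y3
A3 = np.linalg.inv(K3)
A2 = np.linalg.inv(K2)
print(A3[:2, :2])             # top-left block of the 3x3 inverse
print(A2)                     # the true 2x2 inverse -- different!
```

The (2, 2) entry of the 3 × 3 inverse is 2, while the (2, 2) entry of the 2 × 2 inverse is 4/3: statement (B) of the quiz is false.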

3 Energy models

Sometimes people express probabilistic models in terms of energy functions that are minimized in the most probable configuration. For example, in regression with cubic splines, a regularizer is defined which describes the energy that a steel ruler would have if bent into the shape of the curve. Such models usually have the form
\[
P(\mathbf{y}) = \frac{1}{Z}\, e^{-E(\mathbf{y})/T}, \tag{29}
\]
and in simple cases the energy E(y) may be a quadratic function of y, such as
\[
E(\mathbf{y}) = -\sum_{i<j} J_{ij}\, y_i y_j + \sum_i a_i y_i^2. \tag{30}
\]
If so, then the distribution is a Gaussian (just like (1)), and the ‘couplings’ J_{ij} are minus the coefficients in the inverse covariance matrix. As a simple example, consider a set of three masses coupled by springs, and subjected to thermal perturbations.

[Figure: three masses y1, y2, y3 in a row, joined by four springs with constants k01, k12, k23, k34, the outer two springs attached to fixed supports.]

The equilibrium positions are (y1, y2, y3) = (0, 0, 0), and the spring constants are k_{ij}. The extension of the second spring is y2 − y1. The energy of this system is
\begin{align}
E(\mathbf{y}) &= \tfrac{1}{2}k_{01}y_1^2 + \tfrac{1}{2}k_{12}(y_2-y_1)^2 + \tfrac{1}{2}k_{23}(y_3-y_2)^2 + \tfrac{1}{2}k_{34}y_3^2\\
&= \tfrac{1}{2}\begin{pmatrix}y_1&y_2&y_3\end{pmatrix}
\begin{pmatrix}k_{01}+k_{12} & -k_{12} & 0\\ -k_{12} & k_{12}+k_{23} & -k_{23}\\ 0 & -k_{23} & k_{23}+k_{34}\end{pmatrix}
\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix}
\end{align}
So at temperature T, the probability distribution of the displacements is Gaussian with inverse covariance matrix
\[
\frac{1}{T}\begin{pmatrix}k_{01}+k_{12} & -k_{12} & 0\\ -k_{12} & k_{12}+k_{23} & -k_{23}\\ 0 & -k_{23} & k_{23}+k_{34}\end{pmatrix}. \tag{31}
\]
Notice that there are zero entries between displacements y1 and y3, the two masses that are not directly coupled by a spring.


[Figure 1. Five masses y1, …, y5 in a row, joined by six identical springs k.]

So inverse covariance matrices are sometimes very sparse. If we have five masses in a row connected by identical springs k, for example, then
\[
K^{-1} = \frac{k}{T}\begin{pmatrix}
2&-1&&&\\
-1&2&-1&&\\
&-1&2&-1&\\
&&-1&2&-1\\
&&&-1&2\end{pmatrix}. \tag{32}
\]
But this sparsity doesn’t carry over to the covariance matrix, which is
\[
K = \frac{T}{k}\begin{pmatrix}
0.83&0.67&0.50&0.33&0.17\\
0.67&1.33&1.00&0.67&0.33\\
0.50&1.00&1.50&1.00&0.50\\
0.33&0.67&1.00&1.33&0.67\\
0.17&0.33&0.50&0.67&0.83\end{pmatrix}. \tag{33}
\]
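Matrices (32) and (33) can be reproduced in a few lines (taking k = T = 1 for illustration):

```python
import numpy as np

# The precision matrix of the five-mass chain is tridiagonal,
# but its inverse -- the covariance -- is completely dense.
n = 5
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # (T/k) * K^-1
K = np.linalg.inv(A)
print(np.round(K, 2))
# First row: 0.83 0.67 0.50 0.33 0.17, matching (33).
```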

4 Eigenvectors and eigenvalues are meaningless

There seems to be a knee-jerk reaction when people see a square matrix: ‘what are its eigenvectors?’ But here, where we are discussing quadratic forms, eigenvectors and eigenvalues have no fundamental status. They are dimensionally invalid objects. Any algorithm that features eigenvectors either didn’t need to do so, or shouldn’t have done so. (I think the whole idea of principal component analysis is misguided, for example.)

Hang on, you say, what about the three-masses example? Don’t those three masses have meaningful normal modes? Yes, they do, but those modes are not the eigenvectors of the spring matrix (31). Remember, I didn’t tell you what the masses of the masses were!

I’m not saying that eigenvectors are never meaningful. What I’m saying is: in the context of quadratic forms
\[
\tfrac{1}{2}\,\mathbf{y}^{\mathsf T} A \mathbf{y}, \tag{34}
\]
eigenvectors are meaningless and arbitrary. Consider a covariance matrix describing the correlation between something’s mass y1 and its length y2:
\[
K = \begin{pmatrix}K_{11}&K_{12}\\K_{12}&K_{22}\end{pmatrix} \tag{35}
\]
The dimensions of K11 are mass squared; K11 might be measured in kg², for example. The dimensions of K12 ≡ ⟨y1y2⟩ are mass times length; K12 might be measured in kg m, for example. Here’s an example, which might describe the correlation between the weight and height of some animals in a survey:
\[
K = \begin{pmatrix}K_{11}&K_{12}\\K_{12}&K_{22}\end{pmatrix}
= \begin{pmatrix}10000\ \mathrm{kg^2} & 70\ \mathrm{kg\,m}\\ 70\ \mathrm{kg\,m} & 1\ \mathrm{m^2}\end{pmatrix} \tag{36}
\]

[Figure 2. Dataset with its ‘eigenvectors’, plotted once as mass/kg against length/m and once as mass/kg against length/cm. As the text explains, the eigenvectors of covariance matrices are meaningless and arbitrary.]

The knee-jerk reaction is “let’s find the principal components of our data”, which means “ignore those silly dimensional units, and just find the eigenvectors of \(\begin{pmatrix}10000&70\\70&1\end{pmatrix}\)”. But let’s consider what this means. An eigenvector is a vector satisfying
\[
\begin{pmatrix}10000\ \mathrm{kg^2} & 70\ \mathrm{kg\,m}\\ 70\ \mathrm{kg\,m} & 1\ \mathrm{m^2}\end{pmatrix}\mathbf{e} = \lambda\mathbf{e}. \tag{37}
\]
By asking for an eigenvector, we are imagining that two equations are true. First, the top row:
\[
10000\ \mathrm{kg^2}\, e_1 + 70\ \mathrm{kg\,m}\, e_2 = \lambda e_1, \tag{38}
\]
and, second, the bottom row:
\[
70\ \mathrm{kg\,m}\, e_1 + 1\ \mathrm{m^2}\, e_2 = \lambda e_2. \tag{39}
\]
These expressions violate the rules of dimensions. Try all you like, but you won’t be able to find dimensions for e1, e2, and λ such that rule 1 is satisfied.

No, no, the matlab lover says, I leave out the dimensions, and I get:

> [e,v] = eig(s)
e =
   0.0070002   0.9999755
  -0.9999755   0.0070002
v =
   5.0998e-01   0.0000e+00
   0.0000e+00   1.0000e+04

I notice that the eigenvectors are (0.007, −0.9999) and (0.9999, 0.007), which are almost aligned with the coordinate axes. Very interesting! I also notice that the eigenvalues are 10⁴ and 0.5. What an interestingly large eigenvalue ratio! Wow, that means that there is one very big principal component, and the second one is much smaller. Ooh, how interesting.


But this is nonsense. If we change the units in which we measure length from m to cm, then the covariance matrix can be written
\[
K = \begin{pmatrix}K_{11}&K_{12}\\K_{12}&K_{22}\end{pmatrix}
= \begin{pmatrix}10000\ \mathrm{kg^2} & 7000\ \mathrm{kg\,cm}\\ 7000\ \mathrm{kg\,cm} & 10000\ \mathrm{cm^2}\end{pmatrix} \tag{40}
\]
This is exactly the same covariance matrix of exactly the same data. But the eigenvectors and eigenvalues are now:

e =
  -0.70711   0.70711
   0.70711   0.70711
v =
   3.0000e+03   0.0000e+00
   0.0000e+00   1.7000e+04

Figure 2 illustrates this situation. On the left, a data set of masses and lengths measured in metres; the arrows show the ‘eigenvectors’. (The arrows don’t look ‘orthogonal’ in this plot because a step of one unit on the x-axis happens to cover less paper than a step of one unit on the y-axis.) On the right, exactly the same data set, but with lengths measured in centimetres; the arrows again show the ‘eigenvectors’. In conclusion, the eigenvectors of the matrix in a quadratic form are not fundamentally meaningful. [Properties of that matrix that are meaningful include its determinant.]
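The unit-change argument can be replayed numerically: rescaling one axis is a diagonal change of units applied to K, and it scrambles the eigenvectors completely, while a dimensionless quantity such as the correlation coefficient is untouched. A numpy sketch:

```python
import numpy as np

# Same data, two unit systems: length in m versus length in cm.
K_m = np.array([[10000.0,   70.0],
                [   70.0,    1.0]])      # (kg^2, kg m; kg m, m^2)
S = np.diag([1.0, 100.0])                # m -> cm on the second axis
K_cm = S @ K_m @ S                       # covariance in the new units
evecs_m  = np.linalg.eigh(K_m)[1]
evecs_cm = np.linalg.eigh(K_cm)[1]
rho_m  = K_m[0, 1]  / np.sqrt(K_m[0, 0]  * K_m[1, 1])
rho_cm = K_cm[0, 1] / np.sqrt(K_cm[0, 0] * K_cm[1, 1])
print(np.round(evecs_m, 4))    # nearly axis-aligned
print(np.round(evecs_cm, 4))   # at 45 degrees
print(rho_m, rho_cm)           # identical: the correlation is 0.7
```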

4.1 Aside

This complaint about eigenvectors comes hand in hand with another complaint, about ‘steepest descent’. A steepest-descent algorithm is dimensionally invalid: a step in a parameter space does not have the same dimensions as a gradient. To turn a gradient into a sensible step direction, you need a metric. The metric defines how ‘big’ a step is (in rather the same way that when gnuplot plotted the data above, it chose a vertical scale and a horizontal scale). Once you know how big alternative steps are, it becomes meaningful to take the step that is ‘steepest’ (that is, the direction with the biggest change in function value per unit ‘distance’ moved). Without a metric, steepest-descent algorithms are not covariant. That is, the algorithm would behave differently if you just changed the units in which one parameter is measured.

Appendix: Answers to quiz

For the first four questions, you can quickly guess the answers based on whether the (1, 3) entries are zero or not. For a careful answer you should also check that the matrices really are positive definite (they are) and that they are realisable by the respective graphical models (which isn’t guaranteed by the preceding constraints).

1. A and B
2. C and D
3. C and D
4. A and B
5. A is true, B is false.
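These answers can be verified numerically; the sketch below (my addition) checks positive definiteness and the relevant zero patterns:

```python
import numpy as np

# A matrix can be the covariance of H1 (common parent) only if it is
# positive definite and its INVERSE has a zero (1,3) entry; it can be
# the covariance of H2 (two parents) only if the matrix ITSELF has
# a zero (1,3) entry.
mats = {
    'A': np.array([[9., 3., 1.], [3., 9., 3.], [1., 3., 9.]]),
    'B': np.array([[8., -3., 1.], [-3., 9., -3.], [1., -3., 8.]]),
    'C': np.array([[9., 3., 0.], [3., 9., 3.], [0., 3., 9.]]),
    'D': np.array([[9., -3., 0.], [-3., 10., -3.], [0., -3., 9.]]),
}
inv13 = {}
for name, M in mats.items():
    assert np.all(np.linalg.eigvalsh(M) > 0)   # all four are pos. def.
    inv13[name] = np.linalg.inv(M)[0, 2]
print({k: round(v, 6) for k, v in inv13.items()})
# A and B have [M^-1]_13 = 0 (answers 1 and 4);
# C and D have M_13 = 0 instead (answers 2 and 3).
```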
