

  1. The Gaussian Distribution
     Chris Williams, School of Informatics, University of Edinburgh
     October 2007

  2. Overview
     - Probability density functions
     - Univariate Gaussian
     - Multivariate Gaussian
     - Mahalanobis distance
     - Properties of Gaussian distributions
     - Graphical Gaussian models
     Read: Bishop sec 2.3 (to p 93)

  3. Continuous distributions
     Probability density function (pdf) for a continuous random variable $X$:
     $$P(a \le X \le b) = \int_a^b p(x)\,dx$$
     therefore $P(x \le X \le x + \delta x) \simeq p(x)\,\delta x$.
     Example: Gaussian distribution
     $$p(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
     Shorthand notation: $X \sim N(\mu, \sigma^2)$.
     Standard normal (or Gaussian) distribution: $Z \sim N(0, 1)$.
     Normalization: $\int_{-\infty}^{\infty} p(x)\,dx = 1$.
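
A minimal sketch (mine, not from the slides) that evaluates the Gaussian pdf above and checks the normalization numerically; the values mu = 1, sigma^2 = 4 are illustrative choices:

```python
# Evaluate the Gaussian pdf and check that it integrates to 1.
import numpy as np
from scipy import integrate

def gaussian_pdf(x, mu=0.0, sigma2=1.0):
    """p(x) = (2*pi*sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

# Normalization: the integral of p(x) over the real line should be 1.
total, _ = integrate.quad(gaussian_pdf, -np.inf, np.inf, args=(1.0, 4.0))
print(total)  # ~1.0

# P(a <= X <= b) is the integral of the pdf from a to b.
prob, _ = integrate.quad(gaussian_pdf, 0.0, 2.0, args=(1.0, 4.0))
print(prob)
```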

  4. [Figure: pdf of the standard normal distribution]
     Cumulative distribution function
     $$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} p(z')\,dz'$$
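
A short check of $\Phi$ using SciPy's standard normal cdf (an illustrative sketch, not part of the slides):

```python
# Phi(z) = P(Z <= z) for Z ~ N(0, 1); interval probabilities are cdf differences.
from scipy.stats import norm

print(norm.cdf(0.0))                    # Phi(0) = 0.5 by symmetry
print(norm.cdf(1.96))                   # ~0.975
print(norm.cdf(2.0) - norm.cdf(-2.0))   # P(-2 <= Z <= 2), ~0.954
```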

  5. Expectation
     $$E[g(X)] = \int g(x)\,p(x)\,dx$$
     Mean: $E[X]$. Variance: $E[(X - \mu)^2]$.
     For a Gaussian, mean $= \mu$ and variance $= \sigma^2$.
     Shorthand: $x \sim N(\mu, \sigma^2)$.
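
A quick Monte Carlo illustration (my own sketch; mu = 1, sigma = 2 are arbitrary): sample averages estimate the expectations above.

```python
# Estimate E[X] and E[(X - mu)^2] by sampling from N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

print(x.mean())                # ~mu = 1.0
print(((x - mu) ** 2).mean())  # ~sigma^2 = 4.0
```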

  6. Bivariate Gaussian I
     Let $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$. If $X_1$ and $X_2$ are independent,
     $$p(x_1, x_2) = \frac{1}{2\pi(\sigma_1^2\sigma_2^2)^{1/2}} \exp\left[-\frac{1}{2}\left(\frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right)\right]$$
     Let $\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, $\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$, $\Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$. Then
     $$p(\mathbf{x}) = \frac{1}{2\pi|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$
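
A numerical check (an illustrative sketch, not from the slides): for independent $X_1$, $X_2$, the product of the two univariate densities matches the bivariate density with diagonal $\Sigma$.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

mu1, s1 = 0.0, 1.0   # s1, s2 are standard deviations
mu2, s2 = 2.0, 0.5

# Joint with diagonal covariance Sigma = diag(s1^2, s2^2).
joint = multivariate_normal(mean=[mu1, mu2], cov=np.diag([s1**2, s2**2]))

x = np.array([0.3, 1.7])
product = norm.pdf(x[0], mu1, s1) * norm.pdf(x[1], mu2, s2)
print(product, joint.pdf(x))  # the two values agree
```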

  7. [Figure: surface plot of a bivariate Gaussian density]

  8. Bivariate Gaussian II
     $\Sigma$ is the covariance matrix:
     $$\Sigma = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T], \qquad \Sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)]$$
     Example: plot of weight vs height for a population.

  9. Multivariate Gaussian
     $$P(\mathbf{x} \in R) = \int_R p(\mathbf{x})\,d\mathbf{x}$$
     Multivariate Gaussian:
     $$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$
     - $\Sigma$ is the covariance matrix: $\Sigma = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$, $\Sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)]$
     - $\Sigma$ is symmetric
     - Shorthand: $\mathbf{x} \sim N(\boldsymbol{\mu}, \Sigma)$
     - For $p(\mathbf{x})$ to be a density, $\Sigma$ must be positive definite
     - $\Sigma$ has $d(d+1)/2$ parameters; the mean has a further $d$
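
A sketch implementation of the density formula above, checked against scipy.stats.multivariate_normal (the particular mu and Sigma are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """p(x) = (2 pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # symmetric and positive definite
x = np.array([0.5, 0.5])
print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))  # same value
```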

  10. Mahalanobis Distance
     $$d_\Sigma^2(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^T \Sigma^{-1} (\mathbf{x}_i - \mathbf{x}_j)$$
     - $d_\Sigma^2(\mathbf{x}_i, \mathbf{x}_j)$ is called the Mahalanobis distance between $\mathbf{x}_i$ and $\mathbf{x}_j$
     - If $\Sigma$ is diagonal, the contours of $d_\Sigma^2$ are axis-aligned ellipsoids
     - If $\Sigma$ is not diagonal, the contours of $d_\Sigma^2$ are rotated ellipsoids
     - $\Sigma = U\Lambda U^T$, where $\Lambda$ is diagonal and $U$ is a rotation matrix
     - $\Sigma$ is positive definite $\Rightarrow$ the entries in $\Lambda$ are positive
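
A minimal sketch computing the squared Mahalanobis distance and the eigendecomposition $\Sigma = U\Lambda U^T$ used to interpret its contours (the values are illustrative):

```python
import numpy as np

def mahalanobis_sq(xi, xj, Sigma):
    """d^2 = (xi - xj)^T Sigma^{-1} (xi - xj)."""
    diff = xi - xj
    return diff @ np.linalg.solve(Sigma, diff)

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
print(mahalanobis_sq(np.array([1.0, 0.0]), np.array([0.0, 1.0]), Sigma))

# Sigma = U Lambda U^T; positive definiteness means all eigenvalues > 0.
lam, U = np.linalg.eigh(Sigma)
print(lam)                                          # entries of Lambda, all > 0
print(np.allclose(U @ np.diag(lam) @ U.T, Sigma))   # True
```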

  11. Parameterization of the covariance matrix
     - Fully general $\Sigma$ $\Rightarrow$ variables are correlated
     - Spherical or isotropic: $\Sigma = \sigma^2 I$. Variables are independent
     - Diagonal: $[\Sigma]_{ij} = \delta_{ij}\sigma_i^2$. Variables are independent
     - Rank-constrained: $\Sigma = WW^T + \Psi$, with $W$ a $d \times q$ matrix, $q < d - 1$, and $\Psi$ diagonal. This is the factor analysis model. If $\Psi = \sigma^2 I$, then we have the probabilistic principal components analysis (PPCA) model
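
A sketch constructing each parameterization above (the sizes d = 4, q = 2 and the variance values are my own illustrative choices):

```python
import numpy as np

d, q = 4, 2
rng = np.random.default_rng(0)

spherical = 0.5 * np.eye(d)                    # Sigma = sigma^2 I
diagonal = np.diag(rng.uniform(0.5, 2.0, d))   # [Sigma]_ij = delta_ij sigma_i^2

W = rng.normal(size=(d, q))                    # d x q loadings, q < d - 1
Psi = np.diag(rng.uniform(0.1, 0.3, d))
factor_analysis = W @ W.T + Psi                # Sigma = W W^T + Psi
ppca = W @ W.T + 0.2 * np.eye(d)               # Psi = sigma^2 I gives PPCA

for S in (spherical, diagonal, factor_analysis, ppca):
    print(np.all(np.linalg.eigvalsh(S) > 0))   # all positive definite
```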

  12. Transformations of Gaussian variables
     Linear transformations of Gaussian RVs are Gaussian:
     $$X \sim N(\boldsymbol{\mu}_x, \Sigma), \quad Y = AX + b \;\Rightarrow\; Y \sim N(A\boldsymbol{\mu}_x + b,\; A\Sigma A^T)$$
     Sums of Gaussian RVs are Gaussian: for $Y = X_1 + X_2$,
     $$E[Y] = E[X_1] + E[X_2], \qquad \mathrm{var}[Y] = \mathrm{var}[X_1] + \mathrm{var}[X_2] + 2\,\mathrm{cov}[X_1, X_2]$$
     If $X_1$ and $X_2$ are independent, $\mathrm{var}[Y] = \mathrm{var}[X_1] + \mathrm{var}[X_2]$.
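
A sampling check of the linear-transformation rule (an illustrative sketch, not from the slides): the empirical mean and covariance of $Y = AX + b$ approach $A\boldsymbol{\mu}_x + b$ and $A\Sigma A^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_x = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
b = np.array([0.5, -0.5])

X = rng.multivariate_normal(mu_x, Sigma, size=200_000)
Y = X @ A.T + b   # apply Y = A X + b to each sample row

print(Y.mean(axis=0), A @ mu_x + b)   # empirical mean vs A mu_x + b
print(np.cov(Y.T))                    # empirical covariance
print(A @ Sigma @ A.T)                # matches A Sigma A^T
```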

  13. Properties of the Gaussian distribution
     - The Gaussian has relatively simple analytical properties
     - Central limit theorem: the sum (or mean) of $M$ independent random variables is distributed normally as $M \to \infty$ (subject to a few general conditions)
     - Diagonalization of the covariance matrix $\Rightarrow$ rotated variables are independent
     - All marginal and conditional densities of a Gaussian are Gaussian
     - The Gaussian is the distribution that maximizes the entropy $H = -\int p(\mathbf{x}) \log p(\mathbf{x})\,d\mathbf{x}$ for fixed mean and covariance
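
An illustrative central-limit-theorem demo (my own sketch): standardized sums of M independent uniforms look increasingly Gaussian as M grows.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(2)
for M in (1, 5, 50):
    # Uniform(0, 1) has mean 1/2 and variance 1/12; standardize the sum.
    s = rng.uniform(size=(100_000, M)).sum(axis=1)
    z = (s - M / 2) / np.sqrt(M / 12)
    # Kolmogorov-Smirnov distance to N(0, 1): smaller = closer to Gaussian.
    print(M, kstest(z, "norm").statistic)
```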

  14. Graphical Gaussian Models
     Example: a directed graph with arcs $x \to y$ and $x \to z$.
     Let $X$ denote pulse rate; let $Y$ denote the measurement taken by machine 1, and $Z$ the measurement taken by machine 2.

  15. Model
     $$X \sim N(\mu_x, v_x)$$
     $$Y = \mu_y + w_y(X - \mu_x) + N_y, \qquad Z = \mu_z + w_z(X - \mu_x) + N_z$$
     with independent noise $N_y \sim N(0, v_{N_y})$ and $N_z \sim N(0, v_{N_z})$.
     $(X, Y, Z)$ is jointly Gaussian; we can do inference for $X$ given $Y = y$ and $Z = z$.

  16. As before, $P(x, y, z) = P(x)P(y|x)P(z|x)$. Show that
     $$\boldsymbol{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \\ \mu_z \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} v_x & w_y v_x & w_z v_x \\ w_y v_x & w_y^2 v_x + v_{N_y} & w_y w_z v_x \\ w_z v_x & w_y w_z v_x & w_z^2 v_x + v_{N_z} \end{pmatrix}$$
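
A sampling sketch of the pulse-rate model (all parameter values here are my own illustrative choices): the empirical covariance of $(X, Y, Z)$ should match the $\Sigma$ derived above.

```python
import numpy as np

rng = np.random.default_rng(3)
mu_x, v_x = 60.0, 9.0
w_y, v_Ny = 1.0, 1.0
w_z, v_Nz = 0.8, 2.0
mu_y, mu_z = 61.0, 59.0

n = 500_000
X = rng.normal(mu_x, np.sqrt(v_x), n)
Y = mu_y + w_y * (X - mu_x) + rng.normal(0.0, np.sqrt(v_Ny), n)
Z = mu_z + w_z * (X - mu_x) + rng.normal(0.0, np.sqrt(v_Nz), n)

Sigma = np.array([
    [v_x,       w_y * v_x,           w_z * v_x],
    [w_y * v_x, w_y**2 * v_x + v_Ny, w_y * w_z * v_x],
    [w_z * v_x, w_y * w_z * v_x,     w_z**2 * v_x + v_Nz],
])
print(np.cov(np.stack([X, Y, Z])))  # empirical covariance
print(Sigma)                        # derived covariance; the two agree
```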

  17. Inference in Gaussian models
     Partition the variables into two groups, $\mathbf{x}_1$ and $\mathbf{x}_2$:
     $$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$
     $$\boldsymbol{\mu}^c_{1|2} = \boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$$
     $$\Sigma^c_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
     For a proof see §2.3.1 of Bishop (2006) (not examinable).
     Formation of the joint Gaussian is analogous to formation of the joint probability table for discrete RVs. Propagation schemes are also possible for Gaussian RVs.
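
A minimal sketch implementing the conditioning formulas above for a partitioned Gaussian (the 2-d example values are illustrative):

```python
import numpy as np

def condition(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of x[idx1] given x[idx2] = x2, for x ~ N(mu, Sigma)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S21 = Sigma[np.ix_(idx2, idx1)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    K = S12 @ np.linalg.inv(S22)
    # mu_{1|2} = mu1 + S12 S22^{-1} (x2 - mu2);  Sigma_{1|2} = S11 - S12 S22^{-1} S21
    return mu1 + K @ (x2 - mu2), S11 - K @ S21

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
m, S = condition(mu, Sigma, np.array([0]), np.array([1]), np.array([2.0]))
print(m, S)  # conditional mean and covariance of x1 given x2 = 2
```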

  18. Example Inference Problem
     A graph $X \to Y$ with $Y = 2X + 8 + N_y$.
     Assume $X \sim N(0, 1/\alpha)$, so $w_y = 2$, $\mu_y = 8$, and $N_y \sim N(0, 1)$. Show that
     $$\mu_{x|y} = \frac{2}{4 + \alpha}(y - 8), \qquad \mathrm{var}(x|y) = \frac{1}{4 + \alpha}$$
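
A numerical check of this result (alpha = 1 and y = 10 are illustrative choices): build the joint of $(X, Y)$ and apply the slide 17 conditioning formulas.

```python
import numpy as np

alpha, y_obs = 1.0, 10.0
v_x = 1.0 / alpha
# Joint of (X, Y) for Y = 2X + 8 + N_y with N_y ~ N(0, 1):
# var(X) = 1/alpha, cov(X, Y) = 2/alpha, var(Y) = 4/alpha + 1.
mu = np.array([0.0, 8.0])
Sigma = np.array([[v_x,     2 * v_x],
                  [2 * v_x, 4 * v_x + 1.0]])

mu_post = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (y_obs - mu[1])
var_post = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

print(mu_post, 2 / (4 + alpha) * (y_obs - 8))  # both 0.8
print(var_post, 1 / (4 + alpha))               # both 0.2
```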

  19. Hybrid (discrete + continuous) networks
     - We could discretize continuous variables, but this is ugly and gives large CPTs
     - Better to use parametric families, e.g. the Gaussian
     - This works easily when continuous nodes are children of discrete nodes; we then obtain a conditional Gaussian model
