SLIDE 1

The Gaussian Distribution

Chris Williams

School of Informatics, University of Edinburgh

October 2007

SLIDE 2

Overview

• Probability density functions
• Univariate Gaussian
• Multivariate Gaussian
• Mahalanobis distance
• Properties of Gaussian distributions
• Graphical Gaussian models

Read: Bishop sec 2.3 (to p 93)

SLIDE 3

Continuous distributions

Probability density function (pdf) for a continuous random variable X:

P(a ≤ X ≤ b) = ∫_a^b p(x) dx

therefore P(x ≤ X ≤ x + δx) ≃ p(x) δx

Example: Gaussian distribution

p(x) = 1/(2πσ²)^{1/2} exp( −(x − µ)²/(2σ²) )

• Shorthand notation: X ∼ N(µ, σ²)
• Standard normal (or Gaussian) distribution: Z ∼ N(0, 1)
• Normalization: ∫_{−∞}^{∞} p(x) dx = 1
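A quick numerical check of these two facts, sketched in Python with NumPy/SciPy (the values of µ, σ, a, b are illustrative, not from the slides):

    import numpy as np
    from scipy.stats import norm

    mu, sigma = 0.0, 1.0                    # illustrative parameters
    x = np.linspace(-8.0, 8.0, 100_001)
    p = norm.pdf(x, loc=mu, scale=sigma)

    # Normalization: the pdf integrates to 1 over the real line
    print(np.trapz(p, x))                   # ~1.0

    # P(a <= X <= b) is the integral of the pdf from a to b
    a, b = -1.0, 1.0
    print(norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma))  # ~0.6827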

SLIDE 4

[Figure: pdf of the standard normal distribution N(0, 1)]

Cumulative distribution function:

Φ(z) = P(Z ≤ z) = ∫_{−∞}^z p(z′) dz′
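For the standard normal, Φ(z) has no elementary closed form, but it can be written via the error function as Φ(z) = (1 + erf(z/√2))/2. A minimal sketch:

    import math

    def std_normal_cdf(z: float) -> float:
        """Phi(z) = P(Z <= z) for Z ~ N(0, 1), via the error function."""
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    print(std_normal_cdf(0.0))    # 0.5, by symmetry
    print(std_normal_cdf(1.96))   # ~0.975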

SLIDE 5

Expectation:

E[g(X)] = ∫ g(x) p(x) dx

• Mean: E[X]
• Variance: E[(X − µ)²]
• For a Gaussian: mean = µ, variance = σ²
• Shorthand: x ∼ N(µ, σ²)
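These expectations can be checked by Monte Carlo: sample averages of g(x) approximate E[g(X)]. A sketch with arbitrary illustrative parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 3.0, 2.0                      # illustrative parameters
    x = rng.normal(mu, sigma, size=1_000_000)

    # Sample averages approximate E[X] and E[(X - mu)^2]
    print(x.mean())                 # ~3.0 (mean = mu)
    print(((x - mu) ** 2).mean())   # ~4.0 (variance = sigma^2)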

SLIDE 6

Bivariate Gaussian I

Let X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²).

If X1 and X2 are independent,

p(x1, x2) = 1/(2π(σ1²σ2²)^{1/2}) exp( −(1/2) [ (x1 − µ1)²/σ1² + (x2 − µ2)²/σ2² ] )

• Let x = (x1, x2)^T, µ = (µ1, µ2)^T, and Σ = diag(σ1², σ2²)
• Then

p(x) = 1/(2π|Σ|^{1/2}) exp( −(1/2) (x − µ)^T Σ^{-1} (x − µ) )
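The equivalence of the two forms above can be confirmed numerically; a sketch with made-up parameter values:

    import numpy as np
    from scipy.stats import norm, multivariate_normal

    mu1, s1 = 1.0, 0.5      # illustrative parameters
    mu2, s2 = -2.0, 1.5
    x1, x2 = 0.3, -1.0      # evaluation point

    # Product of the two univariate densities (independence)
    p_prod = norm.pdf(x1, mu1, s1) * norm.pdf(x2, mu2, s2)

    # Bivariate density with diagonal covariance Sigma = diag(s1^2, s2^2)
    Sigma = np.diag([s1**2, s2**2])
    p_joint = multivariate_normal.pdf([x1, x2], mean=[mu1, mu2], cov=Sigma)

    print(np.isclose(p_prod, p_joint))   # True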
SLIDE 7

[Figure: surface plot of a bivariate Gaussian density]

SLIDE 8

Bivariate Gaussian II

Covariance: Σ is the covariance matrix,

Σ = E[(x − µ)(x − µ)^T],   Σij = E[(xi − µi)(xj − µj)]

Example: plot of weight vs height for a population
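The definition translates directly into code; a sketch on synthetic height/weight data (all numbers invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    height = rng.normal(170.0, 10.0, size=10_000)                               # cm
    weight = 70.0 + 0.9 * (height - 170.0) + rng.normal(0.0, 5.0, size=10_000)  # kg

    X = np.stack([height, weight])                     # 2 x N data matrix
    mu = X.mean(axis=1, keepdims=True)
    Sigma = (X - mu) @ (X - mu).T / (X.shape[1] - 1)   # E[(x - mu)(x - mu)^T]

    print(Sigma)        # positive off-diagonal entries: height and weight covary
    print(np.cov(X))    # matches the library estimate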

SLIDE 9

Multivariate Gaussian

P(x ∈ R) = ∫_R p(x) dx

Multivariate Gaussian:

p(x) = 1/((2π)^{d/2} |Σ|^{1/2}) exp( −(1/2) (x − µ)^T Σ^{-1} (x − µ) )

• Σ is the covariance matrix: Σ = E[(x − µ)(x − µ)^T], Σij = E[(xi − µi)(xj − µj)]
• Σ is symmetric
• Shorthand: x ∼ N(µ, Σ)
• For p(x) to be a density, Σ must be positive definite
• Σ has d(d + 1)/2 parameters; the mean has a further d
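The density formula can be evaluated directly and compared against a library implementation; a sketch with an arbitrary positive definite Σ:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 1.0, -1.0])          # illustrative parameters
    Sigma = np.array([[2.0, 0.3, 0.0],
                      [0.3, 1.0, 0.2],
                      [0.0, 0.2, 0.5]])      # symmetric, positive definite
    x = np.array([0.5, 0.5, 0.0])
    d = len(mu)

    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    p = np.exp(-0.5 * quad) / ((2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

    print(p)
    print(multivariate_normal.pdf(x, mean=mu, cov=Sigma))   # same value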

SLIDE 10

Mahalanobis Distance

d²_Σ(xi, xj) = (xi − xj)^T Σ^{-1} (xi − xj)

• d²_Σ(xi, xj) is called the Mahalanobis distance between xi and xj
• If Σ is diagonal, the contours of d²_Σ are axis-aligned ellipsoids
• If Σ is not diagonal, the contours of d²_Σ are rotated ellipsoids
• Σ = UΛU^T, where Λ is diagonal and U is a rotation matrix
• Σ is positive definite ⇒ the entries of Λ are positive
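Both the distance and the eigendecomposition are short in NumPy; a sketch with an invented covariance:

    import numpy as np

    Sigma = np.array([[2.0, 0.8],
                      [0.8, 1.0]])           # illustrative, non-diagonal

    def mahalanobis_sq(xi, xj, Sigma):
        """Squared Mahalanobis distance (xi - xj)^T Sigma^{-1} (xi - xj)."""
        d = np.asarray(xi) - np.asarray(xj)
        return d @ np.linalg.solve(Sigma, d)

    print(mahalanobis_sq([1.0, 0.0], [0.0, 0.0], Sigma))

    # Sigma = U Lam U^T; positive definiteness shows up as positive eigenvalues
    lam, U = np.linalg.eigh(Sigma)
    print(lam)                                          # all > 0
    print(np.allclose(U @ np.diag(lam) @ U.T, Sigma))   # True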

SLIDE 11

Parameterization of the covariance matrix

• Fully general Σ ⇒ variables are correlated
• Spherical or isotropic: Σ = σ²I. Variables are independent
• Diagonal: [Σ]ij = δij σi². Variables are independent
• Rank-constrained: Σ = WW^T + Ψ, with W a d × q matrix, q < d − 1, and Ψ diagonal. This is the factor analysis model. If Ψ = σ²I, we have the probabilistic principal components analysis (PPCA) model
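A sketch constructing each parameterization (dimensions and values are arbitrary; the factor analysis parameter count is the naive one, ignoring rotational redundancy in W):

    import numpy as np

    rng = np.random.default_rng(0)
    d, q = 5, 2                                       # illustrative dimensions

    Sigma_full = np.cov(rng.normal(size=(d, 100)))    # fully general
    Sigma_iso  = 1.5 * np.eye(d)                      # spherical: sigma^2 I
    Sigma_diag = np.diag(rng.uniform(0.5, 2.0, d))    # diagonal

    # Factor analysis: low-rank plus diagonal
    W   = rng.normal(size=(d, q))
    Psi = np.diag(rng.uniform(0.1, 0.5, d))
    Sigma_fa = W @ W.T + Psi

    # Parameter counts: d(d+1)/2 for the full covariance vs. far fewer otherwise
    print(d * (d + 1) // 2)     # full: 15
    print(1, d, d * q + d)      # spherical, diagonal, factor analysis (naive)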

SLIDE 12

Transformations of Gaussian variables

• Linear transformations of Gaussian RVs are Gaussian:
  X ∼ N(µx, Σ), Y = AX + b ⇒ Y ∼ N(Aµx + b, AΣA^T)
• Sums of Gaussian RVs are Gaussian: for Y = X1 + X2,
  E[Y] = E[X1] + E[X2]
  var[Y] = var[X1] + var[X2] + 2 covar[X1, X2]
• If X1 and X2 are independent, var[Y] = var[X1] + var[X2]
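Both facts are easy to verify by sampling; a sketch with invented A, b, µx and Σ:

    import numpy as np

    rng = np.random.default_rng(0)
    mu_x  = np.array([1.0, -1.0])          # illustrative parameters
    Sigma = np.array([[1.0, 0.3],
                      [0.3, 2.0]])
    A = np.array([[2.0, 0.0],
                  [1.0, 1.0]])
    b = np.array([0.5, -0.5])

    X = rng.multivariate_normal(mu_x, Sigma, size=500_000)
    Y = X @ A.T + b

    print(Y.mean(axis=0), A @ mu_x + b)    # sample mean ~ A mu_x + b
    print(np.cov(Y.T))                     # sample covariance ~ A Sigma A^T
    print(A @ Sigma @ A.T)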

SLIDE 13

Properties of the Gaussian distribution

• Gaussian has relatively simple analytical properties
• Central limit theorem: the sum (or mean) of M independent random variables is distributed normally as M → ∞ (subject to a few general conditions)
• Diagonalization of the covariance matrix ⇒ rotated variables are independent
• All marginal and conditional densities of a Gaussian are Gaussian
• The Gaussian is the distribution that maximizes the entropy H = −∫ p(x) log p(x) dx for fixed mean and covariance
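The central limit theorem is easy to see empirically; a sketch averaging uniform variables (M and the sample size are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    M, n = 50, 200_000                     # illustrative choices

    # Means of M iid Uniform(0, 1) variables are approximately Gaussian
    means = rng.uniform(0.0, 1.0, size=(n, M)).mean(axis=1)

    print(means.mean())                    # ~0.5
    print(means.std())                     # ~ sqrt(1/(12 M)) ~ 0.0408
    print(np.sqrt(1.0 / (12.0 * M)))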

SLIDE 14

Graphical Gaussian Models

Example:

[Graph: node x with directed edges to nodes y and z]

Let X denote pulse rate. Let Y denote the measurement taken by machine 1, and Z the measurement taken by machine 2.

SLIDE 15

Model

X ∼ N(µx, vx)
Y = µy + wy(X − µx) + Ny
Z = µz + wz(X − µx) + Nz

with independent noise Ny ∼ N(0, vNy) and Nz ∼ N(0, vNz)

(X, Y, Z) is jointly Gaussian; can do inference for X given Y = y and Z = z

SLIDE 16

As before, P(x, y, z) = P(x)P(y|x)P(z|x). Show that

µ = (µx, µy, µz)^T

Σ = [ vx       wy vx           wz vx
      wy vx    wy² vx + vNy    wy wz vx
      wz vx    wy wz vx        wz² vx + vNz ]
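The claimed µ and Σ can be verified by sampling from the generative model on the previous slide; a sketch with invented parameter values:

    import numpy as np

    rng = np.random.default_rng(0)
    mu_x, v_x = 60.0, 4.0                  # illustrative parameters
    mu_y, w_y, vN_y = 60.0, 1.0, 1.0
    mu_z, w_z, vN_z = 60.0, 0.5, 2.0

    n = 1_000_000
    x = rng.normal(mu_x, np.sqrt(v_x), n)
    y = mu_y + w_y * (x - mu_x) + rng.normal(0.0, np.sqrt(vN_y), n)
    z = mu_z + w_z * (x - mu_x) + rng.normal(0.0, np.sqrt(vN_z), n)

    # Covariance claimed on this slide
    Sigma = np.array([[v_x,       w_y * v_x,            w_z * v_x],
                      [w_y * v_x, w_y**2 * v_x + vN_y,  w_y * w_z * v_x],
                      [w_z * v_x, w_y * w_z * v_x,      w_z**2 * v_x + vN_z]])
    print(Sigma)
    print(np.cov(np.stack([x, y, z])))     # empirical estimate ~ Sigma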

SLIDE 17

Inference in Gaussian models

Partition the variables into two groups, x1 and x2:

µ = (µ1, µ2)^T,   Σ = [ Σ11  Σ12
                        Σ21  Σ22 ]

The conditional distribution of x1 given x2 is Gaussian, with

µ^c_{1|2} = µ1 + Σ12 Σ22^{-1} (x2 − µ2)
Σ^c_{1|2} = Σ11 − Σ12 Σ22^{-1} Σ21

• For a proof see §2.3.1 of Bishop (2006) (not examinable)
• Formation of the joint Gaussian is analogous to formation of the joint probability table for discrete RVs
• Propagation schemes are also possible for Gaussian RVs
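The conditioning formulas translate directly into a small function; a sketch with an invented 3-d joint:

    import numpy as np

    def condition(mu, Sigma, idx1, idx2, x2):
        """Mean and covariance of x1 given x2, for (x1, x2) jointly Gaussian."""
        S12 = Sigma[np.ix_(idx1, idx2)]
        S22 = Sigma[np.ix_(idx2, idx2)]
        K = S12 @ np.linalg.inv(S22)                 # Sigma12 Sigma22^{-1}
        mu_c = mu[idx1] + K @ (x2 - mu[idx2])
        Sigma_c = Sigma[np.ix_(idx1, idx1)] - K @ Sigma[np.ix_(idx2, idx1)]
        return mu_c, Sigma_c

    mu = np.array([0.0, 1.0, 2.0])                   # illustrative joint
    Sigma = np.array([[2.0, 0.5, 0.3],
                      [0.5, 1.0, 0.2],
                      [0.3, 0.2, 1.5]])
    print(condition(mu, Sigma, [0], [1, 2], np.array([1.5, 1.0])))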

SLIDE 18

Example Inference Problem

[Graph: node X with a directed edge to node Y]

Y = 2X + 8 + Ny

Assume X ∼ N(0, 1/α), so wy = 2, µy = 8, and Ny ∼ N(0, 1). Show that

µ_{x|y} = 2(y − 8)/(4 + α),   var(x|y) = 1/(4 + α)
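These expressions follow from the slide-17 formulas applied to the joint distribution of (X, Y); a numerical sketch (the values of α and y are arbitrary):

    import numpy as np

    alpha, y = 2.0, 10.0                   # illustrative values

    # Joint (X, Y): var(X) = 1/alpha, cov(X, Y) = 2/alpha, var(Y) = 4/alpha + 1
    v_x = 1.0 / alpha
    mu = np.array([0.0, 8.0])
    Sigma = np.array([[v_x,       2.0 * v_x],
                      [2.0 * v_x, 4.0 * v_x + 1.0]])

    # Condition X on Y = y using mu1 + S12 S22^{-1}(y - mu2), S11 - S12^2/S22
    mu_post  = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (y - mu[1])
    var_post = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

    print(mu_post,  2.0 / (4.0 + alpha) * (y - 8.0))   # agree
    print(var_post, 1.0 / (4.0 + alpha))               # agree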

SLIDE 19

Hybrid (discrete + continuous) networks

• Could discretize continuous variables, but this is ugly and gives large CPTs
• Better to use parametric families, e.g. the Gaussian
• Works easily when continuous nodes are children of discrete nodes; we then obtain a conditional Gaussian model
