
APPLIED MACHINE LEARNING Probability Density Functions Gaussian - PowerPoint PPT Presentation



  1. APPLIED MACHINE LEARNING
     Probability Density Functions and Gaussian Mixture Models

  2. Discrete Probabilities
     Consider two variables x and y taking discrete values over the intervals [1, ..., N_x] and [1, ..., N_y] respectively.
     P(x = i): the probability that the variable x takes value i.
     0 \le P(x = i) \le 1, \quad i = 1, \ldots, N_x, \quad \text{and} \quad \sum_{i=1}^{N_x} P(x = i) = 1
     Idem for P(y = j), j = 1, ..., N_y.

  3. Discrete Probabilities
     The joint probability is written P(x, y). The joint probability that variable x takes value i and variable y takes value j is:
     P(x = i, y = j) \quad \text{or} \quad P(x = i \wedge y = j)
     P(x | y) is the conditional probability of observing a value for x given a value for y.
     Bayes' theorem:
     P(x | y) = \frac{P(y | x)\,P(x)}{P(y)}
     When x and y are statistically independent:
     P(x | y) = P(x), \quad P(y | x) = P(y), \quad P(x, y) = P(x)\,P(y)
     Matlab Exercise I
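To make the joint, marginal, conditional, and Bayes relations concrete, here is a small NumPy sketch (not the course's Matlab exercise; the 2x3 joint table is an assumed example):

```python
import numpy as np

# Hypothetical 2x3 joint distribution P(x, y); rows index x, columns index y.
P_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
assert np.isclose(P_xy.sum(), 1.0)

P_x = P_xy.sum(axis=1)            # marginal P(x)
P_y = P_xy.sum(axis=0)            # marginal P(y)

# Conditionals: divide the joint by the appropriate marginal.
P_x_given_y = P_xy / P_y          # each column sums to 1
P_y_given_x = P_xy / P_x[:, None] # each row sums to 1

# Bayes' theorem: P(x | y) = P(y | x) P(x) / P(y)
bayes = P_y_given_x * P_x[:, None] / P_y
assert np.allclose(P_x_given_y, bayes)
```

Statistical independence would correspond to `P_xy` equaling the outer product `np.outer(P_x, P_y)`, which this example table deliberately does not satisfy.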

  4. Discrete Probabilities
     The marginal probability that variable x takes value i is given by:
     P_x(x = i) = \sum_{j=1}^{N_y} P_{xy}(x = i, y = j)
     (the subscripts x, xy are dropped below for simplicity of notation)
     • To compute the marginal, one needs the joint distribution P(x, y).
     • Often, one does not know it and one can only estimate it.
     • If x is a multidimensional variable, the marginal is itself a joint distribution!
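The last bullet can be illustrated with a quick NumPy sketch (the three-variable joint is a randomly generated, assumed example): marginalizing y out of a joint over (x1, x2, y) leaves a distribution that is still a joint over (x1, x2).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical joint over three discrete variables (x1, x2, y).
P = rng.random((2, 3, 4))
P /= P.sum()

# Marginalizing out y (axis 2) leaves a joint over the pair (x1, x2).
P_x1x2 = P.sum(axis=2)
assert P_x1x2.shape == (2, 3)
assert np.isclose(P_x1x2.sum(), 1.0)
```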

  5. Joint Distribution and Curse of Dimensionality
     The joint distribution is far richer than the marginals. The marginals of N variables taking K values each correspond to N(K - 1) probabilities. The joint distribution corresponds to ~K^N probabilities.
     Pros of computing the joint distribution: provides the statistical dependencies across all variables and the marginal distributions.
     Cons: computational costs grow exponentially with the number of dimensions (statistical power: roughly 10 samples are needed to estimate each parameter of a model).
     => Compute solely the conditional if you care only about dependencies across variables (this will be relevant for the lecture on non-linear regression methods).
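The gap between the two parameter counts is easy to check numerically (N = 10 and K = 5 are assumed example sizes):

```python
# Parameter counts for N discrete variables each taking K values.
N, K = 10, 5
marginals = N * (K - 1)   # independent probabilities in all the marginals
joint = K ** N - 1        # independent probabilities in the full joint table

assert marginals == 40
assert joint == 9_765_624   # grows exponentially with N
```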

  6. Probability Distributions, Density Functions
     p(x), a continuous function, is the probability density function or probability distribution function (pdf) (sometimes also called probability distribution or simply density) of variable x:
     p(x) \ge 0 \;\; \forall x, \quad \int_{-\infty}^{\infty} p(x)\,dx = 1

  7. Probability Distributions, Density Functions
     The pdf is not bounded by 1. It can grow unbounded, depending on the value taken by x.
     [Figure: a pdf p(x) with a narrow peak whose height exceeds 1]

  8. PDF Equivalency with Discrete Probability
     The cumulative distribution function (or simply distribution function) of x is:
     D(x^*) = P(x \le x^*) = \int_{-\infty}^{x^*} p(x)\,dx
     p(x)\,dx ~ probability of x falling within an infinitesimal interval [x, x + dx]

  9. PDF Equivalency with Discrete Probability
     [Figure: uniform distribution p(x)]
     The probability that x takes a value in the subinterval [a, b] is given by:
     P(x \le b) = D(b) = \int_{-\infty}^{b} p(x)\,dx
     P(a \le x \le b) = D(b) - D(a) = \int_{a}^{b} p(x)\,dx \le 1
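A small NumPy sketch of these relations for a uniform pdf (the interval [0, 2] and the subinterval [0.5, 1.5] are assumed examples; the integral is approximated by a Riemann sum):

```python
import numpy as np

# Uniform pdf on [0, 2]: p(x) = 1/2 inside the interval, 0 elsewhere.
lo, hi = 0.0, 2.0
p = lambda x: np.where((x >= lo) & (x <= hi), 1.0 / (hi - lo), 0.0)

def D(x_star, n=100_000):
    """Cumulative distribution D(x*) = integral of p from -inf to x*."""
    if x_star <= lo:
        return 0.0
    xs = np.linspace(lo, min(x_star, hi), n)
    return np.sum(p(xs)) * (xs[1] - xs[0])   # Riemann-sum approximation

a, b = 0.5, 1.5
P_ab = D(b) - D(a)        # P(a <= x <= b) = D(b) - D(a)
assert abs(P_ab - 0.5) < 1e-3
```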

  10. Expectation
      The expectation of the random variable x with probability P(x) (in the discrete case) and pdf p(x) (in the continuous case), also called the expected value or mean, is the mean of the observed values of x weighted by p(x). If X is the set of observations of x, then:
      When x takes discrete values: E[x] = \sum_{x \in X} x\,P(x)
      For continuous distributions: E[x] = \int_{X} x\,p(x)\,dx
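Both formulas can be sketched in NumPy (the fair die and the standard normal are assumed examples; the continuous integral is approximated on a grid):

```python
import numpy as np

# Discrete case: a fair six-sided die, E[x] = sum_x x P(x).
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)
E_disc = np.sum(values * probs)
assert np.isclose(E_disc, 3.5)

# Continuous case on a grid: standard normal pdf, E[x] = 0.
xs = np.linspace(-8, 8, 200_001)
p = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)
E_cont = np.sum(xs * p) * (xs[1] - xs[0])   # integral of x p(x) dx
assert abs(E_cont) < 1e-6
```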

  11. Variance
      \sigma^2, the variance of a distribution, measures the amount of spread of the distribution around its mean:
      \sigma^2 = \mathrm{Var}(x) = E\left[\left(x - E[x]\right)^2\right] = E[x^2] - \left(E[x]\right)^2
      \sigma is the standard deviation of x.
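The two equivalent expressions for the variance can be checked empirically on samples (the mean 2 and standard deviation 3 are assumed example values):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=200_000)   # sigma = 3

E_x = x.mean()
var_def = np.mean((x - E_x) ** 2)      # E[(x - E[x])^2]
var_alt = np.mean(x ** 2) - E_x ** 2   # E[x^2] - (E[x])^2

assert np.isclose(var_def, var_alt)              # same quantity, two forms
assert abs(np.sqrt(var_def) - 3.0) < 0.05        # std close to sigma = 3
```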

  12. Parametric PDF
      The uni-dimensional Gaussian or Normal distribution is a distribution with pdf given by:
      p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad \mu: \text{mean}, \;\; \sigma^2: \text{variance}
      The Gaussian function is entirely determined by its mean and variance. For this reason, it is referred to as a parametric distribution.
      Illustrations from Wikipedia
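A NumPy sketch of the formula (sigma = 0.3 is an assumed example, chosen small so the peak exceeds 1, illustrating the earlier slide's point that a pdf is not bounded by 1 even though it integrates to 1):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """1-D Gaussian pdf: exp(-(x-mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

xs = np.linspace(-10, 10, 100_001)
p = gauss_pdf(xs, mu=0.0, sigma=0.3)

# Peak value is 1/(0.3 sqrt(2 pi)) ~ 1.33 > 1 ...
assert p.max() > 1.0
# ... yet the density still integrates to 1.
area = np.sum(p) * (xs[1] - xs[0])
assert abs(area - 1.0) < 1e-3
```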

  13. Mean and Variance in PDF
      ~68% of the data are comprised between +/- 1 sigma
      ~95% of the data are comprised between +/- 2 sigmas
      ~99.7% of the data are comprised between +/- 3 sigmas
      This is no longer true for arbitrary pdfs!
      Illustrations from Wikipedia
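The 1-, 2-, and 3-sigma fractions are easy to verify with a Monte Carlo draw from a standard normal (sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)

within_1 = np.mean(np.abs(x) < 1)   # ~0.683
within_2 = np.mean(np.abs(x) < 2)   # ~0.954
within_3 = np.mean(np.abs(x) < 3)   # ~0.997

assert abs(within_1 - 0.683) < 0.005
assert abs(within_2 - 0.954) < 0.005
assert abs(within_3 - 0.997) < 0.005
```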

  14. Mean and Variance in PDF
      [Figure: three Gaussian distributions and their superposition f = (1/3)(f1 + f2 + f3), with the expectation marked and ~0.68 of the mass within +/- 1 sigma]
      Resulting distribution when superposing 3 Gaussian distributions. For pdfs other than the Gaussian distribution, the variance still represents a notion of dispersion around the expected value.
      Matlab Demo I

  15. Multi-dimensional Gaussian Function
      The uni-dimensional Gaussian or Normal distribution is a distribution with pdf given by:
      p(x;\, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad \mu: \text{mean}, \;\; \sigma^2: \text{variance}
      The multi-dimensional Gaussian or Normal distribution has a pdf given by:
      p(x;\, \mu, \Sigma) = \frac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}
      If x is N-dimensional, then \mu is an N-dimensional mean vector and \Sigma is an N x N covariance matrix.
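A direct NumPy implementation of the multi-dimensional formula, checked against the fact that a diagonal covariance factorizes the density into a product of 1-D Gaussians (the 2-D mean, covariance, and query point are assumed examples):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """N-dim Gaussian pdf:
    exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu)) / ((2 pi)^{N/2} |Sigma|^{1/2})."""
    N = len(mu)
    d = x - mu
    expo = -0.5 * d @ np.linalg.solve(Sigma, d)
    norm = np.sqrt((2 * np.pi) ** N * np.linalg.det(Sigma))
    return np.exp(expo) / norm

def gauss1d(x, m, s2):
    return np.exp(-((x - m) ** 2) / (2 * s2)) / np.sqrt(2 * np.pi * s2)

mu = np.array([1.0, -2.0])
Sigma = np.diag([4.0, 0.25])   # diagonal => independent components
x = np.array([2.0, -1.5])

val = mvn_pdf(x, mu, Sigma)
assert np.isclose(val, gauss1d(2.0, 1.0, 4.0) * gauss1d(-1.5, -2.0, 0.25))
```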

  16. 2-dimensional Gaussian Pdf
      [Figure: surface and contour plots of p(x_1, x_2); isolines p(x) = \text{cst}]
      p(x;\, \mu, \Sigma) = \frac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}
      If x is N-dimensional, then \mu is an N-dimensional mean vector and \Sigma is an N x N covariance matrix.

  17. Modeling Data with a Gaussian Function
      Construct the covariance matrix from the (centered) set of datapoints X = \{x^i\}_{i=1,\ldots,M}:
      \Sigma = \frac{1}{M}\, X X^T
      p(x;\, \mu, \Sigma) = \frac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}
      If x is N-dimensional, then \mu is an N-dimensional mean vector and \Sigma is an N x N covariance matrix.
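The slide's estimator can be sketched in NumPy: center the data, store the datapoints as columns of X, and form (1/M) X X^T (the true covariance used to generate the synthetic data is an assumed example):

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 10_000, 2
true_Sigma = np.array([[2.0, 0.8],
                       [0.8, 1.0]])
data = rng.multivariate_normal(mean=[0, 0], cov=true_Sigma, size=M)  # M x N

# Center the datapoints and store them as columns of X (N x M).
X = (data - data.mean(axis=0)).T
Sigma = (X @ X.T) / M            # Sigma = (1/M) X X^T

assert Sigma.shape == (N, N)
assert np.allclose(Sigma, Sigma.T)              # symmetric by construction
assert np.allclose(Sigma, true_Sigma, atol=0.1) # close to the generator
```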

  18. Modeling Data with a Gaussian Function
      Construct the covariance matrix from the (centered) set of datapoints X = \{x^i\}_{i=1,\ldots,M}:
      \Sigma = \frac{1}{M}\, X X^T
      \Sigma is square and symmetric. It can be decomposed using the eigenvalue decomposition:
      \Sigma = V \Lambda V^T, \quad V: \text{matrix of eigenvectors}, \quad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N): \text{diagonal matrix composed of the eigenvalues}
      The 1-std ellipse is the isoline x^T \Sigma^{-1} x = 1, with \Sigma^{-1} = V \Lambda^{-1} V^T. Its axes are aligned with the 1st and 2nd eigenvectors, and their lengths are equal to 2\sqrt{\lambda_1} and 2\sqrt{\lambda_2}.
      Each isoline corresponds to a scaling of the 1-std ellipse.
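A NumPy sketch of the decomposition and the ellipse claim (the 2x2 covariance is an assumed example): a point at distance sqrt(lambda_i) along eigenvector i lies on the 1-std ellipse, so each semi-axis is sqrt(lambda_i) and each full axis is 2 sqrt(lambda_i).

```python
import numpy as np

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

# Sigma is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
lambdas, V = np.linalg.eigh(Sigma)

# Eigenvalue decomposition: Sigma = V diag(lambda) V^T
assert np.allclose(V @ np.diag(lambdas) @ V.T, Sigma)

# sqrt(lambda_i) * v_i lies on the 1-std ellipse x^T Sigma^{-1} x = 1.
for lam, v in zip(lambdas, V.T):
    x = np.sqrt(lam) * v
    assert np.isclose(x @ np.linalg.solve(Sigma, x), 1.0)
```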

  19. Fitting a Single Gauss Function and PCA
      PCA identifies a suitable representation of a multivariate data set by decorrelating the dataset.
      When projected onto the eigenvectors e^1 and e^2, the set of datapoints appears to follow two uncorrelated Normal distributions:
      p\left((e^1)^T X\right) \sim N(\mu_1, \lambda_1), \quad p\left((e^2)^T X\right) \sim N(\mu_2, \lambda_2)
      [Figure: datapoints in (x_1, x_2) with the 1st and 2nd eigenvectors overlaid]
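The decorrelation claim can be checked numerically (the covariance used to generate the correlated data is an assumed example): projecting the data onto the eigenvectors of its sample covariance yields coordinates whose sample covariance is diagonal.

```python
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[3.0, 1.2],
                  [1.2, 1.0]])
data = rng.multivariate_normal([0.0, 0.0], Sigma, size=50_000)

# Eigenvectors of the sample covariance define the PCA axes e^1, e^2.
S = np.cov(data.T)
_, V = np.linalg.eigh(S)

proj = data @ V            # coordinates along the eigenvectors
C = np.cov(proj.T)
assert abs(C[0, 1]) < 1e-8 # projected coordinates are uncorrelated
```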

  20. Marginal, Conditional in Pdf
      Consider two random variables x_1 and x_2 with joint distribution p(x_1, x_2). The marginal probability of x_1 is:
      p(x_1) = \int p(x_1, x_2)\,dx_2
      The conditional probability is given by:
      p(x_2 \,|\, x_1) = \frac{p(x_1, x_2)}{p(x_1)}
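Both definitions can be sketched by discretizing a joint density on a grid (the joint of two independent standard normals is an assumed example; integrals become sums times the grid step):

```python
import numpy as np

x1 = np.linspace(-5, 5, 401)
x2 = np.linspace(-5, 5, 401)
dx = x1[1] - x1[0]
X1, X2 = np.meshgrid(x1, x2, indexing="ij")

# Joint density p(x1, x2): two independent standard normals.
phi = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
p_joint = phi(X1) * phi(X2)

# Marginal: p(x1) = integral of p(x1, x2) dx2
p_x1 = p_joint.sum(axis=1) * dx
assert abs(p_x1.sum() * dx - 1.0) < 1e-3

# Conditional: p(x2 | x1) = p(x1, x2) / p(x1), a proper density in x2.
i = 200                                  # grid index of x1 = 0
p_x2_given = p_joint[i, :] / p_x1[i]
assert abs(p_x2_given.sum() * dx - 1.0) < 1e-3
```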

  21. Marginal, Conditional Pdf of Gauss Functions
      The conditional and marginal pdfs of a multi-dimensional Gauss function are all Gauss functions!
      [Figure: joint density p(x_1, x_2); marginal density of x_1; marginal density of x_2; conditional density of x_2 given x_1 = 0]
      Matlab Exercise II
      Illustrations from Wikipedia
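For the bivariate case the conditional Gaussian has a closed form (a standard result, not written out on the slide): mu_{2|1} = mu_2 + S_{21} S_{11}^{-1} (x_1 - mu_1) and sigma_{2|1}^2 = S_{22} - S_{21} S_{11}^{-1} S_{12}. The sketch below (mean, covariance, and conditioning value are assumed examples, and this is NumPy rather than the course's Matlab exercise) checks it by rejection sampling near x_1 = 0:

```python
import numpy as np

mu = np.array([1.0, -1.0])
S = np.array([[2.0, 0.6],
              [0.6, 1.0]])
x1 = 0.0

# Closed-form conditional p(x2 | x1): mean and variance.
mu_cond = mu[1] + S[1, 0] / S[0, 0] * (x1 - mu[0])   # = -1.3
var_cond = S[1, 1] - S[1, 0] * S[0, 1] / S[0, 0]     # = 0.82

# Monte Carlo check: keep samples whose x1 coordinate falls near x1 = 0.
rng = np.random.default_rng(5)
samples = rng.multivariate_normal(mu, S, size=1_000_000)
sel = samples[np.abs(samples[:, 0] - x1) < 0.05, 1]

assert abs(sel.mean() - mu_cond) < 0.05
assert abs(sel.var() - var_cond) < 0.05
```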
