Probability Distributions
Sargur N. Srihari, Machine Learning (PowerPoint presentation)


  1. Srihari Machine Learning: Probability Distributions. Sargur N. Srihari

  2. Distributions: The Landscape
     • Discrete, binary: Bernoulli, Binomial, Beta
     • Discrete, multivalued: Multinomial, Dirichlet
     • Continuous: Gaussian, Student's-t, Gamma, Wishart, Exponential
     • Angular: Von Mises
     • Uniform

  3. Distributions: Relationships
     • Discrete, binary: Bernoulli (single binary variable in {0,1}); Binomial (N samples of a Bernoulli; the Bernoulli is the N=1 case); conjugate prior is the Beta (continuous variable in [0,1]; the K=2 case of the Dirichlet)
     • Discrete, multi-valued: Multinomial (one of K values, represented as a K-dimensional binary vector); conjugate prior is the Dirichlet (K random variables in [0,1])
     • Continuous: Gaussian; Student's-t (generalization of the Gaussian, robust to outliers, an infinite mixture of Gaussians); Gamma (conjugate prior of the univariate Gaussian precision); Wishart (conjugate prior of the multivariate Gaussian precision matrix); Exponential (special case of the Gamma); Gaussian-Gamma (conjugate prior of the univariate Gaussian with unknown mean and precision); Gaussian-Wishart (conjugate prior of the multivariate Gaussian with unknown mean and precision matrix)
     • Angular: Von Mises
     • Uniform

  4. Binary Variables: Bernoulli, Binomial and Beta

  5. Bernoulli Distribution (Jacob Bernoulli, 1654-1705)
     • Expresses the distribution of a single binary-valued random variable x ∈ {0,1}
     • The probability of x=1 is denoted by the parameter µ, i.e., p(x=1|µ) = µ; therefore p(x=0|µ) = 1−µ
     • The probability distribution has the form Bern(x|µ) = µ^x (1−µ)^(1−x)
     • The mean is E[x] = µ and the variance is var[x] = µ(1−µ)
     • The likelihood of N observations D = {x_1, …, x_N} drawn independently from p(x|µ) is p(D|µ) = ∏_n µ^(x_n) (1−µ)^(1−x_n)
     • The log-likelihood is ln p(D|µ) = Σ_n [ x_n ln µ + (1−x_n) ln(1−µ) ]
     • The maximum likelihood estimator, obtained by setting the derivative of ln p(D|µ) w.r.t. µ to zero, is µ_ML = (1/N) Σ_n x_n; if the number of observations with x=1 is m, then µ_ML = m/N
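A minimal sketch of the Bernoulli pmf and its maximum likelihood estimate; the data list is illustrative, not taken from the slides:

```python
def bernoulli_pmf(x, mu):
    """Bern(x | mu) = mu^x * (1 - mu)^(1 - x), for x in {0, 1}."""
    return mu ** x * (1 - mu) ** (1 - x)

def bernoulli_mle(data):
    """mu_ML = m / N, the fraction of observations with x = 1."""
    return sum(data) / len(data)

# Illustrative data (assumed): N = 10 draws with m = 7 ones
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
mu_ml = bernoulli_mle(data)  # 0.7
```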

  6. Binomial Distribution
     • Related to the Bernoulli distribution
     • Expresses the distribution of m, the number of observations for which x=1 out of N trials
     • Each sequence of outcomes is a product of Bernoulli factors; adding up all ways of obtaining m heads gives Bin(m|N,µ) = C(N,m) µ^m (1−µ)^(N−m)
     • The mean and variance are E[m] = Nµ and var[m] = Nµ(1−µ)
     • Figure: histogram of the Binomial for N=10 and µ=0.25
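The slide's histogram can be reproduced numerically; a small sketch of the Binomial pmf for the quoted N=10, µ=0.25 setting:

```python
from math import comb

def binomial_pmf(m, N, mu):
    """Bin(m | N, mu) = C(N, m) * mu^m * (1 - mu)^(N - m)."""
    return comb(N, m) * mu ** m * (1 - mu) ** (N - m)

# Values behind the slide's figure: Bin(m | N=10, mu=0.25) for m = 0..10
pmf = [binomial_pmf(m, 10, 0.25) for m in range(11)]
mean = sum(m * p for m, p in enumerate(pmf))  # E[m] = N*mu = 2.5
```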

  7. Beta Distribution
     • Beta(µ|a,b) = [Γ(a+b) / (Γ(a)Γ(b))] µ^(a−1) (1−µ)^(b−1)
     • where the Gamma function is defined as Γ(x) = ∫_0^∞ u^(x−1) e^(−u) du
     • a and b are hyperparameters that control the distribution of the parameter µ
     • Mean and variance: E[µ] = a/(a+b), var[µ] = ab / [(a+b)^2 (a+b+1)]
     • Figure: the Beta distribution as a function of µ for hyperparameter settings (a,b) = (0.1, 0.1), (1, 1), (2, 3), and (8, 4)
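The density, mean, and variance above translate directly into code; a sketch using the standard library's Gamma function:

```python
from math import gamma

def beta_pdf(mu, a, b):
    """Beta(mu | a, b) = Gamma(a+b) / (Gamma(a)*Gamma(b)) * mu^(a-1) * (1-mu)^(b-1)."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * mu ** (a - 1) * (1 - mu) ** (b - 1)

def beta_mean(a, b):
    """E[mu] = a / (a + b)."""
    return a / (a + b)

def beta_var(a, b):
    """var[mu] = a*b / ((a + b)^2 * (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))
```

With a = b = 1 the density is uniform on [0,1], matching the (1,1) panel in the slide's figure.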

  8. Bayesian Inference with Beta
     • The MLE of µ in the Bernoulli is the fraction of observations with x=1; it is severely over-fitted for small data sets
     • The likelihood function takes products of factors of the form µ^x (1−µ)^(1−x)
     • If the prior distribution of µ is chosen to be proportional to powers of µ and (1−µ), the posterior will have the same functional form as the prior; this property is called conjugacy
     • The Beta distribution has a form suitable for the prior distribution p(µ)

  9. Bayesian Inference with Beta
     • The posterior, obtained by multiplying the Beta prior with the binomial likelihood, is p(µ|m,l,a,b) ∝ µ^(m+a−1) (1−µ)^(l+b−1), where m is the number of heads and l = N−m is the number of tails
     • It is another Beta distribution: the data effectively increase the value of a by m and of b by l
     • As the number of observations increases, the distribution becomes more peaked
     • Illustration of one step in the process: prior a=2, b=2; a single observation N=m=1 with x=1, contributing the likelihood factor µ^1 (1−µ)^0; posterior a=3, b=2
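The conjugate update is just an addition of counts to hyperparameters; a sketch reproducing the slide's one-step example:

```python
def beta_posterior(a, b, m, l):
    """Beta(a, b) prior x binomial likelihood with m heads and l tails
    -> Beta(a + m, b + l) posterior."""
    return a + m, b + l

# The slide's example: prior a=2, b=2; one observation x=1 (m=1, l=0)
a_post, b_post = beta_posterior(2, 2, 1, 0)  # (3, 2)
```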

  10. Predicting the Next Trial Outcome
     • We need the predictive distribution of x given the observed D; from the sum and product rules,
       p(x=1|D) = ∫_0^1 p(x=1, µ|D) dµ = ∫_0^1 p(x=1|µ) p(µ|D) dµ = ∫_0^1 µ p(µ|D) dµ = E[µ|D]
     • The expected value of the posterior distribution can be shown to be p(x=1|D) = (m+a) / (m+a+l+b), which is the fraction of observations (both fictitious and real) that correspond to x=1
     • The maximum likelihood and Bayesian results agree in the limit of infinitely many observations
     • On average, uncertainty (the variance) decreases as data are observed
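The predictive probability is the posterior mean; a sketch with illustrative counts (7 heads, 3 tails, assumed for the example) and a flat Beta(1,1) prior:

```python
def predictive_x1(a, b, m, l):
    """p(x=1 | D) = E[mu | D] = (m + a) / (m + a + l + b)."""
    return (m + a) / (m + a + l + b)

# Flat prior a = b = 1, with 7 heads and 3 tails (illustrative)
p = predictive_x1(1, 1, 7, 3)  # 8/12
```

As m and l grow with a fixed prior, the prediction approaches the ML answer m/N, illustrating the agreement in the infinite-data limit.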

  11. Summary
     • The distribution of a single binary variable is represented by the Bernoulli
     • The Binomial is related to the Bernoulli: it expresses the distribution of the number of occurrences of x=1 in N trials
     • The Beta distribution is a conjugate prior for the Bernoulli: both have the same functional form in µ

  12. Multinomial Variables: Generalized Bernoulli and Dirichlet

  13. Generalization of the Bernoulli
     • A discrete variable that takes one of K values (instead of 2)
     • Represented in a 1-of-K scheme: x is a K-dimensional binary vector
     • e.g., for K=6, if the variable takes its third value we represent it as x = (0,0,1,0,0,0)^T
     • Such vectors satisfy Σ_k x_k = 1
     • If the probability of x_k=1 is denoted µ_k, the distribution of x is given by the generalized Bernoulli p(x|µ) = ∏_k µ_k^(x_k)
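A sketch of the 1-of-K encoding and the generalized Bernoulli (the helper names and 0-indexed position are my own convention, not the slides'):

```python
def one_hot(k, K):
    """1-of-K encoding: a K-dimensional binary vector with a 1 at position k (0-indexed)."""
    return [1 if i == k else 0 for i in range(K)]

def generalized_bernoulli(x, mu):
    """p(x | mu) = prod_k mu_k^{x_k}; since exactly one x_k is 1, this selects mu_k."""
    p = 1.0
    for xk, muk in zip(x, mu):
        p *= muk ** xk
    return p

x = one_hot(2, 6)  # the slide's example vector (0,0,1,0,0,0)
```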

  14. Likelihood Function
     • Given a data set D of N independent observations x_1, …, x_N
     • The likelihood function has the form p(D|µ) = ∏_n ∏_k µ_k^(x_nk) = ∏_k µ_k^(m_k)
     • where m_k = Σ_n x_nk is the number of observations with x_k=1
     • The maximum likelihood solution (obtained by maximizing the log-likelihood subject to Σ_k µ_k = 1) is µ_k^ML = m_k / N, the fraction of the N observations for which x_k=1
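The ML solution is a per-component frequency count; a sketch over 1-of-K encoded observations (the data set is illustrative):

```python
def multinomial_mle(observations):
    """mu_k^ML = m_k / N for 1-of-K encoded observations,
    where m_k = sum_n x_nk counts the observations with x_k = 1."""
    N = len(observations)
    K = len(observations[0])
    m = [sum(x[k] for x in observations) for k in range(K)]
    return [mk / N for mk in m]

# Illustrative data: N = 4 observations over K = 3 values
D = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
mu_ml = multinomial_mle(D)  # [0.5, 0.25, 0.25]
```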

  15. Generalized Binomial Distribution
     • The multinomial distribution is Mult(m_1, …, m_K | µ, N) = [N! / (m_1! m_2! … m_K!)] ∏_k µ_k^(m_k)
     • where the normalization coefficient N! / (m_1! … m_K!) is the number of ways of partitioning N objects into K groups of sizes m_1, …, m_K
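A sketch of the multinomial coefficient and pmf; for K=2 it reduces to the Binomial of slide 6:

```python
from math import factorial

def multinomial_coef(m):
    """Number of ways of partitioning N = sum(m) objects into K groups
    of sizes m_1, ..., m_K: N! / (m_1! m_2! ... m_K!)."""
    c = factorial(sum(m))
    for mk in m:
        c //= factorial(mk)
    return c

def multinomial_pmf(m, mu):
    """Mult(m_1..m_K | mu, N) = N!/(m_1!...m_K!) * prod_k mu_k^{m_k}."""
    p = multinomial_coef(m)
    for mk, muk in zip(m, mu):
        p *= muk ** mk
    return p
```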

  16. Dirichlet Distribution (Lejeune Dirichlet, 1805-1859)
     • A family of prior distributions for the parameters µ_k of the multinomial distribution
     • By inspection of the multinomial, the form of the conjugate prior is p(µ|α) ∝ ∏_k µ_k^(α_k−1)
     • The normalized form of the Dirichlet distribution is Dir(µ|α) = [Γ(α_0) / (Γ(α_1) … Γ(α_K))] ∏_k µ_k^(α_k−1), where α_0 = Σ_k α_k
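The normalized form can be evaluated directly; a sketch using the standard library's Gamma function (for K=2 the Dirichlet reduces to the Beta of slide 7):

```python
from math import gamma

def dirichlet_pdf(mu, alpha):
    """Dir(mu | alpha) = Gamma(alpha_0) / prod_k Gamma(alpha_k) * prod_k mu_k^(alpha_k - 1),
    with alpha_0 = sum_k alpha_k and mu on the simplex (mu_k >= 0, sum_k mu_k = 1)."""
    p = gamma(sum(alpha))
    for muk, ak in zip(mu, alpha):
        p *= muk ** (ak - 1) / gamma(ak)
    return p
```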

  17. Dirichlet over 3 Variables
     • Due to the summation constraint Σ_k µ_k = 1, the distribution over the space of {µ_k} is confined to a simplex of dimensionality K−1
     • For K=3 the simplex is a triangle
     • Figure: plots of the Dirichlet distribution over the simplex for the settings α_k = 0.1, α_k = 1, and α_k = 10

  18. Dirichlet Posterior Distribution
     • Multiplying the prior by the likelihood gives p(µ|D,α) ∝ p(D|µ) p(µ|α) ∝ ∏_k µ_k^(α_k + m_k − 1)
     • which has the form of the Dirichlet distribution: p(µ|D,α) = Dir(µ | α + m)
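As with the Beta-Bernoulli case, the update is a component-wise addition of counts; a sketch with illustrative numbers:

```python
def dirichlet_posterior(alpha, m):
    """Dir(alpha) prior x multinomial likelihood with counts m
    -> Dir(alpha + m) posterior (component-wise sum)."""
    return [a + mk for a, mk in zip(alpha, m)]

# Illustrative: weak prior alpha_k = 0.1 updated with counts (5, 2, 3)
alpha_post = dirichlet_posterior([0.1, 0.1, 0.1], [5, 2, 3])
```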

  19. Summary
     • The multinomial is a generalization of the Bernoulli: the variable takes one of K values instead of 2
     • The conjugate prior of the multinomial is the Dirichlet distribution
