Machine Learning Srihari
Probability Distributions
Sargur N. Srihari

Distributions: Landscape
Discrete-Binary: Bernoulli, Binomial, Beta
Discrete-Multivalued: Multinomial, Dirichlet
Continuous: Gaussian, Gamma, Wishart, Student's-t, Exponential, Uniform
Angular: Von Mises
Discrete-Binary
– Bernoulli: single binary variable
– Binomial: N samples of Bernoulli (N=1 gives Bernoulli; approaches Gaussian for large N)
– Beta: continuous variable in [0,1]; conjugate prior of Bernoulli and Binomial

Discrete-Multivalued
– Multinomial: one of K values = K-dimensional binary vector (K=2 gives Binomial)
– Dirichlet: K random variables in [0,1]; conjugate prior of Multinomial (K=2 gives Beta)

Continuous
– Gaussian
– Gamma: conjugate prior of univariate Gaussian precision
– Wishart: conjugate prior of multivariate Gaussian precision matrix
– Gaussian-Gamma: conjugate prior of univariate Gaussian with unknown mean and precision
– Gaussian-Wishart: conjugate prior of multivariate Gaussian with unknown mean and precision matrix
– Student's-t: generalization of Gaussian, robust to outliers; infinite mixture of Gaussians
– Exponential: special case of Gamma
– Uniform

Angular
– Von Mises
p(x=1|µ)=µ
p(x=0|µ)=1-µ
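The two cases above can be written as the single expression Bern(x|µ) = µ^x (1-µ)^(1-x). A minimal Python sketch of that expression follows; the function name `bernoulli_pmf` is chosen here for illustration and does not come from the slides.

```python
def bernoulli_pmf(x: int, mu: float) -> float:
    """Probability of a binary outcome x in {0, 1} when p(x=1) = mu."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    # mu**x * (1 - mu)**(1 - x) reduces to mu for x=1 and to 1 - mu for x=0.
    return mu ** x * (1.0 - mu) ** (1 - x)


# Example with an arbitrary, illustrative value of mu.
print(bernoulli_pmf(1, 0.3), bernoulli_pmf(0, 0.3))
```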
Jacob Bernoulli 1654-1705
Histogram of Binomial for N=10 and µ=0.25
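The bar heights of such a histogram are values of Bin(m|N,µ) = C(N,m) µ^m (1-µ)^(N-m). The sketch below tabulates them for N=10 and µ=0.25 and prints a rough text histogram; the names `binomial_pmf`, `N`, `MU` and `pmf` are illustrative choices, not taken from the slides.

```python
from math import comb


def binomial_pmf(m: int, n: int, mu: float) -> float:
    """Probability of m successes in n independent Bernoulli(mu) trials."""
    return comb(n, m) * mu ** m * (1.0 - mu) ** (n - m)


N, MU = 10, 0.25
pmf = [binomial_pmf(m, N, MU) for m in range(N + 1)]

# One row per value of m; each '#' stands for roughly one percentage point.
for m, p in enumerate(pmf):
    print(f"m={m:2d}  p={p:.4f}  " + "#" * round(p * 100))
```

The probabilities sum to one and their mean is Nµ = 2.5, which is a quick sanity check on the table.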
Beta distribution as a function of µ for values of hyperparameters a and b
Panels: a=0.1, b=0.1; a=1, b=1; a=2, b=3; a=8, b=4
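The curves in such a figure can be reproduced from the density Beta(µ|a,b) = Γ(a+b)/(Γ(a)Γ(b)) µ^(a-1) (1-µ)^(b-1). The following is a minimal sketch that evaluates it for the four hyperparameter settings; `beta_pdf` is an illustrative name, and µ is restricted to the open interval (0,1) as a simplifying assumption, since the density is unbounded at the endpoints when a or b is below 1.

```python
from math import gamma


def beta_pdf(mu: float, a: float, b: float) -> float:
    """Beta density at mu for hyperparameters a, b > 0 (mu strictly inside (0, 1))."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    normaliser = gamma(a + b) / (gamma(a) * gamma(b))
    return normaliser * mu ** (a - 1) * (1.0 - mu) ** (b - 1)


# Evaluate each setting on a coarse grid of mu values.
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
for a, b in [(0.1, 0.1), (1, 1), (2, 3), (8, 4)]:
    values = ", ".join(f"{beta_pdf(mu, a, b):.3f}" for mu in grid)
    print(f"a={a}, b={b}: {values}")
```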
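– Maximum likelihood estimate of µ is severely over-fitted for small data sets
– With a Beta prior the posterior has the same functional form as the prior; this property is called conjugacy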
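The over-fitting remark refers to the maximum likelihood estimate of µ, which is the fraction of observations equal to 1. A small sketch, with the illustrative function name `bernoulli_mle` and a made-up three-toss data set, shows why this is a problem for small samples.

```python
def bernoulli_mle(tosses: list[int]) -> float:
    """Maximum likelihood estimate of mu: the fraction of tosses equal to 1."""
    if not tosses:
        raise ValueError("need at least one observation")
    return sum(tosses) / len(tosses)


# Hypothetical data: three tosses that all came up heads.
small_sample = [1, 1, 1]
print(bernoulli_mle(small_sample))  # 1.0
```

An estimate of 1.0 predicts that every future toss lands heads, which is an unreasonable conclusion from three observations.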
– where l=N-m is the number of tails and m is the number of heads
– Effectively increases the value of a by m and b by l
– As the number of observations increases, the distribution becomes more peaked
Illustration: prior with a=2, b=2; likelihood µ^1 (1-µ)^0 for N=m=1, with x=1; posterior with a=3, b=2
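The update described above amounts to adding the observed counts to the hyperparameters. The sketch below applies it to the illustrated case (a=2, b=2, one observation x=1) and then to a larger, made-up data set to show the posterior becoming more peaked; `beta_posterior` and `beta_mean_var` are illustrative names.

```python
def beta_posterior(a: float, b: float, m: int, l: int) -> tuple[float, float]:
    """Posterior Beta hyperparameters after observing m heads and l tails."""
    return a + m, b + l


def beta_mean_var(a: float, b: float) -> tuple[float, float]:
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# The illustrated step: prior a=2, b=2 and a single observation x=1.
print(beta_posterior(2, 2, m=1, l=0))  # (3, 2)

# Hypothetical larger data set: the variance shrinks as counts accumulate.
for m, l in [(0, 0), (1, 0), (10, 10), (100, 100)]:
    a_post, b_post = beta_posterior(2, 2, m, l)
    mean, var = beta_mean_var(a_post, b_post)
    print(f"m={m:3d} l={l:3d}  mean={mean:.3f}  var={var:.5f}")
```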
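\[
\begin{aligned}
p(x=1 \mid D) &= \int_0^1 p(x=1, \mu \mid D)\, d\mu \\
&= \int_0^1 p(x=1 \mid \mu)\, p(\mu \mid D)\, d\mu \\
&= \int_0^1 \mu\, p(\mu \mid D)\, d\mu \\
&= E[\mu \mid D]
\end{aligned}
\]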
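For a Beta(a,b) prior with m heads and l tails observed, the posterior is Beta(a+m, b+l), so this expectation has the closed form (m+a)/(m+a+l+b). The sketch below checks that closed form against a simple midpoint-rule integration of µ p(µ|D); the function names and the number of integration steps are illustrative assumptions.

```python
from math import gamma


def beta_pdf(mu: float, a: float, b: float) -> float:
    """Beta density at mu, for mu strictly inside (0, 1)."""
    normaliser = gamma(a + b) / (gamma(a) * gamma(b))
    return normaliser * mu ** (a - 1) * (1.0 - mu) ** (b - 1)


def predictive_closed_form(a: float, b: float, m: int, l: int) -> float:
    """p(x=1 | D) as the mean of the posterior Beta(a + m, b + l)."""
    return (m + a) / (m + a + l + b)


def predictive_numeric(a: float, b: float, m: int, l: int, steps: int = 10_000) -> float:
    """p(x=1 | D) by midpoint-rule integration of mu * p(mu | D) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        mu = (i + 0.5) * h  # midpoints never touch 0 or 1
        total += mu * beta_pdf(mu, a + m, b + l) * h
    return total


# Prior a=2, b=2 with a single observed head.
print(predictive_closed_form(2, 2, 1, 0), predictive_numeric(2, 2, 1, 0))
```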
Generalized Bernoulli
Maximum likelihood estimate: µk = mk/N, where mk is the number of observations with xk=1, which is the fraction of the N observations for which xk=1
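This fraction can be computed directly from observations in 1-of-K coding, where each observation is a K-dimensional binary vector with a single 1. A minimal sketch follows; the function name `multinomial_mle` and the four-observation data set are illustrative assumptions.

```python
def multinomial_mle(observations: list[list[int]]) -> list[float]:
    """Fraction of observations with x_k = 1, for each of the K components."""
    if not observations:
        raise ValueError("need at least one observation")
    k = len(observations[0])
    for x in observations:
        # Each observation must be a 1-of-K vector: K binary entries, exactly one 1.
        if len(x) != k or sum(x) != 1 or any(v not in (0, 1) for v in x):
            raise ValueError(f"not a 1-of-K vector: {x}")
    n = len(observations)
    counts = [sum(x[j] for x in observations) for j in range(k)]
    return [c / n for c in counts]


# Hypothetical data with K=3 and N=4.
data = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(multinomial_mle(data))  # [0.5, 0.25, 0.25]
```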
Lejeune Dirichlet 1805-1859
Plots of the Dirichlet distribution over the simplex for various settings of the parameters αk
Panels: αk=0.1; αk=1; αk=10
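The effect of αk can also be seen by sampling. The sketch below draws Dirichlet samples with the standard construction of normalising independent Gamma(αk, 1) variates, using only Python's standard library, and reports the average size of the largest component for each setting; K=3, the sample count and the function names are illustrative assumptions.

```python
import random


def sample_dirichlet(alphas: list[float], rng: random.Random) -> list[float]:
    """Draw one point on the simplex by normalising Gamma(alpha_k, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_component(alpha: float, k: int, n_samples: int, rng: random.Random) -> float:
    """Average of max_k mu_k over samples from a symmetric Dirichlet(alpha)."""
    return sum(max(sample_dirichlet([alpha] * k, rng)) for _ in range(n_samples)) / n_samples


rng = random.Random(0)  # fixed seed so runs are repeatable
for alpha in (0.1, 1.0, 10.0):
    print(f"alpha_k={alpha}: mean largest component = "
          f"{mean_largest_component(alpha, 3, 2000, rng):.3f}")
```

Small αk pushes samples toward the corners of the simplex (one component near 1), while large αk concentrates them near the centre, where all components are close to 1/K.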