Introduction to Artificial Intelligence, 2018, Luo Zhiling (罗智凌)
Introduction to Artificial Intelligence (III)
Intro. on Artificial Intelligence from the perspective of probability theory
2018
luozhiling@zju.edu.cn
College of Computer Science, Zhejiang University
http://www.bruceluo.net
– MLE (Maximum Likelihood Estimation)
– MAP (Maximum A Posteriori estimation)
P5   P30  G    Prob
Y    Y    Y    0.173
Y    Y    N    0.075
Y    N    Y    0.116
Y    N    N    0.121
N    Y    Y    0.075
N    Y    N    0.127
N    N    Y    0.179
N    N    N    0.133
P(P5, P30, H, β, γ, δ)
Discriminative model: P(H | P5, P30, β, γ, δ)
Generative model: P(H, P5, P30 | β, γ, δ)
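The joint probability table above fully determines any conditional of interest. A minimal sketch, computing P(G | P5, P30) from the slide's table by marginalizing over G (the tuple keys and function name here are illustrative choices, not from the slides):

```python
# Joint probability table from the slide, keyed by (P5, P30, G).
joint = {
    ("Y", "Y", "Y"): 0.173, ("Y", "Y", "N"): 0.075,
    ("Y", "N", "Y"): 0.116, ("Y", "N", "N"): 0.121,
    ("N", "Y", "Y"): 0.075, ("N", "Y", "N"): 0.127,
    ("N", "N", "Y"): 0.179, ("N", "N", "N"): 0.133,
}

def p_g_given(p5, p30):
    """P(G = Y | P5 = p5, P30 = p30) = P(p5, p30, Y) / sum over g of P(p5, p30, g)."""
    num = joint[(p5, p30, "Y")]
    den = joint[(p5, p30, "Y")] + joint[(p5, p30, "N")]
    return num / den

p = p_g_given("Y", "Y")   # 0.173 / (0.173 + 0.075), about 0.698
```

A discriminative model learns this conditional directly; a generative model learns the joint table and derives the conditional from it, as done here.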
Solve with existing methods and tools (MATLAB, Python), such as stochastic gradient descent and hill climbing.
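A minimal gradient-descent sketch on a toy problem (the objective here is hypothetical, chosen only to show the update rule):

```python
# Minimize f(theta) = (theta - 3)^2; its gradient is 2 * (theta - 3).
def grad(theta):
    return 2.0 * (theta - 3.0)

theta = 0.0   # initial guess
lr = 0.1      # learning rate
for _ in range(200):
    theta -= lr * grad(theta)   # step against the gradient
# theta converges to the minimizer, 3.0
```

Stochastic gradient descent replaces `grad` with a gradient estimated on a random mini-batch of data at each step; the loop structure is the same.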
The prior on parameters
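How a prior on parameters changes the estimate can be sketched with the standard Bernoulli/Beta case (the data and prior values below are hypothetical):

```python
# MLE vs MAP for a Bernoulli parameter theta (probability of heads).
def mle(heads, n):
    """Maximum likelihood: argmax over theta of p(data | theta) = heads / n."""
    return heads / n

def map_estimate(heads, n, a, b):
    """MAP: argmax over theta of p(data | theta) * p(theta).
    With a Beta(a, b) prior this has the closed form below."""
    return (heads + a - 1) / (n + a + b - 2)

# 7 heads in 10 flips; a Beta(2, 2) prior pulls the estimate toward 0.5.
m1 = mle(7, 10)                  # 0.7
m2 = map_estimate(7, 10, 2, 2)   # 8/12, about 0.667
```

The prior acts like pseudo-counts: with more data its influence fades and MAP approaches MLE.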
– Gradient Descent (GD)
– EM algorithm
– Sampling algorithms
– Large learning rate for low-frequency parameters, small for high-frequency ones.
– An improvement on Adagrad: replaces the global sum of squared gradients with a local one.
– Similar to Adagrad, but adds the second moment of the gradient, making it more stable.
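The per-parameter scaling these methods share can be sketched with the Adagrad update on a toy objective (a hypothetical example; the learning rate and objective are illustrative):

```python
import math

def adagrad(grad_fn, theta, lr=0.5, steps=500, eps=1e-8):
    """Adagrad: scale each step by the accumulated squared gradients,
    so parameters with a history of large gradients take smaller steps."""
    g2_sum = 0.0                       # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(theta)
        g2_sum += g * g                # the "local" variants decay this sum instead
        theta -= lr * g / (math.sqrt(g2_sum) + eps)
    return theta

# Minimize f(theta) = (theta - 3)^2; gradient is 2 * (theta - 3).
theta = adagrad(lambda t: 2.0 * (t - 3.0), theta=0.0)
```

Replacing the global sum `g2_sum` with an exponentially decayed average gives the "local" improvement described above.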
Given a statistical model which generates a set X of observed data, a set of unobserved latent data or missing values Z, and a vector of unknown parameters θ, along with a likelihood function L(θ; X, Z) = p(X, Z | θ), the maximum likelihood estimate (MLE) of the unknown parameters is determined by maximizing the marginal likelihood of the observed data, L(θ; X) = p(X | θ) = Σ_Z p(X, Z | θ). The EM algorithm seeks to find the MLE of the marginal likelihood by iteratively applying these two steps:
E step: compute the expected value of the log-likelihood function, with respect to the conditional distribution of Z given X under the current estimate of the parameters θ(t):
  Q(θ | θ(t)) = E_{Z | X, θ(t)} [ log L(θ; X, Z) ]
M step: find the parameters that maximize this quantity:
  θ(t+1) = argmax_θ Q(θ | θ(t))
parameters ν.
Example: φ is a normal distribution, and its expectation f is also normally distributed.
– Markov Model
– Markov Network
– Neural Network
A Markov chain is a sequence of random variables X1, X2, X3, ... with the Markov property, namely that the probability of moving to the next state depends only on the present state and not on the previous states:
  P(X_{n+1} = x | X_1 = x_1, ..., X_n = x_n) = P(X_{n+1} = x | X_n = x_n)
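The Markov property means one transition matrix fully describes the dynamics. A minimal sketch with a hypothetical two-state weather chain, iterating the distribution to its stationary point:

```python
# Transition probabilities P[current][next] (hypothetical example).
P = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step_distribution(dist, P):
    """One step of the chain: new[j] = sum over i of dist[i] * P[i][j].
    Only the current distribution matters, per the Markov property."""
    new = {s: 0.0 for s in P}
    for i, pi in dist.items():
        for j, pij in P[i].items():
            new[j] += pi * pij
    return new

dist = {"sunny": 1.0, "rainy": 0.0}
for _ in range(100):
    dist = step_distribution(dist, P)
# dist converges to the stationary distribution: sunny = 5/6, rainy = 1/6
```

The stationary distribution can be checked by balance: π(sunny) * 0.1 = π(rainy) * 0.5, giving the 5:1 ratio.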
Mixtures of Gaussians
Latent variable
responsibility
responsibility
– Collapses onto a specific data point (singular solutions).
– A total of K! equivalent solutions (label switching).
– The derivatives of the log likelihood are complex.
Weighting factor
Each iteration increases (or at least does not decrease) the log likelihood.
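The two steps, responsibilities (E) and weighted re-estimation (M), can be sketched for a two-component 1D Gaussian mixture; the data points and initial guesses below are hypothetical:

```python
import math

# Hypothetical data: two well-separated clusters around 1.0 and 5.0.
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

pi = [0.5, 0.5]    # mixing weights (the "weighting factors")
mu = [0.0, 6.0]    # initial means
var = [1.0, 1.0]   # initial variances

for _ in range(20):
    # E step: responsibility gamma[n][k], the posterior that point n
    # was generated by component k under the current parameters.
    gamma = []
    for x in data:
        w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
        s = sum(w)
        gamma.append([wk / s for wk in w])
    # M step: re-estimate each component from its soft assignments.
    for k in range(2):
        Nk = sum(g[k] for g in gamma)
        mu[k] = sum(g[k] * x for g, x in zip(gamma, data)) / Nk
        var[k] = sum(g[k] * (x - mu[k]) ** 2 for g, x in zip(gamma, data)) / Nk
        pi[k] = Nk / len(data)
# mu converges toward the two cluster centers, roughly 1.0 and 5.0
```

Unlike K-means' hard assignments, each point here contributes to every component in proportion to its responsibility.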
Illustration of the EM algorithm on the Old Faithful data set, as used earlier to illustrate the K-means algorithm.