
Probabilistic Graphical Models - PowerPoint PPT Presentation



  1. Probabilistic Graphical Models: parameter learning in undirected models. Siamak Ravanbakhsh, Fall 2019.

  2. Learning objectives:
     - the form of the likelihood for undirected models
     - why it is difficult to optimize
     - conditional likelihood in undirected models
     - different approximations for parameter learning:
       MAP inference and regularization, pseudo-likelihood,
       pseudo moment-matching, contrastive learning

  3–6. Likelihood in MRFs: example. A pairwise MRF over binary variables A, B, C (a chain A – B – C) with an indicator feature on each edge:

      p(A, B, C; θ) = (1/Z(θ)) exp( θ₁ I(A = 1, B = 1) + θ₂ I(B = 1, C = 1) )

      Observations: ∣D∣ = 100, with empirical expectations E_D[I(A = 1, B = 1)] = 0.4 and E_D[I(B = 1, C = 1)] = 0.4.

      Log-likelihood:

      log p(D; θ) = Σ_{(a,b,c) ∈ D} ( θ₁ I(a = 1, b = 1) + θ₂ I(b = 1, c = 1) ) − 100 log Z(θ)
                  = 40 θ₁ + 40 θ₂ − 100 log Z(θ)

      Because of the partition function, the likelihood does not decompose.
      [Figure: surface plot of the log-likelihood as a function of θ₁ and θ₂.]
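To make the example concrete, here is a minimal brute-force sketch (not from the slides; the helper names are ours) that evaluates this log-likelihood by enumerating all 8 joint states to compute Z(θ):

```python
import itertools
import numpy as np

def log_Z(theta1, theta2):
    # Partition function by brute force: sum over all 8 joint states of (A, B, C).
    total = 0.0
    for a, b, c in itertools.product((0, 1), repeat=3):
        total += np.exp(theta1 * (a == 1 and b == 1) + theta2 * (b == 1 and c == 1))
    return np.log(total)

def log_likelihood(theta1, theta2):
    # 100 observations; each feature's sufficient statistics sum to 0.4 * 100 = 40.
    return 40 * theta1 + 40 * theta2 - 100 * log_Z(theta1, theta2)

print(log_likelihood(0.0, 0.0))  # -100 * log(8): the uniform model
```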

  7–9. Likelihood in the linear exponential family (log-linear models). Probability distribution:

      p(x; θ) = (1/Z(θ)) exp(⟨θ, ϕ(x)⟩)

      where ϕ(x) are the sufficient statistics. Log-likelihood of D:

      ℓ(D, θ) = log p(D; θ) = Σ_{x ∈ D} ⟨θ, ϕ(x)⟩ − ∣D∣ log Z(θ)
              = ∣D∣ ( ⟨θ, E_D[ϕ(x)]⟩ − log Z(θ) )

      where μ_D = E_D[ϕ(x)] is the vector of expected sufficient statistics.
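As an illustration, a generic sketch of this log-likelihood for any discrete log-linear model, assuming a state space small enough to enumerate (`states` and `phi` are hypothetical names, not from the lecture):

```python
import numpy as np
from scipy.special import logsumexp

def log_partition(theta, states, phi):
    # log Z(theta) = log sum_x exp(<theta, phi(x)>), computed stably.
    return logsumexp([theta @ phi(x) for x in states])

def log_likelihood(theta, data, states, phi):
    # l(D, theta) = |D| * (<theta, mu_D> - log Z(theta))
    mu_D = np.mean([phi(x) for x in data], axis=0)  # expected sufficient statistics
    return len(data) * (theta @ mu_D - log_partition(theta, states, phi))
```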

  10. Likelihood in the linear exponential family: example. With one indicator feature per joint configuration of a pair (X₁, X₂), each expected sufficient statistic is an empirical marginal:

      params         expected sufficient statistics
      θ_{1,2,0,0}    E_D[I(X₁ = 0, X₂ = 0)] = P_D(X₁ = 0, X₂ = 0)
      θ_{1,2,1,0}    E_D[I(X₁ = 1, X₂ = 0)] = P_D(X₁ = 1, X₂ = 0)
      θ_{1,2,0,1}    E_D[I(X₁ = 0, X₂ = 1)] = P_D(X₁ = 0, X₂ = 1)
      θ_{1,2,1,1}    E_D[I(X₁ = 1, X₂ = 1)] = P_D(X₁ = 1, X₂ = 1)

      (image: Michael Jordan's draft)
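A quick sketch of how those empirical quantities are computed; the data array is made up for illustration:

```python
import numpy as np

# Hypothetical samples of (X1, X2); each row is one observation.
X = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [1, 1]])

# The empirical mean of each pairwise indicator is exactly an empirical marginal.
for i in (0, 1):
    for j in (0, 1):
        mu = np.mean((X[:, 0] == i) & (X[:, 1] == j))
        print(f"E_D[I(X1={i}, X2={j})] = P_D(X1={i}, X2={j}) = {mu:.2f}")
```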

  11–12. log Z(θ) has interesting properties. Its gradient is the model's expected sufficient statistics:

      ∂/∂θᵢ log Z(θ) = ∂/∂θᵢ log Σₓ exp(⟨θ, ϕ(x)⟩) = (1/Z(θ)) Σₓ ϕᵢ(x) exp(⟨θ, ϕ(x)⟩) = E_{p_θ}[ϕᵢ(x)]

      so ∇_θ log Z(θ) = E_θ[ϕ(x)]. Its second derivatives give the covariance of the sufficient statistics:

      ∂²/∂θᵢ∂θⱼ log Z(θ) = E[ϕᵢ(x) ϕⱼ(x)] − E[ϕᵢ(x)] E[ϕⱼ(x)] = Cov(ϕᵢ, ϕⱼ)

      so the Hessian is a covariance matrix, hence positive semidefinite, and log Z(θ) is convex.
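These identities are easy to verify numerically on a toy model. A sketch (the feature map and parameter values are our assumptions, not the lecture's) comparing a finite-difference gradient of log Z against E_{p_θ}[ϕ]:

```python
import numpy as np
from scipy.special import logsumexp

# Toy model: two binary variables, phi(x) = [x1, x2, x1*x2] (an assumed example).
states = [(a, b) for a in (0, 1) for b in (0, 1)]
phi = lambda x: np.array([x[0], x[1], x[0] * x[1]], dtype=float)

def log_Z(theta):
    return logsumexp([theta @ phi(x) for x in states])

theta = np.array([0.3, -0.2, 0.5])
Phi = np.array([phi(x) for x in states])
p = np.exp(Phi @ theta - log_Z(theta))   # model probabilities p_theta(x)
model_mean = p @ Phi                     # E_p[phi(x)]

# Finite differences: the gradient of log Z matches the model expectation.
eps = 1e-6
grad_fd = np.array([(log_Z(theta + eps * e) - log_Z(theta - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(grad_fd, model_mean))  # True

# Cov(phi_i, phi_j) is the Hessian of log Z: positive semidefinite.
cov = Phi.T @ (p[:, None] * Phi) - np.outer(model_mean, model_mean)
print(np.all(np.linalg.eigvalsh(cov) >= -1e-12))  # True
```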

  13–18. Likelihood in the linear exponential family, continued. The log-likelihood

      ℓ(D, θ) = ∣D∣ ( ⟨θ, E_D[ϕ(x)]⟩ − log Z(θ) )

      is linear in θ minus the convex log Z(θ), hence concave. Should it then be easy to maximize? No! Estimating Z(θ) is a difficult inference problem. How about just using the gradient? That involves inference as well, since

      ∇_θ log Z(θ) = E_θ[ϕ(x)]

      so learning undirected models pairs gradient-based optimization with some form of inference, as in the sketch below.
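Putting the pieces together for the three-variable chain from the earlier example: a minimal gradient-ascent sketch where, because the model is tiny, the inner inference step is done by exact enumeration (names and step size are our own choices):

```python
import itertools
import numpy as np
from scipy.special import logsumexp

# Chain example from the slides: phi(x) = [I(a=1,b=1), I(b=1,c=1)], mu_D = [0.4, 0.4].
states = list(itertools.product((0, 1), repeat=3))
Phi = np.array([[float(a == 1 and b == 1), float(b == 1 and c == 1)]
                for a, b, c in states])
mu_D = np.array([0.4, 0.4])

theta = np.zeros(2)
for _ in range(1000):
    # Inner-loop inference: exact E_p[phi] by enumerating all 8 states.
    scores = Phi @ theta
    p = np.exp(scores - logsumexp(scores))
    model_mean = p @ Phi
    theta += 0.5 * (mu_D - model_mean)   # ascent on (1/|D|) * log-likelihood

print(theta, model_mean)  # at convergence, model_mean ≈ mu_D
```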

  19–20. Moment matching for the linear exponential family. Since the log-likelihood is concave, maximize it by setting the derivative to zero:

      ∇ℓ(θ, D) = ∣D∣ ( E_D[ϕ(x)] − E_{p_θ}[ϕ(x)] ) = 0   ⇒   E_D[ϕ(x)] = E_{p_θ}[ϕ(x)]

      i.e., find the parameters θ that produce the same expected sufficient statistics as the data. With indicator features this matches model marginals to empirical marginals, e.g.

      p(X₁ = 0, X₂ = 1; θ) = P_D(X₁ = 0, X₂ = 1)
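For a fully parameterized clique the moment-matching condition can even be solved in closed form. A small sketch (the empirical table is made up): with one indicator per joint state of (X₁, X₂), setting θ_{ij} = log P_D(X₁ = i, X₂ = j) gives Z(θ) = 1 and makes every model marginal equal its empirical counterpart:

```python
import numpy as np

# Assumed empirical marginals P_D(X1=i, X2=j).
P_D = np.array([[0.1, 0.3],
                [0.4, 0.2]])

theta = np.log(P_D)                    # one parameter per joint configuration
p = np.exp(theta)                      # here Z(theta) = sum_ij exp(theta_ij) = 1
print(np.allclose(p / p.sum(), P_D))   # True: model marginals match the data
```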

  21. Learning needs inference in an inner loop. Maximizing the likelihood, argmax_θ log p(D; θ):

      gradient ∝ E_D[ϕ(x)] − E_{p_θ}[ϕ(x)]
      optimality condition: E_D[ϕ(x)] = E_{p_θ}[ϕ(x)]

      The data expectation E_D[ϕ(x)] is easy to calculate; the model expectation E_{p_θ}[ϕ(x)] requires inference in the graphical model.
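When the state space is too large to enumerate, the model expectation E_{p_θ}[ϕ(x)] is typically approximated, e.g. by MCMC. A minimal Gibbs-sampling sketch for the binary chain (our own illustration, not the lecture's algorithm; no burn-in or thinning, for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([0.5, 0.5])

def feats(x):
    # Sufficient statistics of the chain: [I(a=1,b=1), I(b=1,c=1)].
    return np.array([float(x[0] == 1 and x[1] == 1), float(x[1] == 1 and x[2] == 1)])

def gibbs_estimate(theta, n_sweeps=5000):
    x = np.zeros(3, dtype=int)
    stats = np.zeros(2)
    for _ in range(n_sweeps):
        for i in range(3):           # resample each variable given the others
            logp = np.empty(2)
            for v in (0, 1):
                x[i] = v
                logp[v] = theta @ feats(x)
            x[i] = int(rng.random() < 1.0 / (1.0 + np.exp(logp[0] - logp[1])))
        stats += feats(x)
    return stats / n_sweeps          # Monte Carlo estimate of E_p[phi]

print(gibbs_estimate(theta))         # compare against the exact enumeration above
```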
