
Learning Fast-Mixing Models for Structured Prediction

Jacob Steinhardt, Percy Liang
Stanford University
{jsteinhardt,pliang}@cs.stanford.edu
July 8, 2015

J. Steinhardt & P. Liang (Stanford) Fast-Mixing Models July 8, 2015 1 / 11



  2. Structured Prediction Task
     [Figure: keyboard layout, rows "q w e r t y u i o p", "a s d f g h j k l", "z x c v b n m"]
     x: b d s a d b n n n f a a s s j j j
     z: b # # a # # n-n-n # a-a # # n-n a
     y: b a n a n a
     Goal: fit maximum likelihood model $p_\theta(z \mid x)$. Two routes:
     - Use simple model $u$, exact inference
     - Use expressive model, Gibbs sampling (transition kernel $A$)
     Can we get the best of both worlds?
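
The relationship between the alignment z and the output y on this slide can be sketched in a few lines of Python. This is only an illustration read off the example: it assumes that `#` marks a keystroke that emits nothing and that a hyphen-joined run such as `n-n-n` stands for one intended letter. The function name `decode_alignment` is hypothetical and does not come from the slides.

```python
def decode_alignment(z: str) -> str:
    """Collapse an alignment string z into the output word y.

    Assumed conventions (inferred from the slide's example only):
      '#'      a keystroke treated as noise, emitting nothing
      'n-n-n'  several keystrokes that together emit a single letter
    """
    out = []
    for token in z.split():
        if token == "#":
            continue
        letters = token.split("-")
        if len(set(letters)) != 1:
            raise ValueError(f"inconsistent run: {token!r}")
        out.append(letters[0])
    return "".join(out)


z = "b # # a # # n-n-n # a-a # # n-n a"
y = decode_alignment(z)
print(y)  # prints: banana
```

The mapping from z to y is deterministic under these assumptions, which is why the learning problem on the slide is stated in terms of $p_\theta(z \mid x)$.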

  7. Strong Doeblin Chains
     Definition (Doeblin, 1940). A chain $\tilde{A}$ is strong Doeblin with parameter $\varepsilon$ if
       $\tilde{A}(z_t \mid z_{t-1}) = \varepsilon\, u(z_t) + (1 - \varepsilon)\, A(z_t \mid z_{t-1})$
     for some $u$, $A$.
     [Figure: diagram of a sample path whose steps are labeled u and A]
     All Doeblin chains mix quickly:
     Proposition. If $\tilde{A}$ is $\varepsilon$ strong Doeblin, then its mixing time is at most $1/\varepsilon$. Moreover, the stationary distribution is $A^{T} u$, where $T \sim \mathrm{Geometric}(\varepsilon)$.
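
The proposition can be checked numerically on a small chain. The sketch below is not from the slides: the 3-state matrix `A`, the distribution `u`, and `eps = 0.25` are made-up values, and all helper names are hypothetical. It assumes `A` is column-stochastic (so that `A` acts on a distribution from the left, matching the notation $A^{T} u$, where $T$ is a power and not a transpose) and that Geometric(ε) counts the number of `A` steps starting from 0, so that P(T = t) = ε(1 − ε)^t.

```python
def mat_vec(A, v):
    """Apply a column-stochastic matrix A (A[i][j] = P(next=i | prev=j)) to v."""
    n = len(v)
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]


def doeblin_step(A, u, eps, v):
    """One step of A~ = eps * u + (1 - eps) * A applied to a distribution v."""
    Av = mat_vec(A, v)
    return [eps * u[i] + (1 - eps) * Av[i] for i in range(len(v))]


def stationary_by_series(A, u, eps, terms=500):
    """Truncated sum over t >= 0 of eps * (1 - eps)^t * A^t u."""
    pi = [0.0] * len(u)
    v = list(u)      # v holds A^t u
    weight = eps     # weight holds eps * (1 - eps)^t
    for _ in range(terms):
        pi = [p + weight * x for p, x in zip(pi, v)]
        v = mat_vec(A, v)
        weight *= 1 - eps
    return pi


def tv(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical example values (each column of A sums to 1).
A = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]
u = [0.2, 0.3, 0.5]
eps = 0.25

pi_tilde = stationary_by_series(A, u, eps)

# pi_tilde should be a fixed point of the strong Doeblin chain.
print("fixed-point error:", tv(doeblin_step(A, u, eps, pi_tilde), pi_tilde))

# Distance to stationarity, next to the geometric envelope (1 - eps)^t.
v = [1.0, 0.0, 0.0]
for t in range(1, 9):
    v = doeblin_step(A, u, eps, v)
    print(t, tv(v, pi_tilde), (1 - eps) ** t)
```

The printed distances stay below (1 − ε)^t because each step of the mixed chain forgets the previous state with probability ε. After about 1/ε steps the envelope has dropped to roughly 1/e.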

  20. A Strong Doeblin Family
      Let $\theta$ parameterize a distribution $u_\theta$ and transition matrix $A_\theta$.
        $\tilde{A}_\theta = \varepsilon\, u_\theta + (1 - \varepsilon)\, A_\theta$
      [Figure: $A_\theta \to \pi_\theta$ and $\tilde{A}_\theta \to \tilde{\pi}_\theta$ (stationary distributions)]
      Three model families:
        $\mathcal{F}_0 = \{u_\theta\}_{\theta \in \Theta}$
        $\mathcal{F} = \{\pi_\theta\}_{\theta \in \Theta}$
        $\tilde{\mathcal{F}} = \{\tilde{\pi}_\theta\}_{\theta \in \Theta}$
      $\tilde{\mathcal{F}}$ parameterizes computationally tractable distributions!
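
One way to read "computationally tractable" is that a member of the strong Doeblin family can be sampled exactly: draw a start state from u, then take T ∼ Geometric(ε) steps of A. The sketch below illustrates this on a made-up 3-state chain. The helper names, the matrix `A_cols`, and the values of `u` and `eps` are all hypothetical, and Geometric(ε) is again assumed to count steps from 0. The draw uses (1 − ε)/ε transitions in expectation.

```python
import random


def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def sample_pi_tilde(A_cols, u, eps, rng):
    """Exact draw from pi~: z ~ u, then T ~ Geometric(eps) steps of A.

    A_cols[j] is the distribution of the next state given current state j.
    """
    z = sample_categorical(u, rng)
    # Continue with probability 1 - eps, so P(T = t) = eps * (1 - eps)^t.
    while rng.random() >= eps:
        z = sample_categorical(A_cols[z], rng)
    return z


def exact_pi_tilde(A_cols, u, eps, terms=500):
    """Truncated sum over t >= 0 of eps * (1 - eps)^t * A^t u, for comparison."""
    n = len(u)
    pi, v, w = [0.0] * n, list(u), eps
    for _ in range(terms):
        pi = [p + w * x for p, x in zip(pi, v)]
        v = [sum(A_cols[j][i] * v[j] for j in range(n)) for i in range(n)]
        w *= 1 - eps
    return pi


# Hypothetical example values.
A_cols = [[0.9, 0.1, 0.0],
          [0.1, 0.8, 0.1],
          [0.0, 0.1, 0.9]]
u = [0.2, 0.3, 0.5]
eps = 0.25

rng = random.Random(0)
n_samples = 20000
counts = [0, 0, 0]
for _ in range(n_samples):
    counts[sample_pi_tilde(A_cols, u, eps, rng)] += 1

freqs = [c / n_samples for c in counts]
exact = exact_pi_tilde(A_cols, u, eps)
print("empirical:", freqs)
print("exact:    ", exact)
```

By contrast, sampling from $\pi_\theta$ itself would in general require running $A_\theta$ until it mixes, which may take far longer than 1/ε steps.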

  29. Strategy
      Parameterize strong Doeblin distributions $\tilde{\pi}_\theta$
      Maximize log-likelihood: $L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log \tilde{\pi}_\theta(z^{(i)})$
      Issue: hard to compute $\nabla L(\theta)$
      Insight: interpret Markov chain as latent variable model:
        $p_\theta$: $z_1 \to z_2 \to \cdots \to z_T$, with $z_1 \sim u_\theta$ and each transition drawn from $A_\theta$
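
The latent variable reading can be written out as follows. This is a sketch under stated assumptions and not a transcription of the slides, which are cut off at this point. It assumes ε is fixed and not part of θ, and it indexes the path as $z_1, \dots, z_T$ as on the slide, so a path of length $T$ contains $T - 1$ transitions and has probability $\varepsilon (1 - \varepsilon)^{T-1}$ of being that long. The last line is the standard gradient identity for latent variable models, which may or may not be the estimator the authors go on to use.

```latex
\begin{align*}
p_\theta(z_{1:T}) &= u_\theta(z_1) \prod_{t=2}^{T} A_\theta(z_t \mid z_{t-1}), \\
\tilde{\pi}_\theta(z) &= \sum_{T=1}^{\infty} \varepsilon (1 - \varepsilon)^{T-1}
    \sum_{z_{1:T-1}} p_\theta(z_{1:T-1},\, z_T = z), \\
\nabla_\theta \log \tilde{\pi}_\theta(z) &=
    \mathbb{E}\!\left[ \nabla_\theta \log p_\theta(z_{1:T}) \,\middle|\, z_T = z \right].
\end{align*}
```

Here the observed example $z$ is the final state $z_T$, and the path length $T$ together with the earlier states $z_{1:T-1}$ are the latent variables. The expectation in the last line is taken over their posterior given $z_T = z$.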
