Learning Fast-Mixing Models for Structured Prediction

Jacob Steinhardt, Percy Liang
Stanford University
{jsteinhardt,pliang}@cs.stanford.edu
July 8, 2015

Structured Prediction Task

q w e r t y u i o p
a s d f g h j k l
z x c v b n m

x: b d s a d b n n n f a a s s j j j
z: b # # a # # n-n-n # a-a # # n-n a
y: b a n a n a

Goal: fit maximum likelihood model $p_\theta(z \mid x)$. Two routes:
Use simple model $u$, exact inference
Use expressive model, Gibbs sampling (transition kernel $A$)
Can we get the best of both worlds?
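The relationship between the alignment z and the output y on this slide can be sketched in a few lines of Python. The reading of the notation is an assumption, not something the slide states: `#` is taken to mark a keystroke that contributes no output character, and a hyphen-joined run such as `n-n-n` is taken to be several keystrokes that together produce one character. The function name `collapse_alignment` is hypothetical.

```python
def collapse_alignment(z_tokens):
    """Map an alignment z (a list of tokens) to the output string y.

    Assumed reading of the slide's notation:
      '#'      -> a keystroke that produces no output character
      'n-n-n'  -> a run of keystrokes that produces a single 'n'
    """
    output = []
    for token in z_tokens:
        if token == "#":
            continue  # skipped keystroke
        output.append(token.split("-")[0])  # one character per run
    return "".join(output)


z = "b # # a # # n-n-n # a-a # # n-n a".split()
y = collapse_alignment(z)
print(y)  # banana
```

Under this reading, y is a deterministic function of z, so modeling $p_\theta(z \mid x)$ is enough to predict y.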

Strong Doeblin Chains

Definition (Doeblin, 1940). A chain $\tilde{A}$ is strong Doeblin with parameter $\epsilon$ if
$$\tilde{A}(z_t \mid z_{t-1}) = \epsilon\, u(z_t) + (1 - \epsilon)\, A(z_t \mid z_{t-1})$$
for some $u$, $A$.

[Figure: a chain of transitions, each labeled $u$ or $A$]

All Doeblin chains mix quickly:

Proposition. If $\tilde{A}$ is $\epsilon$ strong Doeblin, then its mixing time is at most $1/\epsilon$. Moreover, the stationary distribution is $A^T u$, where $T \sim \mathrm{Geometric}(\epsilon)$.
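The proposition can be checked numerically on a tiny chain. The sketch below is only an illustration under stated assumptions: the 3-state matrix `A`, the vector `u`, and `eps` are made-up numbers; `A[i][j]` is taken to mean $A(z_t = i \mid z_{t-1} = j)$, so columns sum to 1; and Geometric($\epsilon$) is taken to live on {0, 1, 2, ...} with $P(T = t) = \epsilon (1 - \epsilon)^t$. Under that convention, "$A^T u$ with $T \sim \mathrm{Geometric}(\epsilon)$" is the mixture $\sum_t \epsilon (1 - \epsilon)^t A^t u$.

```python
def mat_vec(A, v):
    """Matrix-vector product for small dense lists."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def doeblin_step(A, u, eps, v):
    """Apply A_tilde = eps * u + (1 - eps) * A to the distribution v."""
    Av = mat_vec(A, v)
    total = sum(v)  # equals 1 when v is a distribution
    return [eps * u[i] * total + (1 - eps) * Av[i] for i in range(len(v))]


def stationary_from_geometric(A, u, eps, terms=2000):
    """Truncated series sum_t eps * (1 - eps)**t * A**t u."""
    pi = [0.0] * len(u)
    power = list(u)  # holds A**t u
    weight = eps     # holds eps * (1 - eps)**t
    for _ in range(terms):
        pi = [p + weight * x for p, x in zip(pi, power)]
        power = mat_vec(A, power)
        weight *= 1 - eps
    return pi


def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Illustrative 3-state example (made-up numbers); columns of A sum to 1.
A = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]
u = [0.2, 0.5, 0.3]
eps = 0.2

pi_tilde = stationary_from_geometric(A, u, eps)
after_one_step = doeblin_step(A, u, eps, pi_tilde)
print(total_variation(pi_tilde, after_one_step))  # close to zero
```

The same helpers illustrate the mixing claim: one step of $\tilde{A}$ shrinks the total variation distance between any two distributions by a factor of at least $1 - \epsilon$, because the $\epsilon\, u$ term is identical for both and cancels.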

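A Strong Doeblin Family

Let $\theta$ parameterize a distribution $u_\theta$ and transition matrix $A_\theta$.
$$\tilde{A}_\theta = \epsilon\, u_\theta + (1 - \epsilon)\, A_\theta$$
Stationary distributions: $\pi_\theta$ for $A_\theta$, $\tilde{\pi}_\theta$ for $\tilde{A}_\theta$.

Three model families:
$\mathcal{F}_0 = \{u_\theta\}_{\theta \in \Theta}$
$\mathcal{F} = \{\pi_\theta\}_{\theta \in \Theta}$
$\tilde{\mathcal{F}} = \{\tilde{\pi}_\theta\}_{\theta \in \Theta}$

$\tilde{\mathcal{F}}$ parameterizes computationally tractable distributions!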
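One way to read "computationally tractable" is that $\tilde{\pi}_\theta$ can be sampled exactly with a short, random number of steps: draw $T \sim \mathrm{Geometric}(\epsilon)$, start from $u_\theta$, and apply $A_\theta$ exactly $T$ times. The sketch below assumes a small finite state space, the column convention `A[i][j]` = $A(i \mid j)$, and a geometric distribution on {0, 1, 2, ...}; the function name `sample_strong_doeblin` is hypothetical and this is not claimed to be the authors' implementation.

```python
import random


def sample_strong_doeblin(A, u, eps, rng):
    """Draw one exact sample from the stationary distribution of A_tilde.

    A[i][j] is P(next = i | previous = j); u is the restart distribution.
    Requires 0 < eps <= 1.
    """
    states = range(len(u))
    # T ~ Geometric(eps) on {0, 1, 2, ...}: failures before the first success.
    steps = 0
    while rng.random() >= eps:
        steps += 1
    # Start from u, then apply A exactly `steps` times.
    z = rng.choices(states, weights=u)[0]
    for _ in range(steps):
        z = rng.choices(states, weights=[A[i][z] for i in states])[0]
    return z


# Illustrative usage with made-up numbers.
rng = random.Random(0)
A = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]
u = [0.2, 0.5, 0.3]
samples = [sample_strong_doeblin(A, u, 0.2, rng) for _ in range(1000)]
```

The expected number of applications of `A` per sample is $(1 - \epsilon)/\epsilon$, so the cost of a sample is controlled by $\epsilon$ rather than by how slowly $A_\theta$ itself mixes.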

Strategy

Parameterize strong Doeblin distributions $\tilde{\pi}_\theta$
Maximize log-likelihood: $L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log \tilde{\pi}_\theta(z^{(i)})$
Issue: hard to compute $\nabla L(\theta)$
Insight: interpret Markov chain as latent variable model:
$$p_\theta:\quad \xrightarrow{u_\theta} z_1 \xrightarrow{A_\theta} z_2 \xrightarrow{A_\theta} \cdots \xrightarrow{A_\theta} z_T$$
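For intuition, the objective $L(\theta)$ can be evaluated by brute force when the state space is tiny. The sketch below marginalizes over the latent chain length by summing the geometric mixture directly, which is exactly the computation that becomes infeasible for structured outputs; it is an illustration of the objective, not the authors' learning algorithm. The names `stationary_distribution` and `average_log_likelihood` are hypothetical, `A[i][j]` is assumed to be $A(i \mid j)$, and the chain is indexed so that $z_1 \sim u$ with $T - 1$ applications of $A$ (equivalently, a geometric number of steps starting from 0).

```python
import math


def stationary_distribution(A, u, eps, terms=2000):
    """Truncated mixture sum_t eps * (1 - eps)**t * A**t u over t = 0, 1, ..."""
    n = len(u)
    pi = [0.0] * n
    power = list(u)  # holds A**t u
    weight = eps     # holds eps * (1 - eps)**t
    for _ in range(terms):
        pi = [p + weight * x for p, x in zip(pi, power)]
        power = [sum(A[i][j] * power[j] for j in range(n)) for i in range(n)]
        weight *= 1 - eps
    return pi


def average_log_likelihood(data, A, u, eps):
    """L = (1/n) * sum_i log pi_tilde(z_i) for observed states z_i."""
    pi = stationary_distribution(A, u, eps)
    return sum(math.log(pi[z]) for z in data) / len(data)


# Illustrative usage with made-up numbers.
A = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]
u = [0.2, 0.5, 0.3]
data = [0, 0, 1, 2, 1]
print(average_log_likelihood(data, A, u, eps=0.2))
```

When the state space is exponentially large, the vector `pi` cannot be stored, which is why the slide turns to the latent variable view of the chain $z_1, \ldots, z_T$.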
