Maximum Reconstruction Estimation for Generative Latent-Variable Models - PowerPoint PPT Presentation



SLIDE 1

Maximum Reconstruction Estimation for Generative Latent-Variable Models

Yong Cheng

joint work with Yang Liu, Wei Xu

SLIDE 2

Problem

Generative latent-variable models are important for natural language processing due to their capability of providing compact representations of data.

Maximum likelihood estimation suffers from a significant problem: it may guide the model to focus on explaining irrelevant but common correlations in the data.
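For reference, the MLE objective in question can be written as follows, with observed data $x$, latent variable $y$, and parameters $\theta$ (a standard formulation reconstructed here, since the slide's own equations are not preserved in this transcript):

```latex
% MLE: maximize the marginal likelihood of the observed data
\theta_{\mathrm{MLE}} = \operatorname*{argmax}_{\theta}
  \sum_{x \in \mathcal{D}} \log \sum_{y} P(x, y; \theta)
```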

SLIDE 3

Maximum Reconstruction Estimation

Circumvent irrelevant but common correlations by maximizing the probability of reconstructing observed data.
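One standard way to formalize this idea (assumed here; the slide's own formula is not preserved in this transcript) is an autoencoder-style objective: encode the observation $x$ into the latent variable $y$, decode it back, and maximize the probability of recovering $x$:

```latex
% MRE: maximize the probability of reconstructing x
% from its own latent representation
\theta_{\mathrm{MRE}} = \operatorname*{argmax}_{\theta}
  \sum_{x \in \mathcal{D}} \log \sum_{y} P(y \mid x; \theta)\, P(x \mid y; \theta)
```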

SLIDE 4

Advantages:

  • Direct learning of model parameters.

  • Tractable inference.


Maximum Reconstruction Estimation

SLIDE 5

A generative latent-variable model:

  • Maximum likelihood estimation (MLE)

  • Inference


Maximum Likelihood Estimation

SLIDE 6


Maximum Reconstruction Estimation

Objective:

SLIDE 7


Maximum Reconstruction Estimation

Objective:

Prediction:
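The equations for this slide are not preserved in the transcript; in standard notation, the reconstruction objective and the corresponding prediction rule would read as follows (a reconstruction, not the slide's exact formulas):

```latex
% Objective: log-probability of reconstructing each observation
\mathcal{J}(\theta) = \sum_{x \in \mathcal{D}}
  \log \sum_{y} P(y \mid x; \theta)\, P(x \mid y; \theta)

% Prediction: decode the most probable latent structure
y^{*} = \operatorname*{argmax}_{y} P(y \mid x; \hat{\theta})
```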

SLIDE 8

Two classical generative latent-variable models:

  • Hidden Markov models for unsupervised POS induction

  • IBM translation models for unsupervised word alignment


Maximum Reconstruction Estimation

SLIDE 9

Given an observed English sentence, the task is to induce the latent sequence of part-of-speech tags.
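As a concrete toy illustration of such a model (not from the slides), here is a minimal HMM sketch in Python: the marginal likelihood of a word sequence is computed by summing out the latent tag sequence with the forward algorithm. All probabilities are made up for illustration.

```python
# Toy HMM for unsupervised POS induction with 2 hypothetical tag states
# and a 3-word vocabulary. All probabilities are illustrative.
pi = [0.6, 0.4]                          # initial tag distribution p(y_1)
A = [[0.7, 0.3],                         # transition p(y_t | y_{t-1})
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],                    # emission p(x_t | y_t)
     [0.1, 0.3, 0.6]]

def forward(obs):
    """Marginal likelihood p(x): sum over all latent tag sequences y."""
    alpha = [pi[s] * B[s][obs[0]] for s in range(2)]
    for w in obs[1:]:
        alpha = [sum(alpha[s] * A[s][t] for s in range(2)) * B[t][w]
                 for t in range(2)]
    return sum(alpha)
```

MLE would fit `pi`, `A`, and `B` to maximize this quantity over a corpus (typically with EM), whereas MRE would instead score how well the sentence can be reconstructed through the induced tags.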


Hidden Markov Models for Unsupervised POS Induction

SLIDE 10

Given an observed English sentence, the task is to induce the latent sequence of part-of-speech tags.


Hidden Markov Models for Unsupervised POS Induction

SLIDE 11


Hidden Markov Models for Unsupervised POS Induction

Maximum Likelihood Estimation (MLE)

SLIDE 12


Hidden Markov Models for Unsupervised POS Induction

Maximum Likelihood Estimation (MLE)

SLIDE 13


Hidden Markov Models for Unsupervised POS Induction

Maximum Likelihood Estimation (MLE)

SLIDE 14


Hidden Markov Models for Unsupervised POS Induction

Maximum Likelihood Estimation (MLE)

SLIDE 15


Hidden Markov Models for Unsupervised POS Induction

Maximum Reconstruction Estimation (MRE)

Maximum Likelihood Estimation (MLE)

SLIDE 16


Hidden Markov Models for Unsupervised POS Induction

Maximum Reconstruction Estimation (MRE)

Maximum Likelihood Estimation (MLE)

SLIDE 17

Experiments


Comparison with MLE

SLIDE 18

Experiments


Comparison with MLE

SLIDE 19

Experiments


Comparison with CRF autoencoder

SLIDE 20

Experiments


Example emission probabilities for the POS “VBD” (verb past tense)

SLIDE 21

IBM Translation Models for Unsupervised Word Alignment

SLIDE 22

IBM Translation Models for Unsupervised Word Alignment


Maximum Likelihood Estimation (MLE)

Maximum Reconstruction Estimation (MRE)
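For concreteness, here is a minimal sketch of MLE training for IBM Model 1 via EM on a toy bitext; the sentence pairs and uniform initialization are illustrative assumptions, not data from the slides.

```python
from collections import defaultdict

# MLE training of IBM Model 1 translation probabilities t(f | e) via EM
# on a toy German-English bitext (illustrative data).
bitext = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"])]

src_vocab = {f for fs, _ in bitext for f in fs}
tgt_vocab = {e for _, es in bitext for e in es}
t = {(f, e): 1.0 / len(src_vocab) for f in src_vocab for e in tgt_vocab}

for _ in range(10):                       # a few EM iterations
    count = defaultdict(float)            # expected alignment counts
    total = defaultdict(float)
    for fs, es in bitext:
        for f in fs:                      # E-step: posterior over alignments
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    for f, e in t:                        # M-step: renormalize per target word
        t[(f, e)] = count[(f, e)] / total[e]
```

After a few iterations EM concentrates probability on consistently co-occurring pairs (e.g. `t[("haus", "house")]` dominates `t[("das", "house")]`); an MRE variant would instead train so that the source sentence can be reconstructed through the induced alignments.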

SLIDE 23

IBM Translation Models for Unsupervised Word Alignment


Maximum Likelihood Estimation (MLE)

Maximum Reconstruction Estimation (MRE)

SLIDE 24

IBM Translation Models for Unsupervised Word Alignment


Comparison with MLE

SLIDE 25

Conclusion


We have presented maximum reconstruction estimation for training generative latent-variable models such as hidden Markov models and IBM translation models. In the future, we plan to apply our approach to more generative latent-variable models such as probabilistic context-free grammars and explore the possibility of developing new training algorithms that minimize reconstruction errors.

SLIDE 26


Thank you!