Maximum Reconstruction Estimation for Generative Latent-Variable Models - PowerPoint PPT Presentation



SLIDE 1

Maximum Reconstruction Estimation for Generative Latent-Variable Models

Yong Cheng

joint work with Yang Liu, Wei Xu

SLIDE 2

Problem

Generative latent-variable models are important for natural language processing due to their capability of providing compact representations of data.

Maximum likelihood estimation suffers from a significant problem: it may guide the model to focus on explaining irrelevant but common correlations in the data.
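For reference, the MLE objective in question can be written as follows, with observed data $x$, latent variable $y$, and parameters $\theta$ (a standard formulation reconstructed here, since the slide's own equations are not preserved in this transcript):

```latex
% MLE: maximize the marginal likelihood of the observed data
\theta_{\mathrm{MLE}} = \operatorname*{argmax}_{\theta}
  \sum_{x \in \mathcal{D}} \log \sum_{y} P(x, y; \theta)
```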

SLIDE 3

Maximum Reconstruction Estimation

Circumvent irrelevant but common correlations by maximizing the probability of reconstructing observed data.
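One standard way to formalize this idea (assumed here; the slide's own formula is not preserved in this transcript) is an autoencoder-style objective: encode the observation $x$ into the latent variable $y$, decode it back, and maximize the probability of recovering $x$:

```latex
% MRE: maximize the probability of reconstructing x
% from its own latent representation
\theta_{\mathrm{MRE}} = \operatorname*{argmax}_{\theta}
  \sum_{x \in \mathcal{D}} \log \sum_{y} P(y \mid x; \theta)\, P(x \mid y; \theta)
```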

SLIDE 4

Advantages:

  • Direct learning of model parameters.

  • Tractable inference.


Maximum Reconstruction Estimation

SLIDE 5

A generative latent-variable model:

  • Maximum likelihood estimation (MLE)

  • Inference


Maximum Likelihood Estimation

SLIDE 6


Maximum Reconstruction Estimation

Objective:

SLIDE 7


Maximum Reconstruction Estimation

Objective:

Prediction:
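The equations for this slide are not preserved in the transcript; in standard notation, the reconstruction objective and the corresponding prediction rule would read as follows (a reconstruction, not the slide's exact formulas):

```latex
% Objective: log-probability of reconstructing each observation
\mathcal{J}(\theta) = \sum_{x \in \mathcal{D}}
  \log \sum_{y} P(y \mid x; \theta)\, P(x \mid y; \theta)

% Prediction: decode the most probable latent structure
y^{*} = \operatorname*{argmax}_{y} P(y \mid x; \hat{\theta})
```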

SLIDE 8

Two classical generative latent-variable models:

  • Hidden Markov models for unsupervised POS induction

  • IBM translation models for unsupervised word alignment


Maximum Reconstruction Estimation

SLIDE 9

Given an observed English sentence, the task is to induce the latent sequence of part-of-speech tags.
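As a concrete toy illustration of such a model (not from the slides), here is a minimal HMM sketch in Python: the marginal likelihood of a word sequence is computed by summing out the latent tag sequence with the forward algorithm. All probabilities are made up for illustration.

```python
# Toy HMM for unsupervised POS induction with 2 hypothetical tag states
# and a 3-word vocabulary. All probabilities are illustrative.
pi = [0.6, 0.4]                          # initial tag distribution p(y_1)
A = [[0.7, 0.3],                         # transition p(y_t | y_{t-1})
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],                    # emission p(x_t | y_t)
     [0.1, 0.3, 0.6]]

def forward(obs):
    """Marginal likelihood p(x): sum over all latent tag sequences y."""
    alpha = [pi[s] * B[s][obs[0]] for s in range(2)]
    for w in obs[1:]:
        alpha = [sum(alpha[s] * A[s][t] for s in range(2)) * B[t][w]
                 for t in range(2)]
    return sum(alpha)
```

MLE would fit `pi`, `A`, and `B` to maximize this quantity over a corpus (typically with EM), whereas MRE would instead score how well the sentence can be reconstructed through the induced tags.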


Hidden Markov Models for Unsupervised POS Induction

SLIDE 10

Given an observed English sentence, the task is to induce the latent sequence of part-of-speech tags.


Hidden Markov Models for Unsupervised POS Induction

SLIDE 11


Hidden Markov Models for Unsupervised POS Induction

Maximum Likelihood Estimation (MLE)

SLIDE 12


Hidden Markov Models for Unsupervised POS Induction

Maximum Likelihood Estimation (MLE)

SLIDE 13


Hidden Markov Models for Unsupervised POS Induction

Maximum Likelihood Estimation (MLE)

SLIDE 14


Hidden Markov Models for Unsupervised POS Induction

Maximum Likelihood Estimation (MLE)

SLIDE 15


Hidden Markov Models for Unsupervised POS Induction

Maximum Reconstruction Estimation (MRE)

Maximum Likelihood Estimation (MLE)

SLIDE 16


Hidden Markov Models for Unsupervised POS Induction

Maximum Reconstruction Estimation (MRE)

Maximum Likelihood Estimation (MLE)

SLIDE 17

Experiments


Comparison with MLE

SLIDE 18

Experiments


Comparison with MLE

SLIDE 19

Experiments


Comparison with CRF autoencoder

SLIDE 20

Experiments


Example emission probabilities for the POS “VBD” (verb past tense)

SLIDE 21

IBM Translation Models for Unsupervised Word Alignment

SLIDE 22

IBM Translation Models for Unsupervised Word Alignment


Maximum Likelihood Estimation (MLE)

Maximum Reconstruction Estimation (MRE)
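For concreteness, here is a minimal sketch of MLE training for IBM Model 1 via EM on a toy bitext; the sentence pairs and uniform initialization are illustrative assumptions, not data from the slides.

```python
from collections import defaultdict

# MLE training of IBM Model 1 translation probabilities t(f | e) via EM
# on a toy German-English bitext (illustrative data).
bitext = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"])]

src_vocab = {f for fs, _ in bitext for f in fs}
tgt_vocab = {e for _, es in bitext for e in es}
t = {(f, e): 1.0 / len(src_vocab) for f in src_vocab for e in tgt_vocab}

for _ in range(10):                       # a few EM iterations
    count = defaultdict(float)            # expected alignment counts
    total = defaultdict(float)
    for fs, es in bitext:
        for f in fs:                      # E-step: posterior over alignments
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    for f, e in t:                        # M-step: renormalize per target word
        t[(f, e)] = count[(f, e)] / total[e]
```

After a few iterations EM concentrates probability on consistently co-occurring pairs (e.g. `t[("haus", "house")]` dominates `t[("das", "house")]`); an MRE variant would instead train so that the source sentence can be reconstructed through the induced alignments.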

SLIDE 23

IBM Translation Models for Unsupervised Word Alignment


Maximum Likelihood Estimation (MLE)

Maximum Reconstruction Estimation (MRE)

SLIDE 24

IBM Translation Models for Unsupervised Word Alignment


Comparison with MLE

SLIDE 25

Conclusion


We have presented maximum reconstruction estimation for training generative latent-variable models such as hidden Markov models and IBM translation models. In the future, we plan to apply our approach to more generative latent-variable models such as probabilistic context-free grammars and explore the possibility of developing new training algorithms that minimize reconstruction errors.

SLIDE 26


Thank you!