A Latent Variable Model of Synchronous Parsing for Syntactic and Semantic Dependencies (PowerPoint presentation transcript)



Slide 1

A Latent Variable Model of Synchronous Parsing · Probability Model · Machine Learning Method · Evaluation

A Latent Variable Model of Synchronous Parsing for Syntactic and Semantic Dependencies

James Henderson (1), Paola Merlo (2), Gabriele Musillo (1,2), Ivan Titov (3)

(1) Dept. of Computer Science, Univ. of Geneva  (2) Dept. of Linguistics, Univ. of Geneva  (3) Dept. of Computer Science, Univ. of Illinois at Urbana-Champaign

CoNLL 2008

Slide 2

Outline

1. A Latent Variable Model of Synchronous Parsing
2. Probability Model
3. Machine Learning Method
4. Evaluation

Slide 3

Motivation for synchronous parsing

Syntax and semantics are separate structures, with different generalisations:

  Sub, Obj: John broke the vase.  (John = A0, the vase = A1)
  Sub: The vase broke.  (the vase = A1)

The same subject relation maps to A0 in the first sentence but to A1 in the second. Syntax and semantics are highly correlated, and therefore should be learned jointly. Synchronous parsing provides a single joint model of two separate structures.

Slide 4

Motivation for latent variables

The correlations between syntax and semantics are partly lexical, and independence assumptions are hard to specify a priori. The dataset is new, and there was little time for feature engineering. Latent variables provide a powerful mechanism for discovering correlations both within and between the structures.

Slide 5-6

Outline (repeated before the Probability Model section)

Slide 7

The Probability Model

A generative, history-based model of the joint probability of syntactic and semantic synchronous derivations, synchronised at each word.

Slide 8

Syntactic and semantic dependencies example

ROOT Hope seems doomed to failure

P(T_d, T_s)   (T_d = the syntactic structure, T_s = the semantic structure)

Slide 9

Syntactic and semantic derivations

Define two separate derivations, one for the syntactic structure and one for the semantic structure:

  P(T_d, T_s) = P(D_d^1, ..., D_d^{m_d}, D_s^1, ..., D_s^{m_s})

Actions are those of an incremental shift-reduce style parser, similar to MALT [Nivre et al., 2006]. Semantic derivations are less constrained, because their structures are less constrained. Assumes each dependency structure is individually planar ("projective").
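The shift-reduce style of derivation can be sketched by replaying a scripted action sequence; the action names, arc-eager-like semantics, and data layout below are illustrative assumptions, not the authors' parser:

```python
# Minimal sketch of a shift-reduce style dependency derivation
# (illustrative assumptions: action set and scripted action list).

def run_derivation(n_words, actions):
    """Replay a scripted derivation over word indices 0..n_words-1;
    return the dependency arcs as (head, dependent) pairs."""
    stack, queue, arcs = [], list(range(n_words)), []
    for act in actions:
        if act == "shift":            # move the next word onto the stack
            stack.append(queue.pop(0))
        elif act == "left-arc":       # top of stack depends on the next word
            arcs.append((queue[0], stack.pop()))
        elif act == "right-arc":      # next word depends on the top of the stack
            arcs.append((stack[-1], queue[0]))
            stack.append(queue.pop(0))
        elif act == "reduce":         # pop a word that already has its head
            stack.pop()
    return arcs

# "ROOT Hope seems": Hope depends on seems, seems depends on ROOT.
arcs = run_derivation(3, ["shift", "shift", "left-arc", "right-arc"])
print(arcs)  # [(2, 1), (0, 2)]
```

The semantic derivation would use the same queue but its own stack and a less constrained action set, as described on the following slides.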

Slide 10

Synchronisation granularity

Use an intermediate synchronisation granularity, between full predications and individual actions:

  C_t = D_d^{b_t^d}, ..., D_d^{e_t^d}, shift_t, D_s^{b_t^s}, ..., D_s^{e_t^s}, shift_t

  P(D_d^1, ..., D_d^{m_d}, D_s^1, ..., D_s^{m_s}) = P(C_1, ..., C_n)

  • Synchronisation at each word prediction
  • Results in one shared input queue
  • Allows two separate stacks
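The chunk factorisation above can be illustrated numerically; the chunk probabilities here are made-up placeholders for what the model would estimate:

```python
import math

def joint_log_prob(chunk_probs):
    """log P(T_d, T_s) = sum_t log P(C_t | C_1, ..., C_{t-1}),
    where chunk_probs[t] is the model's estimate of the t-th factor."""
    return sum(math.log(p) for p in chunk_probs)

# One chunk per predicted word of "Hope seems doomed to failure":
# five words, five chunk probabilities (illustrative numbers).
probs = [0.4, 0.5, 0.25, 0.8, 0.5]
joint = math.exp(joint_log_prob(probs))  # product of the five factors, 0.02
```

Working in log space avoids underflow when the derivation has many chunks, which matters for the beam-search decoding described later.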

Slide 11

Synchronous parsing example

ROOT Hope

P(C1)

Slide 12

Synchronous parsing example

ROOT Hope seems

P(C1) P(C2|C1)

Slide 13

Synchronous parsing example

ROOT Hope seems doomed

P(C1) P(C2|C1) P(C3|C1, C2)

Slide 14

Synchronous parsing example

ROOT Hope seems doomed to

P(C1) P(C2|C1) P(C3|C1, C2) P(C4|C1, C2, C3)

Slide 15

Synchronous parsing example

ROOT Hope seems doomed to failure

P(C1) P(C2|C1) P(C3|C1, C2) P(C4|C1, C2, C3) P(C5|C1, C2, C3, C4)

Slide 16-30

Derivation example

(Animation: over these slides the derivation figure is built up action by action for "ROOT Hope seems doomed to failure"; only the growing sentence prefix shown at each step survives in this transcript.)

Slide 31

Projectivisation

  • Allows crossing links between syntax and semantics
  • Use the HEAD method [Nivre et al., 2006] to projectivise syntax
  • Use syntactic dependencies to projectivise semantic dependencies

Slide 32

Projectivising semantic dependencies

(Figure: semantic arcs A, B, C over words w1...w5, before and after projectivisation; the crossing arc labelled A is re-attached to the argument's syntactic head and relabelled A/C, while B and C are unchanged.)

Slide 33

Outline (repeated before the Machine Learning Method section)

Slide 34

The Machine Learning Method

Synchronous derivations are modeled with an Incremental Sigmoid Belief Network (ISBN). ISBNs are Dynamic Bayesian Networks for modeling structures, with vectors of latent variables annotating derivation states that represent features of the derivation history. We use the neural network approximation of ISBNs [Titov and Henderson, ACL 2007] ("Simple Synchrony Networks").
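The neural approximation can be sketched roughly as follows; the dimensions, random weights, and single-predecessor update are illustrative assumptions (the real model ties weights to connection types and conditions on several earlier states):

```python
import math
import random

H, F, A = 8, 5, 4        # latent width, feature count, action count (assumed)
random.seed(0)

def mat(rows, cols):
    """Random weight matrix; stands in for learned parameters."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

W_state, W_feat, W_out = mat(H, H), mat(H, F), mat(A, H)

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def next_state(prev_state, features):
    """Latent feature vector of a new derivation state: a squashed sum of
    input from a connected earlier state and explicit history features."""
    pre = [a + b for a, b in zip(matvec(W_state, prev_state),
                                 matvec(W_feat, features))]
    return [sigmoid(x) for x in pre]

def action_distribution(state):
    """Softmax over possible next decisions."""
    z = matvec(W_out, state)
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

s1 = next_state([0.0] * H, [random.gauss(0, 1) for _ in range(F)])
p = action_distribution(s1)   # a proper distribution over A actions
```

Each derivation decision is thus conditioned on the latent vector, which in turn summarises the relevant history through the state-to-state connections described on the next slide.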

Slide 35

Statistical dependencies in the ISBN

Connections between latent states reflect locality in the syntactic or semantic structure, thereby specifying the domain of locality for conditioning decisions. Explicit conditioning features of the history are also specified.

(Figure: graphical model with latent state vectors S_{t-c}, ..., S_{t-1}, S_t and their decisions D_{t-c}, ..., D_{t-1}, D_t.)

Slide 36

Connections between latent states

Distinguish between syntactic states and semantic states of the derivation. Connections exist both within and between the two types of states:

  Recent       Current   Syn-Syn   Srl-Srl   Syn-Srl   Srl-Syn
  Next         Next      +         +         +         (+)
  Top          Top       +         +         +         (+)
  RgtDepTop    Top       +         +
  LftDepTop    Top       +         +
  HeadTop      Top       +         +
  LftDepNext   Top       +         +
  Next         Top       +

Slide 37

Explicit conditioning features

Syntactic state features:

  State         LEX   POS   DEP
  Next          +     +
  SynTop        +     +
  SynTop-1            +
  Head SynTop   +
  RgtD SynTop               +
  LftD SynTop               +
  LftD Next                 +

Semantic state features:

  State          LEX   POS   DEP   SENSE
  Next           +     +           +
  SemTop         +     +           +
  SemTop-1       +     +
  Head SemTop    +     +
  RgtD SemTop                +
  LftD SemTop                +
  LftD Next                  +
  A0-A5 SemTop               +
  A0-A5 Next                 +

Slide 38

Outline (repeated before the Evaluation section)

Slide 39

The Evaluation

Two models are reported.

Submitted model:
  • vocabulary of 1083 words
  • latent vector of 60 features
  • no semantics-to-syntax latent state connections
  • a form of Minimum Bayes Risk (MBR) decoding for syntax

Larger model:
  • vocabulary of 4392 words
  • latent vector of 80 features
  • includes semantics-to-syntax latent state connections
  • decoding optimises the joint probability

Slide 40

Results

             Syntactic   Semantic             Overall
             LAS         P      R      F1    F1
Submitted
  WSJ        87.8        79.6   66.2   72.3  80.2
  Brn        80.0        66.6   55.3   60.4  70.3
  WSJ+Brn    86.9        78.2   65.0   71.0  79.1
Large
  WSJ        88.5        80.4   69.2   74.4  81.5
  Brn        81.0        68.3   57.7   62.6  71.9
  WSJ+Brn    87.6        79.1   67.9   73.1  80.5

The larger model does better (1.5%) than the smaller submitted model; the large model would have been fifth overall.

Slide 41

MBR versus joint inference

  Model                          Syntactic LAS (Dev)
  Submitted (MBR)                86.1
  Submitted, joint optimisation  85.5
  Large (joint optimisation)     86.5

MBR decoding for syntax helps a bit (0.6%), but not as much as the large model (1.0%).

Slide 42

Additional experiments

Removing latent connections between syntax and semantics reduced semantic performance by 3.5%, indicating the importance of the latent variables for finding the correlations between these structures. When evaluated only on syntactic dependencies, the submitted model performs slightly (0.2%) better than a model trained only on syntactic dependencies, indicating that training a joint model does not harm performance of the syntax component, and may help.

Slide 43

Conclusions

  • Synchronous derivations are an effective way to build joint models of separate structures
  • The latent features of ISBNs help find correlations between structures
  • ISBNs extend well to more complex automata than push-down automata

Slide 44

Current Work

  • Derivations which projectivise on-line (81.8% overall F-measure, a 1.3% improvement)
  • Better feature engineering, particularly for semantic parse decisions

Slide 45

Acknowledgements

This work was partly funded by the European Community FP7 project CLASSiC (www.classic-project.org), a Swiss NSF grant, and two Swiss NSF fellowships. Part of this work was done while G. Musillo was visiting MIT/CSAIL.

Slide 46

Projectivising semantic dependencies

An arc is un-crossed by replacing its argument a with a's syntactic head and noting this change in the arc label. This is repeated as necessary, using a greedy heuristic search.
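The greedy un-crossing step can be sketched like this; the data layout, the label format (semantic label "A" lifted along a syntactic arc labelled "OBJ" becomes "A/OBJ"), and the one-lift-per-pass loop are assumptions made for illustration:

```python
def crosses(a1, a2):
    """True if two arcs over word positions strictly interleave."""
    (a, b), (c, d) = sorted(a1), sorted(a2)
    return a < c < b < d or c < a < d < b

def projectivise(sem_arcs, syn_head, syn_label):
    """Greedily lift crossing semantic arcs (pred, arg, label): replace the
    argument with its syntactic head and record the move in the label."""
    arcs = list(sem_arcs)
    changed = True
    while changed:
        changed = False
        for i, (pred, arg, label) in enumerate(arcs):
            others = [(p, a) for p, a, _ in arcs if (p, a) != (pred, arg)]
            if any(crosses((pred, arg), o) for o in others):
                arcs[i] = (pred, syn_head[arg], label + "/" + syn_label[arg])
                changed = True
                break
    return arcs

# Arc (1 -> 3, "A") crosses (2 -> 4, "B"); lifting argument 3 to its
# syntactic head 2 (syntactic arc labelled "OBJ") removes the crossing.
out = projectivise([(1, 3, "A"), (2, 4, "B")], {3: 2}, {3: "OBJ"})
print(out)  # [(1, 2, 'A/OBJ'), (2, 4, 'B')]
```

Encoding the lift in the label lets a post-processing step undo the transformation deterministically after parsing.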

Slide 47

Decoding

  • Beam search is used to find the most probable derivation
  • For the submitted model, the syntactic structure is chosen by summing over the beam of semantic structures
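Beam-search decoding over derivations can be sketched as follows; the beam width, the scoring function, and the toy two-action successor function are illustrative assumptions (the real decoder extends synchronous derivations chunk by chunk):

```python
import heapq
import math

def beam_search(initial, extend, score, steps, width):
    """Keep only the `width` highest-scoring partial derivations at each
    step; extend(state) returns successor states, score(state) a log-prob."""
    beam = [initial]
    for _ in range(steps):
        successors = [s for state in beam for s in extend(state)]
        beam = heapq.nlargest(width, successors, key=score)
    return max(beam, key=score)

# Toy model: at each step, action "a" has probability 0.6 and "b" has 0.4;
# a state is (action string, accumulated log-probability).
def extend(state):
    seq, lp = state
    return [(seq + "a", lp + math.log(0.6)),
            (seq + "b", lp + math.log(0.4))]

best = beam_search(("", 0.0), extend, score=lambda s: s[1], steps=3, width=2)
print(best[0])  # "aaa"
```

The MBR-style variant used for the submitted model's syntax would instead, for each candidate syntactic structure, sum the probabilities of the semantic structures in the beam that share it, and pick the syntactic structure with the largest sum.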