

slide-1
SLIDE 1

Semantics as a Foreign Language

Gabriel Stanovsky and Ido Dagan EMNLP 2018

slide-9
SLIDE 9

Semantic Dependency Parsing (SDP)

  • A collection of three semantic formalisms (Oepen et al., 2014; 2015)
    a. DM (derived from MRS) (Copestake et al., 1999; Flickinger, 2000)
    b. Prague Semantic Dependencies (PSD) (Hajic et al., 2012)
    c. Predicate Argument Structures (PAS) (Miyao et al., 2014)
  • Aim to capture semantic predicate-argument relations
  • Represented in a graph structure
    a. Nodes: single words from the sentence
    b. Labeled edges: semantic relations, according to the formalism
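The graph structure described above can be sketched as a small data container: nodes are word positions in the sentence, edges are labeled head-to-dependent triples. A minimal sketch; the class, method names, and edge labels here are illustrative, not actual DM/PSD/PAS output:

```python
class SDPGraph:
    """Toy SDP-style graph: node i corresponds to words[i]."""

    def __init__(self, words):
        self.words = list(words)        # nodes: single words from the sentence
        self.edges = []                 # labeled edges: (head, label, dependent)

    def add_edge(self, head, label, dep):
        self.edges.append((head, label, dep))

    def children(self, node):
        """Outgoing (label, dependent) pairs of a node."""
        return [(lbl, d) for h, lbl, d in self.edges if h == node]

g = SDPGraph(["the", "cat", "sat", "on", "the", "mat"])
g.add_edge(2, "ARG1", 1)   # hypothetical predicate-argument edge: sat -> cat
g.add_edge(1, "BV", 0)     # hypothetical determiner edge
print(g.children(2))       # [('ARG1', 1)]
```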


slide-15
SLIDE 15

Outline

  • SDP as Machine Translation

○ Different formalisms as foreign languages ○ Motivation: downstream tasks, inter-task analysis, extendable framework ○ Previous work explored the relation between MT and semantics (Wong and Mooney, 2007), (Vinyals et al., 2015), (Flanigan et al., 2016)

  • Model

○ Seq2Seq ○ Linearization

  • Results

○ Raw text -> SDP (near state-of-the-art) ○ Novel inter-task analysis

slide-18
SLIDE 18

Semantic Dependencies as MT

  Source        Target
  Raw sentence  Syntax ("Grammar as a foreign language")
  Raw sentence  SDP (this work)
slide-20
SLIDE 20

Semantic Dependencies as MT

  • Standard MTL: 3 tasks (Raw sentence -> PSD, DM, PAS)
  • Inter-task translation (9 tasks)
slide-23
SLIDE 23

Our Model I: Raw -> SDPx

  • Seq2Seq translation model:
    ○ Bi-LSTM encoder-decoder with attention
  • Special from and to symbols mark the source and target representations
    ○ Example: <from: RAW> the cat sat on the mat <to: DM> -> linearized DM

slide-24
SLIDE 24

Our Model II: SDPy -> SDPx

  • Seq2Seq translation model:
    ○ Bi-LSTM encoder-decoder with attention
  • Special from and to symbols
    ○ Example: <from: PSD> linearized PSD <to: DM> -> linearized DM

slide-25
SLIDE 25

Our Model

  • General form: <from: SDPy> linearized SDPy <to: SDPx> -> linearized SDPx
  • Seq2seq prediction requires a 1:1 (invertible) linearization function

slide-32
SLIDE 32

Linearization: Background

  • Previous work used bracketed tree linearization (Vinyals et al., 2015; Konstas et al., 2017; Buys and Blunsom, 2017)

    (ROOT (NP (NNP John )NNP )NP (VP messaged (NP Alice )NP )VP )ROOT

  • Depth-first representation doesn't directly apply to SDP graphs
    ○ Non-connected components
    ○ Re-entrancies
slide-35
SLIDE 35

SDP Linearization (Connectivity)

  • Problem: No single root from which to start linearization
  • Solution: Artificial SHIFT edges between non-connected adjacent words
    ○ All nodes are now reachable from the first word
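A minimal sketch of this repair step, assuming edges are (head, label, dependent) triples over word indices (the function name and edge format are mine, not the paper's):

```python
def add_shift_edges(n_words, edges):
    """Add an artificial "shift" edge between adjacent words that lie in
    different connected components, so every node becomes reachable
    (traversing edges in either direction) from the first word."""
    # Union-find over word indices to track connected components.
    parent = list(range(n_words))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for h, _, d in edges:
        parent[find(h)] = find(d)

    augmented = list(edges)
    for i in range(n_words - 1):
        if find(i) != find(i + 1):          # non-connected adjacent words
            augmented.append((i, "shift", i + 1))
            parent[find(i)] = find(i + 1)
    return augmented

# Two words with no edge between them get a shift edge:
print(add_shift_edges(2, []))  # [(0, 'shift', 1)]
```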
slide-42
SLIDE 42

SDP Linearization (Re-entrancies)

  • Re-entrancies require a 1:1 node representation (relative index / surface form)

    0/couch-potato compound +1/jocks shift +1/watching ARG1 -1/jocks
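The sequence above can be reproduced with a short depth-first sketch: each node is emitted as "relative-index/surface-form", where the index is the node's position minus its DFS parent's position, and a re-entrant node is emitted again by its relative index but not re-expanded. The edge conventions and function name are my reconstruction, and the inverse (delinearization) step needed for the full 1:1 mapping is omitted:

```python
def linearize(forms, edges):
    """Depth-first linearization of an SDP-style graph into a token string."""
    children = {}
    for h, lbl, d in edges:
        children.setdefault(h, []).append((lbl, d))
    out, seen = [], set()

    def visit(node, parent):
        if parent is None:
            out.append(f"{node}/{forms[node]}")            # root: absolute index
        else:
            out.append(f"{node - parent:+d}/{forms[node]}")  # relative index
        if node in seen:                                   # re-entrancy: stop
            return
        seen.add(node)
        for lbl, child in children.get(node, []):
            out.append(lbl)
            visit(child, node)

    visit(0, None)
    return " ".join(out)

forms = ["couch-potato", "jocks", "watching"]
edges = [(0, "compound", 1), (1, "shift", 2), (2, "ARG1", 1)]
print(linearize(forms, edges))
# -> 0/couch-potato compound +1/jocks shift +1/watching ARG1 -1/jocks
```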

slide-43
SLIDE 43

Outline

  • SDP as Machine Translation
    ○ Motivation: downstream tasks
    ○ Different formalisms as foreign languages
  • Model
    ○ Linearization
    ○ Dual encoder, single decoder Seq2Seq
  • Results
    ○ Raw text -> SDP (near state-of-the-art)
    ○ Novel inter-task analysis

slide-46
SLIDE 46

Experimental Setup

  • Train samples per task: 35,657 sentences (Oepen et al., 2015)
    ○ 9 translation tasks
  • Total training samples: 320,913 source-target pairs
  • Trained in batches alternating between the 9 different tasks
slide-50
SLIDE 50

Evaluations: RAW → SDP(x)

[Results table: Labeled F1 score]

slide-54
SLIDE 54

Evaluations: SDP(a) → SDP(b)

  • Translating between representations is easier than parsing from raw text
  • Easy to convert between PAS and DM
  • PSD is a good input, but a relatively hard output

[Results table: Labeled F1 score]

slide-59
SLIDE 59

Conclusions

  • Effective graph linearization for SDP
    ○ Near state-of-the-art results
  • Inter-task analysis
    ○ Enabled by the generic seq2seq framework
  • Future work
    ○ Apply linearizations in downstream tasks (NMT)
    ○ Add more representations (AMR, UD)

Thanks for listening!

slide-63
SLIDE 63

BACKUP SLIDES

slide-65
SLIDE 65

Semantic Formalisms

  • Many formalisms try to represent the meaning of a sentence

○ MRS, AMR, PSD, SDP, etc…

slide-66
SLIDE 66

Semantic Dependencies as MT

  • Syntactic parsing as MT (“Grammar as a foreign language”; Vinyals et al., 2015)

Jane had a cat

  • We aim to do the same for SDP

○ The different formalisms as foreign languages

slide-67
SLIDE 67

Semantic Dependencies as MT

Raw text -> {PSD, DM, PAS}

slide-68
SLIDE 68

Our Model

  • Seq2Seq translation model:
    ○ Bi-LSTM encoder-decoder with attention
  • Two shared encoders
    ○ From raw to SDP graphs
    ○ Between SDP graphs
  • One global decoder for all samples
  • Add “<from:X> <to:Y>” tags to input as preprocessing
    ○ Where X, Y in {RAW, PSD, PAS, DM}
    ○ Different from Google’s NMT, which didn’t have <from:X> tags
      ■ No “code-switching” is allowed
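The tagging preprocessing described above might look like the following sketch (the function and constant names are mine; the paper does not publish this exact code):

```python
# Each training pair is tagged with its source and target representation,
# so one shared decoder can serve all nine translation tasks.
REPRESENTATIONS = {"RAW", "PSD", "PAS", "DM"}

def tag_input(tokens, src, tgt):
    """Prepend a <from:X> tag and append a <to:Y> tag to a token sequence."""
    assert src in REPRESENTATIONS and tgt in REPRESENTATIONS
    return [f"<from:{src}>"] + list(tokens) + [f"<to:{tgt}>"]

print(tag_input(["the", "cat", "sat", "on", "the", "mat"], "RAW", "DM"))
# ['<from:RAW>', 'the', 'cat', 'sat', 'on', 'the', 'mat', '<to:DM>']
```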

slide-71
SLIDE 71

Motivation

  • Linearization is an easy way to plug predicted structures into NNs
    ○ MT target-side syntax (Aharoni and Goldberg, 2017; Wang et al., 2018)
  • Allows inter-task analysis
  • Easily extendable framework
slide-75
SLIDE 75

SDP Linearization (node ordering)

  • Neighbor orderings:
    a. Random - (play, for, jocks, now)
    b. Closest-first - (now, for, play, jocks)
    c. Sentence-order - (jocks, now, for, play)
    d. Smaller-first - (now, play, for, jocks)
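A sketch of how the smaller-first ordering could be computed, assuming an unlabeled adjacency-list graph: a node's neighbors are visited in increasing order of the size of the subgraph reachable through each of them (helper names are mine, not the paper's):

```python
def reachable_size(adj, start, blocked):
    """Number of nodes reachable from `start` without passing through `blocked`."""
    seen, stack = {blocked, start}, [start]
    while stack:
        for nxt in adj.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) - 1                     # don't count `blocked` itself

def smaller_first(adj, node):
    """Order a node's neighbors so smaller branches are linearized first."""
    return sorted(adj.get(node, []),
                  key=lambda n: reachable_size(adj, n, node))

# Toy graph: node 0 has a leaf neighbor (1) and a size-2 branch (2 -> 3).
adj = {0: [2, 1], 2: [3]}
print(smaller_first(adj, 0))  # [1, 2]: the leaf branch comes first
```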

slide-79
SLIDE 79

Evaluations: Node ordering

  • Smaller-first ordering consistently does better across all representations

[Results table: Labeled F1 score]