Modeling Discourse Cohesion for Discourse Parsing via Memory Network - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Modeling Discourse Cohesion for Discourse Parsing via Memory Network

Yanyan Jia, Yuan Ye, Yansong Feng, Yuxuan Lai, Rui Yan and Dongyan Zhao

Institute of Computer Science and Technology, Peking University

slide-2
SLIDE 2

Discourse Dependency Parsing

EDU1: President Bush insists
EDU2: it would be a great tool
EDU3: for curbing the budget deficit
EDU4: and slicing the lard out of government programs.
EDU5: He wants it now.
···
EDU32: Mr. Bush is considering simply declaring
EDU33: that the Constitution gives him the power
···

[Figure: discourse dependency tree over Root, EDU1, EDU2, EDU3, ···, EDU32, EDU33, with arcs labeled Attribution, Background, Elaboration, Attribution and Root]

EDU stands for Elementary Discourse Unit

slide-5
SLIDE 5

Motivation

  • Identifying long-span dependencies between elementary discourse units

– Discourse structure

  • Morris and Hirst (1991) extract features to characterize discourse structures

– Discourse cohesion

  • Joty et al. (2013) use lexical chain features to model discourse cohesion

Our work: use a memory network to implicitly capture discourse cohesion

slide-6
SLIDE 6

How Does Memory Network Work?

EDU1: I feel hungry after wake up,
EDU2: I rush into the kitchen and make my breakfast.
EDU3: My breakfast is hamburger.
EDU4: It is eight o’clock when I leave home.
EDU5: So late!
EDU6: I drive into the highway,
EDU7: but meet a traffic jam.
EDU8: Oh, I finally arrive at the company.
EDU9: It is nine o’clock.
EDU10: Thank God, I am not late for work.
EDU11: But the hamburger is cold,
EDU12: order some take-away food is better, maybe.

slide-7
SLIDE 7

How Does Memory Network Work?

Topic: Food (EDU1, EDU2, EDU3, EDU11, EDU12)

slide-8
SLIDE 8

How Does Memory Network Work?

Topic: Time (EDU4, EDU5, EDU9, EDU10)

slide-9
SLIDE 9

How Does Memory Network Work?

Topic: Traffic (EDU6, EDU7, EDU8)

slide-10
SLIDE 10

How Does Memory Network Work?

[Figure: the EDU1 to EDU12 example shown beside a memory network with slots Slot 1, Slot 2, Slot 3, ···, Slot n-2, Slot n-1, Slot n]

slide-11
SLIDE 11

Framework

Transition-based dependency parsing with the arc-eager algorithm (Nivre):

  • Transitions: Left-Arc (LA), Right-Arc (RA), Shift, Reduce
  • Parser state: stack, buffer, arc set
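
The four transitions and the parser state can be sketched as below. This is a minimal illustration of the standard arc-eager system, not the authors' implementation: the class name `ParserState`, its method names and the `(head, relation, dependent)` arc layout are assumptions made for this example. The replayed sequence is the one traced on the Arc-eager slides of this deck.

```python
from dataclasses import dataclass, field


@dataclass
class ParserState:
    """Arc-eager configuration: a buffer, a stack and the arcs built so far."""
    buffer: list
    stack: list = field(default_factory=list)
    arcs: list = field(default_factory=list)  # (head, relation, dependent)

    def has_head(self, node):
        return any(dep == node for _, _, dep in self.arcs)

    def shift(self):
        # Move the front of the buffer onto the stack.
        self.stack.append(self.buffer.pop(0))

    def left_arc(self, relation):
        # The buffer front becomes the head of the stack top, which is popped.
        dep = self.stack[-1]
        if self.has_head(dep):
            raise ValueError(f"{dep} already has a head")
        self.stack.pop()
        self.arcs.append((self.buffer[0], relation, dep))

    def right_arc(self, relation):
        # The stack top becomes the head of the buffer front, which is pushed.
        dep = self.buffer.pop(0)
        self.arcs.append((self.stack[-1], relation, dep))
        self.stack.append(dep)

    def reduce(self):
        # Pop the stack top; only legal once it already has a head.
        if not self.has_head(self.stack[-1]):
            raise ValueError(f"{self.stack[-1]} has no head yet")
        self.stack.pop()


# Replay: Shift, LA(Attribution), SH, RA(Elaboration), RA(Joint)
state = ParserState(buffer=["E1", "E2", "E3", "E4"])
state.shift()
state.left_arc("Attribution")
state.shift()
state.right_arc("Elaboration")
state.right_arc("Joint")
print(state.stack)
print(state.arcs)
```

Keeping arcs as plain tuples makes the preconditions easy to check: Left-Arc refuses a stack top that already has a head, and Reduce refuses one that does not.
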

slide-23
SLIDE 23

Arc-eager

EDU1: President Bush insists
EDU2: it would be a great tool
EDU3: for curbing the budget deficit
EDU4: and slicing the lard out of government programs.
EDU5: He wants it now.
···
EDU32: Mr. Bush is considering simply declaring
EDU33: that the Constitution gives him the power
···

Step  Transition        Stack          Buffer
0     (initial)         []             [E1, E2, E3, E4, ···]
1     Shift (SH)        [E1]           [E2, E3, E4, ···]
2     LA(Attribution)   []             [E2, E3, E4, ···]
3     SH                [E2]           [E3, E4, ···]
4     RA(Elaboration)   [E2, E3]       [E4, ···]
5     RA(Joint)         [E2, E3, E4]   [···]
···

[Figure: arcs drawn over E1 E2 E3 E4 ···, labeled Attribution, Elaboration and Joint]

slide-27
SLIDE 27

Model Overview

time t transition state

[Figure: model architecture. Memory network1 matches the stack representation S against its slots (slot_i, weights w_i) and takes a weighted sum to give SRefined. Memory network2 does the same for the buffer representation B (slot_j) to give BRefined. The state representation (SRefined, Position2, A, BRefined) passes through FC1 (ReLU) and FC2 (ReLU) to produce Pt, the transition (action-relation) distribution over SH, RA(Li), ...]
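
The path from state representation to transition distribution can be sketched as follows. Only the concatenation of SRefined, Position2, A and BRefined and the two ReLU layers come from the slide; the final softmax projection, the layer sizes, the random weights and the transition inventory `TRANSITIONS` are assumptions made for this illustration.

```python
import math
import random


def linear(x, weights, bias):
    # One fully connected layer: weights has one row per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    # Hypothetical initialisation, only so that the sketch runs.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def transition_distribution(s_refined, position2, a, b_refined, layers):
    state = s_refined + position2 + a + b_refined  # concatenated state representation
    (w1, b1), (w2, b2), (w_out, b_out) = layers
    h1 = relu(linear(state, w1, b1))   # FC1 (ReLU)
    h2 = relu(linear(h1, w2, b2))      # FC2 (ReLU)
    return softmax(linear(h2, w_out, b_out))  # assumed output projection


TRANSITIONS = ["SH", "RE", "LA(Attribution)", "RA(Attribution)"]  # hypothetical subset
rng = random.Random(0)
s_refined, position2, a, b_refined = [0.2, -0.1], [1.0, 1.0, 0.5], [0.3, 0.0], [0.4, 0.1]
layers = [init_layer(rng, 9, 6), init_layer(rng, 6, 6), init_layer(rng, 6, len(TRANSITIONS))]
p_t = transition_distribution(s_refined, position2, a, b_refined, layers)
best = TRANSITIONS[p_t.index(max(p_t))]
print(best, p_t)
```

At parsing time the highest-scoring legal transition would be applied to the parser state and the representation recomputed for the next step.
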

slide-30
SLIDE 30

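BRefined

[Figure: computing BRefined for the buffer EDUs (EDU1, EDU2), representation B.
EDU basic representation: the Word and POS sequences each pass through a Bi-LSTM with attention to give VWord and VPOS; Position1 (position in the sentence, paragraph and discourse) gives VPosition1; together these form VB.
Memory network2: VB is matched against the slots (slot_j) to obtain weights w_i; their weighted sum is BCoh. VB and BCoh feed BRefined.]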
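
The "match" and "weighted sum" steps can be sketched as a soft read over the memory slots. The slide does not say how the match score is computed or how VB and BCoh are combined, so the dot-product match, the softmax normalisation and the concatenation below are assumptions, as is the function name `refine`. How the slots are written or updated is not covered here.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine(v, slots):
    """Return (weights, coh, refined) for an EDU vector v and a list of memory slots."""
    # match: dot product between v and each slot (assumed scoring function)
    scores = [sum(a * b for a, b in zip(v, slot)) for slot in slots]
    weights = softmax(scores)
    # weighted sum of the slots gives the cohesion vector (BCoh / SCoh)
    coh = [sum(w * slot[d] for w, slot in zip(weights, slots)) for d in range(len(v))]
    # assumed combination: concatenate the basic vector with the cohesion vector
    return weights, coh, v + coh


v_b = [1.0, 0.0]
slots = [[1.0, 0.0], [0.0, 1.0]]
weights, b_coh, b_refined = refine(v_b, slots)
print(weights, b_coh, b_refined)
```

The slot most similar to the EDU vector receives the largest weight, so the cohesion vector is pulled toward the slot that best matches the current EDU.
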

slide-32
SLIDE 32

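SRefined

[Figure: computing SRefined for the stack EDU (EDU1), representation S.
EDU basic representation: the Word and POS sequences each pass through a Bi-LSTM with attention to give VWord and VPOS; Position1 (position in the sentence, paragraph and discourse) gives VPosition1; together these form Vs.
Memory network1: Vs is matched against the slots (slot_j) to obtain weights w_i; their weighted sum is SCoh. Vs and SCoh feed SRefined.]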

slide-34
SLIDE 34

A and Position2

A: top three transition information (e.g. SH, RA(Li), SH)

  • Concatenate every transition’s embedding

Position2: the spatial relationship between the top EDUs of S and B

  • Same sentence
  • Same paragraph
  • Distance in paragraph
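
The three Position2 features can be sketched as a small function over EDU positions. The record `EduPosition`, its fields, and the choice to report the distance as an absolute index difference (and as `None` when the EDUs are in different paragraphs) are assumptions for this example; the slides do not define the exact encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EduPosition:
    sentence: int   # document-wide index of the sentence containing the EDU
    paragraph: int  # index of the paragraph containing the EDU
    offset: int     # index of the EDU inside its paragraph


def position2(stack_top, buffer_front):
    """Return (same_sentence, same_paragraph, distance_in_paragraph)."""
    same_sentence = stack_top.sentence == buffer_front.sentence
    same_paragraph = stack_top.paragraph == buffer_front.paragraph
    # Distance is only meaningful when both EDUs share a paragraph (assumed).
    distance = abs(buffer_front.offset - stack_top.offset) if same_paragraph else None
    return same_sentence, same_paragraph, distance


edu1 = EduPosition(sentence=0, paragraph=0, offset=0)
edu2 = EduPosition(sentence=0, paragraph=0, offset=1)
edu32 = EduPosition(sentence=9, paragraph=3, offset=2)  # hypothetical indices
near = position2(edu1, edu2)
far = position2(edu2, edu32)
print(near, far)
```
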
slide-35
SLIDE 35

Overall Process

[Figure: the EDU sequence and discourse dependency tree from the Discourse Dependency Parsing slide]

Transition sequence:

Shift, LA-attribution, SH, RA-elaboration, RA-joint, ···
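
A transition sequence like the one above can be derived from a gold dependency tree with a static oracle. The sketch below uses the standard arc-eager oracle rule and assumes a projective tree; the slides do not say which oracle the authors used. The gold arcs for EDU1 to EDU4 are read off the Arc-eager trace in this deck, and the function names are hypothetical.

```python
def linked_below_top(stack, front, gold):
    # True if an item below the stack top is the gold head or a gold dependent of `front`.
    for k in stack[:-1]:
        if gold.get(front, (None, None))[0] == k or gold.get(k, (None, None))[0] == front:
            return True
    return False


def oracle_transitions(n, gold):
    """Derive arc-eager transitions for EDUs 1..n.

    gold maps dependent -> (head, relation); EDUs missing from gold have no head here.
    """
    stack, buffer, attached, seq = [], list(range(1, n + 1)), set(), []
    while buffer:
        front = buffer[0]
        top = stack[-1] if stack else None
        if top is not None and gold.get(top, (None, None))[0] == front:
            seq.append("LA-" + gold[top][1])   # buffer front heads the stack top
            stack.pop()
        elif top is not None and gold.get(front, (None, None))[0] == top:
            seq.append("RA-" + gold[front][1])  # stack top heads the buffer front
            attached.add(front)
            stack.append(buffer.pop(0))
        elif top is not None and top in attached and linked_below_top(stack, front, gold):
            seq.append("Reduce")
            stack.pop()
        else:
            seq.append("Shift")
            stack.append(buffer.pop(0))
    return seq


gold = {1: (2, "attribution"), 3: (2, "elaboration"), 4: (3, "joint")}
sequence = oracle_transitions(4, gold)
print(sequence)

# A second toy tree (hypothetical) that needs a Reduce: EDU1 heads both EDU2 and EDU3.
with_reduce = oracle_transitions(3, {2: (1, "a"), 3: (1, "b")})
print(with_reduce)
```

Sequences produced this way are what a transition classifier is typically trained to predict, one transition per parser state.
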

[Figure: the model architecture from the Model Overview slide]

slide-37
SLIDE 37

Experiment

Dataset: RST Discourse Treebank

  • 380 discourses (312 training, 30 validation, 38 testing)
  • 111 relation types for fine-grained
  • 19 relation types for coarse-grained

Evaluation metrics:

  • UAS (unlabeled attachment score), LAS (labeled attachment score)
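
Both metrics can be computed from per-EDU head and relation predictions. UAS counts an EDU as correct when its predicted head matches the gold head; LAS additionally requires the relation label to match. The function name and the toy gold and predicted trees below are hypothetical and are not taken from the experiments.

```python
def attachment_scores(gold, predicted):
    """gold and predicted map each EDU to (head, relation); head 0 stands for Root."""
    total = len(gold)
    uas_hits = sum(1 for edu, (head, _) in gold.items() if predicted[edu][0] == head)
    las_hits = sum(1 for edu, arc in gold.items() if predicted[edu] == arc)
    return uas_hits / total, las_hits / total


gold = {
    1: (2, "Attribution"),
    2: (0, "Root"),
    3: (2, "Elaboration"),
    4: (3, "Joint"),
}
predicted = {
    1: (2, "Attribution"),
    2: (0, "Root"),
    3: (2, "Background"),  # right head, wrong relation: counts for UAS only
    4: (2, "Joint"),       # wrong head: counts for neither
}
uas, las = attachment_scores(gold, predicted)
print(uas, las)
```

Fine-grained and coarse-grained LAS differ only in which relation inventory the labels are drawn from (111 or 19 types).
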
slide-38
SLIDE 38

Experiment(Cont.)

Method                     UAS     LAS(Fine)  LAS(Coarse)
Perceptron                 0.5422  0.3231     0.3777
Basic(word+POS)            0.5588  0.367      0.3985
Basic(word+POS+position)   0.5933  0.3832     0.4305
Main-full                  0.6197  0.3947     0.4445
MST-full                   0.7331  0.4309     0.4851

Position features provide useful structural clues to our parser

slide-39
SLIDE 39

Experiment(Cont.)

Main-full vs. Basic(word+POS+position): UAS 0.6197 vs. 0.5933, LAS(Fine) 0.3947 vs. 0.3832, LAS(Coarse) 0.4445 vs. 0.4305

The memory network can model discourse cohesion information, such as lexical chains and topical information, and so provide clues to our parser.

slide-40
SLIDE 40

Experiment(Cont.)

MST-full vs. Main-full: UAS 0.7331 vs. 0.6197, LAS(Fine) 0.4309 vs. 0.3947, LAS(Coarse) 0.4851 vs. 0.4445

MST-full (graph-based) can directly analyze the relationship between any pair of EDUs

slide-43
SLIDE 43

Conclusions & Future work

Conclusions:

We propose to utilize memory networks to model discourse cohesion automatically.

  • Capture the topic change or lexical chains within a discourse
  • Improve the discourse parsing performance

Future work:

  • Apply our method to the graph-based parsing system
  • Optimize the memory network structure

slide-44
SLIDE 44

Thanks