SLIDE 1

Computational Models of Discourse: Discourse Parsing

Caroline Sporleder

Universität des Saarlandes

Sommersemester 2009, 24.06.2009

Caroline Sporleder csporled@coli.uni-sb.de Computational Models of Discourse

SLIDE 2

Roadmap

Four interdependent aspects/dimensions of discourse structure:

- Linguistic Structure: linguistic manifestation of discourse structure, e.g., lexical cohesion, discourse connectives/cue words, intonation, gesture, referring expressions etc.
- Intentional Structure: each discourse segment fulfils a purpose (why does a speaker/writer make a given utterance in a given form?)
- Informational Structure: how do the different segments of a discourse relate to each other (which discourse relations hold)?
- Focus/Attentional Structure: which entities are salient at a given point in the discourse?

SLIDE 3

Roadmap

We’ve addressed so far . . . Linguistic Structure:

- lexical chains
- word distributions for text segmentation
- from a generation perspective: generating referring expressions
- co-reference resolution

Focus/Attentional Structure:

Centering Theory

SLIDE 4

Now . . .

Informational Structure (and a bit of Intentional Structure):

- temporal ordering (last week)
- Rhetorical Structure Theory
- discourse parsing

SLIDE 5

Modelling Discourse Structure

SLIDE 6

Modelling Discourse Structure

Various Discourse Theories:

- Discourse Structure Theory (DST) (Grosz & Sidner, 1986)
- Rhetorical Structure Theory (RST) (Mann & Thompson, 1987)
- Discourse Representation Theory (DRT) (Kamp & Reyle, 1993)
- Segmented Discourse Representation Theory (SDRT) (Asher & Lascarides, 2003)

What these discourse theories share:

- model how different segments of a discourse relate to each other (informational structure)
- assume hierarchical discourse structure
- more or less pre-defined inventory of discourse relations

SLIDE 8

Rhetorical Structure Theory (RST) (Mann & Thompson, 1987)

Origins:

- originally developed for text generation
- aim: framework for structural description of the meaning of a given text
- RST analysis: what was the intention of the writer (according to the interpretation of the analyst)?
- exact intention of the writer is not always clear ⇒ possibility of several analyses for a given text

RST website: http://www.sfu.ca/rst/

SLIDE 9

Rhetorical Structure Theory (RST) (Mann & Thompson, 1987)

Elements of RST:

- elementary discourse units (EDUs), usually clauses
- EDUs and higher-level discourse segments linked by a pre-defined set of 24-30 rhetorical relations
- discourse segments function as nucleus (N, more important) and satellite (S, less important)
- most relations are binary and mono-nuclear: N+S or S+N
- some multi-nuclear (e.g. contrast) and non-binary relations (e.g. joint)
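The nucleus/satellite machinery can be made concrete with a small data structure. A sketch (my own illustration, not from the slides) of one common way to represent RST trees in code:

```python
# EDUs as leaves, relations as internal nodes with a nucleus/satellite
# role per child.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class EDU:
    text: str

@dataclass
class RelationNode:
    relation: str                               # e.g. "explanation", "contrast"
    children: List[Union[EDU, "RelationNode"]]  # EDUs or nested subtrees
    nuclearity: List[str]                       # "N" or "S", aligned with children

# Mono-nuclear example from the slides: N + S linked by Explanation
tree = RelationNode(
    relation="explanation",
    children=[EDU("Nora sleeps a lot"), EDU("because she is ill.")],
    nuclearity=["N", "S"],
)

# Multi-nuclear example: Contrast, with two nuclei of equal weight
contrast = RelationNode(
    relation="contrast",
    children=[EDU("Peter likes chocolate,"), EDU("Mary likes crisps.")],
    nuclearity=["N", "N"],
)
assert tree.nuclearity.count("N") == 1
```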

SLIDE 10

Example: Nucleus vs. Satellite

Nora sleeps a lot because she is ill.

SLIDE 11

Example: Nucleus vs. Satellite

Nora sleeps a lot because she is ill. [ Nora sleeps a lot ]N [ because she is ill. ]S

SLIDE 12

Example: Nucleus vs. Satellite

Nora sleeps a lot because she is ill. [ Nora sleeps a lot ]N [ because she is ill. ]S Tom is going to the theatre, not to the cinema.

SLIDE 13

Example: Nucleus vs. Satellite

Nora sleeps a lot because she is ill. [ Nora sleeps a lot ]N [ because she is ill. ]S Tom is going to the theatre, not to the cinema. [ Tom is going to the theatre, ]N [ not to the cinema. ]S

SLIDE 14

Example: Nucleus vs. Satellite

Nora sleeps a lot because she is ill. [ Nora sleeps a lot ]N [ because she is ill. ]S Tom is going to the theatre, not to the cinema. [ Tom is going to the theatre, ]N [ not to the cinema. ]S Today the weather was nice, it didn’t rain.

SLIDE 15

Example: Nucleus vs. Satellite

Nora sleeps a lot because she is ill. [ Nora sleeps a lot ]N [ because she is ill. ]S Tom is going to the theatre, not to the cinema. [ Tom is going to the theatre, ]N [ not to the cinema. ]S Today the weather was nice, it didn’t rain. [ Today the weather was nice, ]N [ it didn’t rain. ]S

SLIDE 16

Example: Rhetorical Relations

[ Nora sleeps a lot ]N [ because she is ill. ]S

SLIDE 17

Example: Rhetorical Relations

[ Nora sleeps a lot ]N [ because she is ill. ]S ⇒ Explanation

SLIDE 18

Example: Rhetorical Relations

[ Nora sleeps a lot ]N [ because she is ill. ]S ⇒ Explanation [ Tom is going to the theatre, ]N [ not to the cinema. ]S

SLIDE 19

Example: Rhetorical Relations

[ Nora sleeps a lot ]N [ because she is ill. ]S ⇒ Explanation [ Tom is going to the theatre, ]N [ not to the cinema. ]S ⇒ Antithesis

SLIDE 20

Example: Rhetorical Relations

[ Nora sleeps a lot ]N [ because she is ill. ]S ⇒ Explanation [ Tom is going to the theatre, ]N [ not to the cinema. ]S ⇒ Antithesis [ Today the weather was nice, ]N [ it didn’t rain. ]S

SLIDE 21

Example: Rhetorical Relations

[ Nora sleeps a lot ]N [ because she is ill. ]S ⇒ Explanation [ Tom is going to the theatre, ]N [ not to the cinema. ]S ⇒ Antithesis [ Today the weather was nice, ]N [ it didn’t rain. ]S ⇒ Elaboration

SLIDE 22

Definitions of Discourse Relations

Example: Evidence

[ This tax calculation software really works. ]N [ I entered all the figures from my tax return and got a result which agreed with my hand calculations to the penny. ]S

- relation name: evidence
- constraints on N: Reader (R) might not believe N to a degree satisfactory to Writer (W)
- constraints on S: R believes S or finds it credible
- constraints on N+S: R’s comprehending S increases R’s belief of N
- effect: R’s belief of N is increased
- locus of effect: N

SLIDE 23

Definitions of Discourse Relations

Antithesis (mono-nuclear)

[ Peter went to the theatre, ]N [ not the cinema ]S.

- constraints on N: W has positive attitude to N
- constraints on N+S: situations are contrasted
- effect: R’s positive attitude to N is enhanced

Contrast (multi-nuclear)

[ Peter likes chocolate, ]N [ Mary likes crisps. ]N

- constraints: situations described by nuclei are contrasted, both nuclei have equal weight
- effect: R understands the similarities and differences between both situations

SLIDE 24

Simple Example

[RST tree diagram for: "Peter failed the exam because he didn’t study hard enough. He had to spend the holidays preparing for the re-sit while his friends enjoyed themselves at the beach." Relations shown: Explanation, Result, Contrast]

SLIDE 25

A More Complex Example

Raw Text: Famington Police had to help control traffic recently when hundreds of people lined up to be among the first applying for jobs at the yet-to-open Mariott Hotel. The hotel’s “help-wanted” announcement for 300 openings was a rare opportunity for many unemployed. The people waiting in line carried a message, a refutation, of the claims that the jobless could be employed if only they showed enough ambition. Every rule has exceptions but the tragic and too common tableaux of hundreds or even thousands of people snake-lining up for any task with a paycheck illustrates a lack of jobs, not laziness.

SLIDE 26

A More Complex Example

Segmentation into EDUs

1. Famington Police had to help control traffic recently
2. when hundreds of people lined up to be among the first applying for jobs at the yet-to-open Mariott Hotel.
3. The hotel’s “help-wanted” announcement for 300 openings was a rare opportunity for many unemployed.
4. The people waiting in line carried a message, a refutation, of the claims that the jobless could be employed if only they showed enough ambition.
5. Every rule has exceptions
6. but the tragic and too common tableaux of hundreds or even thousands of people snake-lining up for any task with a paycheck illustrates a lack of jobs,
7. not laziness.
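EDU segmentation can be roughly approximated in code. A toy sketch (my own illustration, not the lecture’s method), assuming boundaries fall at sentence ends and before a few clause-introducing cue words; real segmenters use syntactic parses:

```python
import re

# Hypothetical cue-word list; clause boundaries are far richer in practice.
CUES = r"\b(because|when|while|but|although)\b"

def segment_edus(text):
    edus = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        parts = re.split(CUES, sentence, flags=re.IGNORECASE)
        buf = parts[0].strip()
        # re.split keeps each captured cue; glue it onto the clause it introduces
        for i in range(1, len(parts), 2):
            if buf:
                edus.append(buf)
            buf = (parts[i] + " " + parts[i + 1].strip()).strip()
        if buf:
            edus.append(buf)
    return edus

assert segment_edus("Nora sleeps a lot because she is ill.") == \
    ["Nora sleeps a lot", "because she is ill."]
```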

SLIDE 27

A More Complex Example

[RST tree diagram over the seven EDUs of the Famington text. Relations shown: Antithesis, Concession, Evidence, Circumstance, Volitional Result, Background]

SLIDE 28

Further Relations. . .

Here is a selection . . . Antithesis, Background, Circumstance, Concession, Condition, Elaboration, Enablement, Evaluation, Evidence, Interpretation, Justify, Motivation, Non-volitional Cause, Non-volitional Result, Otherwise, Purpose, Preparation, Restatement, Solutionhood, Summary, Volitional Cause, Volitional Result, Contrast, Joint, List, Sequence, Unless, Conjunction, Disjunction, Multinuclear Restatement, Means, Unconditional, Topic Shift, Topic Drift

SLIDE 29

Further Relations. . .

Here is a selection . . . Antithesis, Background, Circumstance, Concession, Condition, Elaboration, Enablement, Evaluation, Evidence, Interpretation, Justify, Motivation, Non-volitional Cause, Non-volitional Result, Otherwise, Purpose, Preparation, Restatement, Solutionhood, Summary, Volitional Cause, Volitional Result, Contrast, Joint, List, Sequence, Unless, Conjunction, Disjunction, Multinuclear Restatement, Means, Unconditional, Topic Shift, Topic Drift Imagine having to annotate a lengthy text with these relations to form one complete tree!

SLIDE 30

Further Relations. . .

Here is a selection . . . Antithesis, Background, Circumstance, Concession, Condition, Elaboration, Enablement, Evaluation, Evidence, Interpretation, Justify, Motivation, Non-volitional Cause, Non-volitional Result, Otherwise, Purpose, Preparation, Restatement, Solutionhood, Summary, Volitional Cause, Volitional Result, Contrast, Joint, List, Sequence, Unless, Conjunction, Disjunction, Multinuclear Restatement, Means, Unconditional, Topic Shift, Topic Drift Imagine having to annotate a lengthy text with these relations to form one complete tree! RST wasn’t designed for automatic annotation of discourse structure (and even humans find it challenging).

SLIDE 31

Discourse Parsing

SLIDE 32

Discourse Parsing

The task of automatically constructing the discourse tree for an input text is called discourse parsing.

SLIDE 33

Discourse Parsing

The task of automatically constructing the discourse tree for an input text is called discourse parsing. Discourse Parsing is useful for . . .

- Question-Answering (What was the result of X? Why did Y happen?)
- Information Extraction
- Text-to-Text Applications (Summarisation, Paraphrasing)
- Recognising Textual Entailment
- Modelling/Evaluating Text Coherence
- . . .

SLIDE 34

Discourse Parsing

Three interdependent sub-tasks:

1. identify elementary discourse units (EDUs)
2. determine which discourse segments are related (tree structure)
3. determine how they are related (discourse/rhetorical relations)
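The three sub-tasks can be viewed as one pipeline. A sketch with hypothetical stage functions; in a real system each stage could be rule-based or a trained model:

```python
# Sketch of the three sub-tasks as a pipeline; the stage functions are
# placeholders, not an actual parser.
def discourse_parse(text, segment, attach, label):
    edus = segment(text)   # 1. identify elementary discourse units
    tree = attach(edus)    # 2. decide which segments are related (tree structure)
    return label(tree)     # 3. decide how they are related (rhetorical relations)

# Trivial stand-ins that just show the data flow:
result = discourse_parse(
    "Nora sleeps a lot because she is ill.",
    segment=lambda t: t.split(" because "),
    attach=lambda edus: {"children": edus},
    label=lambda tree: {**tree, "relation": "explanation"},
)
assert result["relation"] == "explanation" and len(result["children"]) == 2
```

The interdependence noted on the slide is real: a segmentation error in stage 1 propagates into stages 2 and 3.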

SLIDE 35

How would you go about designing a system for discourse parsing?

SLIDE 36

Cue Phrases

[RST tree diagram for the "Peter failed the exam" example; relations: explanation, contrast, result]

SLIDE 37

Cue Phrases

[Same tree, with the cue phrase "because" highlighted]

SLIDE 38

Cue Phrases

[Same tree, with the cue phrase "while" highlighted]

SLIDE 39

Cue Phrases

[Same tree, with one relation marked "?": no cue phrase signals it]

SLIDE 40

Cue Phrases

discourse relations are sometimes signalled by discourse cue phrases

Example: China imported 10.8 million tonnes of steel, despite the fact that it had yet to use up last year’s imports. ⇒ Contrast
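A cue-phrase lookup is the natural first component here. A minimal sketch with a hypothetical hand-made table; the "since" entry (from the next slide’s examples) shows why a lookup alone cannot decide:

```python
# Hypothetical cue table, not an inventory from the slides. "since" is
# ambiguous between two relations; "yet" may also be a plain adverb with
# no discourse function at all.
CUE_TABLE = {
    "because": ["explanation"],
    "despite": ["contrast"],
    "since":   ["temporal", "explanation"],  # ambiguous between relations
    "yet":     ["contrast"],                 # may also be non-discourse usage
}

def candidate_relations(cue):
    return CUE_TABLE.get(cue.lower(), [])

assert candidate_relations("since") == ["temporal", "explanation"]
assert candidate_relations("therefore") == []  # unknown cue: no hypothesis
```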

SLIDE 41

Cue Phrases

discourse relations sometimes signalled by discourse cue phrases but these can be ambiguous

between two discourse relations

Example

1 She has worked in retail since she moved to Britain.

⇒ Temporal

2 I don’t believe he’s here since his car isn’t parked outside.

⇒ Explanation

SLIDE 42

Cue Phrases

discourse relations sometimes signalled by discourse cue phrases but these can be ambiguous

- between two discourse relations
- between discourse and non-discourse usage

Example

1. Science has some definite conclusions about this. Yet, there are still many things we don’t know. ⇒ Contrast

2. While there have been plans to extend the airport, nothing has been decided yet. ⇒ no discourse function (“yet” is used as an adverb)

SLIDE 43

Cue Phrases

discourse relations sometimes signalled by discourse cue phrases but these can be ambiguous

- between two discourse relations
- between discourse and non-discourse usage

and relations are not always explicitly signalled

Example: The train hit a car on an unmanned level crossing. It derailed. ⇒ Result

SLIDE 44

Cue Phrases

discourse relations sometimes signalled by discourse cue phrases but these can be ambiguous

- between two discourse relations
- between discourse and non-discourse usage

⇒ need to be able to identify the relation even if no cue phrase is present

SLIDE 45

Discourse Parsing

History

- originally mainly rule-based systems (manually specified rules)
- since late 1990s: creation of corpora annotated with discourse structure (RST Discourse Treebank; Penn Discourse Treebank; Potsdam Commentary Corpus) ⇒ machine learning of discourse structure

SLIDE 46

Rule-based Approaches (1)

Knowledge-Rich Models: Hobbs et al. (1993), Kamp & Reyle (1993), Asher & Lascarides (2003)

- logic-based
- explicit representation of world knowledge in a knowledge base
- discourse meaning as an extension of sentence meaning (i.e., the aim is to find the best logical form)

SLIDE 47

Rule-based Approaches (2)

Knowledge-Poor Models: Marcu (1997), Polanyi et al. (2004), Corston-Oliver (1998), Le Thanh et al. (2004)

- input: syntactically analysed texts
- heuristics to compute discourse structure
- no extensive semantic knowledge (no knowledge base)
- surface form (syntactic structure, deixis, anaphora, cue words etc.) provides cues for discourse structure

SLIDE 48

Corpus-based Systems

Marcu (1999), Baldridge & Lascarides (2005) etc.:

- supervised machine learning
- training data: e.g. RST Discourse Treebank
- discourse parsing analogous to syntactic parsing

SLIDE 49

Rule-based Discourse Parsing: Marcu (1997)

Identification of EDUs:

- heuristics based on cue phrases
- clause boundary detection

Determining discourse relations:

1. identify cue phrases/markers
2. compute the set of permissible relations for each marker (using heuristics based on max. distance from satellite etc.) ⇒ set of disjunctive hypotheses
3. use word co-occurrence information for unmarked examples: predict elaboration if there is a lot of overlap, predict topic-shift otherwise ⇒ set of disjunctive hypotheses
4. find valid structures by applying well-formedness constraints (e.g. right frontier constraint)
5. weight the remaining tree structures by preferring right-branching trees
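Step 3 (word co-occurrence for unmarked segments) can be sketched as a simple lexical-overlap test. The 0.2 threshold below is a made-up illustration, not Marcu’s actual value:

```python
# High lexical overlap suggests the segments stay on topic (elaboration);
# low overlap suggests a topic shift. Threshold is a hypothetical choice.
def overlap_relation(seg1, seg2, threshold=0.2):
    w1, w2 = set(seg1.lower().split()), set(seg2.lower().split())
    overlap = len(w1 & w2) / min(len(w1), len(w2))
    return "elaboration" if overlap >= threshold else "topic-shift"

assert overlap_relation("the train derailed on the crossing",
                        "the train was badly damaged") == "elaboration"
assert overlap_relation("the train derailed",
                        "crops were housed in ricks") == "topic-shift"
```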

SLIDE 50

Dealing with unmarked relations

Rule-based and corpus-trained systems work reasonably well for marked relations. On unmarked relations we are more or less guessing. ⇒ if we had large quantities of annotated unmarked examples, we could just train a system specifically for these difficult cases!

SLIDE 51

Dealing with unmarked relations

How do we get annotated unmarked examples?

- just go ahead and label texts ⇒ time-consuming, and we would end up annotating many easy (= marked) examples as well
- active learning (Nomoto and Matsumoto, 1999)
- automatic labelling of training data (Marcu and Echihabi, 2002)

SLIDE 53

Automatic Labelling of Examples

Example There was an accident on the Great Western line yesterday evening. A car had broken down on an unmanned level crossing and was hit by a high speed train. The train consequently derailed. Although the damage to the train was substantial, fortunately nobody was injured.

SLIDE 55

Automatic Labelling of Examples

Example There was an accident on the Great Western line yesterday evening. A car had broken down on an unmanned level crossing and was hit by a high speed train. [ The train consequently derailed. ] Although the damage to the train was substantial, fortunately nobody was injured.

SLIDE 56

Automatic Labelling of Examples

Example There was an accident on the Great Western line yesterday evening. [ A car had broken down on an unmanned level crossing and was hit by a high speed train. ] [ The train consequently derailed. ] Although the damage to the train was substantial, fortunately nobody was injured.

SLIDE 57

Automatic Labelling of Examples

Example There was an accident on the Great Western line yesterday evening. [ A car had broken down on an unmanned level crossing and was hit by a high speed train. ] [ The train consequently derailed. ] Although the damage to the train was substantial, fortunately nobody was injured. [ A car had broken down on an unmanned level crossing and was hit by a high speed train. ] [ The train consequently derailed. ]

SLIDE 58

Automatic Labelling of Examples

Example There was an accident on the Great Western line yesterday evening. [ A car had broken down on an unmanned level crossing and was hit by a high speed train. ] [ The train consequently derailed. ] Although the damage to the train was substantial, fortunately nobody was injured. [ A car had broken down on an unmanned level crossing and was hit by a high speed train. ] [ The train derailed. ] ⇒ Result

SLIDE 59

Automatic Labelling of Examples

Example There was an accident on the Great Western line yesterday evening. A car had broken down on an unmanned level crossing and was hit by a high speed train. The train consequently derailed. Although the damage to the train was substantial, fortunately nobody was injured. [ A car had broken down on an unmanned level crossing and was hit by a high speed train. ] [ The train derailed. ] ⇒ Result

SLIDE 60

Automatic Labelling of Examples

Example There was an accident on the Great Western line yesterday evening. A car had broken down on an unmanned level crossing and was hit by a high speed train. The train consequently derailed. [ Although the damage to the train was substantial, ] fortunately nobody was injured. [ A car had broken down on an unmanned level crossing and was hit by a high speed train. ] [ The train derailed. ] ⇒ Result

SLIDE 61

Automatic Labelling of Examples

Example There was an accident on the Great Western line yesterday evening. A car had broken down on an unmanned level crossing and was hit by a high speed train. The train consequently derailed. [ Although the damage to the train was substantial, ] [ fortunately nobody was injured. ] [ A car had broken down on an unmanned level crossing and was hit by a high speed train. ] [ The train derailed. ] ⇒ Result

SLIDE 62

Automatic Labelling of Examples

Example There was an accident on the Great Western line yesterday evening. A car had broken down on an unmanned level crossing and was hit by a high speed train. The train consequently derailed. Although the damage to the train was substantial, fortunately nobody was injured. [ A car had broken down on an unmanned level crossing and was hit by a high speed train. ] [ The train derailed. ] ⇒ Result [ The damage to the train was substantial, ] [ fortunately nobody was injured. ] ⇒ Contrast
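The labelling procedure above can be sketched as pattern matching: find a span pair joined by an unambiguous cue, record the relation, and drop the cue so the stored training example looks unmarked. The two patterns below are illustrative inventions, not Marcu and Echihabi’s actual extraction patterns:

```python
import re

# Toy extraction patterns; real systems use many cues per relation and
# proper sentence/clause boundaries.
PATTERNS = [
    (re.compile(r"(?P<s1>[^.]+\.) (?:Consequently|As a consequence), (?P<s2>[^.]+\.)"),
     "result"),
    (re.compile(r"(?P<s1>[^.]+\.) Although (?P<s2>[^.]+\.)"), "contrast"),
]

def extract_examples(text):
    examples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(text):
            s1 = m.group("s1").strip()
            s2 = m.group("s2").strip().capitalize()  # cue removed, case repaired
            examples.append((s1, s2, relation))
    return examples

pairs = extract_examples("A car broke down. Consequently, the train derailed.")
assert pairs == [("A car broke down.", "The train derailed.", "result")]
```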

SLIDE 63

Marcu and Echihabi (2002)

Data:

- 4 relations from RST (Mann & Thompson, 1987): contrast, cause-explanation-evidence, condition, elaboration
- plus 2 non-relations: no-relation-same-text, no-relation-different-text
- 900,000 to 4 million automatically labelled examples per relation

SLIDE 64

Marcu and Echihabi (2002)

Statistical Modelling:

- Naive Bayes
- word co-occurrence features
- choose the discourse relation rk holding between two spans W1 and W2:

  P(rk | W1, W2) = P(W1, W2 | rk) P(rk) / P(W1, W2) ∝ P(W1, W2 | rk) P(rk)

  P(W1, W2 | rk) ≈ ∏_{(wi, wj) ∈ W1 × W2} P((wi, wj) | rk)
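The model can be sketched directly from this formula: Naive Bayes over the word pairs in the Cartesian product of the two spans. The add-one smoothing below is my own assumption; the slides do not specify a smoothing scheme:

```python
from collections import defaultdict
from math import log

class WordPairNB:
    def __init__(self):
        self.pair_counts = defaultdict(lambda: defaultdict(int))  # rel -> pair -> count
        self.rel_counts = defaultdict(int)                        # rel -> #examples
        self.vocab_pairs = set()

    def train(self, w1, w2, relation):
        self.rel_counts[relation] += 1
        for wi in w1.split():
            for wj in w2.split():
                self.pair_counts[relation][(wi, wj)] += 1
                self.vocab_pairs.add((wi, wj))

    def classify(self, w1, w2):
        total = sum(self.rel_counts.values())
        best, best_lp = None, float("-inf")
        for rel, n in self.rel_counts.items():
            lp = log(n / total)                                   # log P(rk)
            counts = self.pair_counts[rel]
            denom = sum(counts.values()) + len(self.vocab_pairs)  # add-one smoothing
            for wi in w1.split():
                for wj in w2.split():
                    lp += log((counts.get((wi, wj), 0) + 1) / denom)  # log P((wi, wj) | rk)
            if lp > best_lp:
                best, best_lp = rel, lp
        return best

nb = WordPairNB()
nb.train("the train hit a car", "it derailed", "result")
nb.train("nora sleeps", "she is ill", "explanation")
assert nb.classify("the train hit a tree", "it derailed") == "result"
```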

SLIDE 65

Marcu and Echihabi (2002)

Findings

1. test on automatically labelled data ⇒ 49.7% accuracy for 6-way classifier

SLIDE 66

Marcu and Echihabi (2002)

Findings

1. test on automatically labelled data ⇒ 49.7% accuracy for 6-way classifier

2. test on manually labelled examples from RST-DT (marked and unmarked); cue phrases are not removed from the training data; binary classifiers ⇒ 63% to 87% accuracy

SLIDE 67

Marcu and Echihabi (2002)

Findings

1. test on automatically labelled data ⇒ 49.7% accuracy for 6-way classifier

2. test on manually labelled examples from RST-DT (marked and unmarked); cue phrases are not removed from the training data; binary classifiers ⇒ 63% to 87% accuracy

3. test on manually labelled, unmarked examples; binary classifiers (contrast vs. elaboration, and cause-explanation-evidence vs. elaboration) ⇒ 69.5% recall for contrast, 44.7% recall for cause-explanation-evidence

SLIDE 68

How well does this really work? (Sporleder & Lascarides, 2008)

SLIDE 69

How well does this really work? (Sporleder & Lascarides, 2008)

1. Can discourse relations be learned from automatically labelled examples?

SLIDE 70

How well does this really work? (Sporleder & Lascarides, 2008)

1. Can discourse relations be learned from automatically labelled examples?

⇒ redundancy between cue phrase and linguistic context

SLIDE 71

How well does this really work? (Sporleder & Lascarides, 2008)

1. Can discourse relations be learned from automatically labelled examples?

⇒ redundancy between cue phrase and linguistic context

[Diagram: the spans "The train hit a car on a level crossing." and "As a consequence, it derailed.", each linked to the label result]

SLIDE 73

How well does this really work? (Sporleder & Lascarides, 2008)

1. Can discourse relations be learned from automatically labelled examples?

⇒ redundancy between cue phrase and linguistic context

[Diagram: the spans "The train hit a car on a level crossing." and "As a consequence, it derailed.", each linked to the label result]

SLIDE 74

How well does this really work? (Sporleder & Lascarides, 2008)

1. Can discourse relations be learned from automatically labelled examples?

⇒ train and test on automatically labelled examples

SLIDE 75

How well does this really work? (Sporleder & Lascarides, 2008)

1. Can discourse relations be learned from automatically labelled examples?

⇒ train and test on automatically labelled examples

2. Can the trained classifiers be successfully applied to examples that are not unambiguously marked?

SLIDE 76

How well does this really work? (Sporleder & Lascarides, 2008)

1. Can discourse relations be learned from automatically labelled examples?

⇒ train and test on automatically labelled examples

2. Can the trained classifiers be successfully applied to examples that are not unambiguously marked?

[Diagram: the pair "The train hit a car on a level crossing." / "As a consequence, it derailed." (labelled result) is marked as similar to the pair "By then the crops were housed in ricks;" / "the barns were small." (also labelled result)]

SLIDE 78

How well does this really work? (Sporleder & Lascarides, 2008)

1 Can discourse relations be learned from automatically labelled examples?

⇒ train and test on automatically labelled examples

2 Can the trained classifiers be successfully applied to examples that are not unambiguously marked?

⇒ train on automatically labelled examples, test on manually labelled ones that are not unambiguously marked



slide-80
SLIDE 80

Data

Automatic data extraction:
• 5 relations (from the SDRT inventory): contrast, explanation, result, summary, continuation
• examples extracted from 3 corpora (>2.1 billion words): BNC, North American News Text Corpus, English Gigaword Corpus

Number of extracted examples:
6.7 mil.  contrast
1.5 mil.  explanation
  17,000  summary
  15,000  result
   8,500  continuation



slide-82
SLIDE 82

Data

Manually labelled data:
• extracted from the RST-DT and manually mapped to SDRT relations
• only examples that are not unambiguously marked
• 1,051 examples:
  213 Contrast, 268 Explanation, 260 Continuation, 266 Result, 44 Summary



slide-84
SLIDE 84

The Models

Model 1: Marcu & Echihabi (2002)

Model 2: Sporleder & Lascarides (2005): BoosTexter (Schapire & Singer, 2000) with 41 linguistically motivated features:
• positional features
• length features
• lexical features
• part-of-speech tags
• verb features (tense, modality, aspect, voice, negation)
• cohesion features (pronouns, ellipses)

⇒ two classifiers that differ in machine learning framework and feature space
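A rough sketch of what a few such features might look like for a span pair; the exact feature definitions in Sporleder & Lascarides (2005) differ, and the word lists below are placeholders, not the ones used in the paper.

```python
# Placeholder word lists; a real system would use a tagger and parser.
PRONOUNS = {"it", "they", "he", "she", "this", "that"}
NEGATION = {"not", "never", "no", "n't"}

def span_features(span1, span2):
    """Build a small feature dictionary for a span pair: length and
    lexical features plus crude cohesion/negation approximations."""
    w1 = span1.lower().split()
    w2 = span2.lower().split()
    return {
        "len1": len(w1),                          # length features
        "len2": len(w2),
        "first2": w2[0],                          # lexical: span-initial word
        "pronoun2": bool(PRONOUNS & set(w2)),     # cohesion: pronoun in span 2
        "neg1": bool(NEGATION & set(w1)),         # negation, crudely approximated
    }

print(span_features("The train hit a car on a level crossing", "It derailed"))
```

Each example becomes one such feature dictionary plus a relation label, which is then handed to the learner.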



slide-86
SLIDE 86

Can discourse relations be learned from automatically labelled examples?

• train and test on automatically labelled examples
• 72,000 examples (down-sampled, uniform distribution)
• 10-fold cross-validation

⇒ Yes, discourse relations can, in principle, be learned from automatically labelled data

Relation         Naive Bayes         BoosTexter
                 Avg. Acc  Avg. F    Avg. Acc  Avg. F
rand. baseline      20.00   20.00       20.00   20.00
continuation          n/a   34.17         n/a   54.11
result                n/a   35.90         n/a   51.26
summary               n/a   41.46         n/a   61.16
explanation           n/a   57.05         n/a   73.05
contrast              n/a   34.29         n/a   58.42
all                 42.34   40.57       60.88   59.60
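The "72,000 examples (down-sampled, uniform distribution)" setup can be mimicked as below; the class sizes and the plain modulo fold split are illustrative, not the exact procedure from the paper.

```python
import random
from collections import defaultdict

def downsample_uniform(examples, seed=0):
    """Down-sample (span, relation) examples so every relation occurs
    equally often (the size of the rarest class)."""
    by_rel = defaultdict(list)
    for ex in examples:
        by_rel[ex[-1]].append(ex)
    n = min(len(v) for v in by_rel.values())
    rng = random.Random(seed)
    sample = [ex for items in by_rel.values() for ex in rng.sample(items, n)]
    rng.shuffle(sample)
    return sample

def ten_fold(sample):
    """Yield (train, test) splits for 10-fold cross-validation."""
    for i in range(10):
        yield ([ex for j, ex in enumerate(sample) if j % 10 != i],
               sample[i::10])

# Toy pool with a heavily skewed class distribution:
pool = [("...", "contrast")] * 500 + [("...", "result")] * 120
balanced = downsample_uniform(pool)
print(len(balanced))  # 240
```

Down-sampling keeps the random baseline at 1/k for k relations (20% here), so per-relation scores are directly comparable.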



slide-88
SLIDE 88

Do the results carry over to unmarked test examples?

• test on examples that are not unambiguously marked
• results averaged over 10 runs

⇒ No, the classifiers do not generalise to unmarked data. This behaviour seems to be independent of the classifier used!

Relation         Naive Bayes         BoosTexter
                 Avg. Acc  Avg. F    Avg. Acc  Avg. F
rand. baseline      20.00   20.00       20.00   20.00
continuation          n/a   37.40         n/a   26.17
result                n/a   12.24         n/a   22.08
summary               n/a    6.63         n/a   15.49
explanation           n/a   27.97         n/a   37.30
contrast              n/a   11.53         n/a   21.47
all                 25.92   19.15       25.80   24.50
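The "all" F-score column in these tables is consistent with a simple macro-average (unweighted mean) of the five per-relation F-scores, which is easy to verify:

```python
def macro_f(per_relation_f):
    """Macro-averaged F-score: the unweighted mean over relations."""
    return round(sum(per_relation_f) / len(per_relation_f), 2)

# BoosTexter per-relation F-scores when testing on unmarked examples
# (continuation, result, summary, explanation, contrast):
print(macro_f([26.17, 22.08, 15.49, 37.30, 21.47]))  # 24.5, the "all" row
```

Macro-averaging weights each relation equally regardless of frequency, which is appropriate here because the training distribution was down-sampled to uniform.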


slide-89
SLIDE 89

Possible Explanations

Unambiguously marked examples are structurally too different from unmarked ones
⇒ features that are predictive of a particular relation in marked examples may not be predictive in unmarked ones


slide-90
SLIDE 90

Conclusion (1)

Several open issues:

knowledge-based systems:
• a lot of work, therefore few real implementations (and only for small, well-defined domains)
• on a large scale probably intractable (w.r.t. development work and computational efficiency)

knowledge-poor, heuristic systems:
• still quite a lot of work
• fairly good for relatively easy cases, but bad coverage for unmarked relations

corpus-trained systems:
• not a lot of annotated data available (annotation is also a lot of work!)
• accuracies around 60% (an optimistic estimate! Note: human inter-annotator agreement is also typically not higher than 70-75%)


slide-91
SLIDE 91

Conclusion (2)

Are we barking up the wrong tree?
• full discourse parsing is not necessary for many applications
• representing discourse as rigid tree structures may be infelicitous (humans have problems with this!)
• maybe we need something much more flexible
