SLIDE 1

Aktuelle Themen der Angewandten Informatik

Semantische Technologien

(M-TANI)

Christian Chiarcos, Angewandte Computerlinguistik
chiarcos@informatik.uni-frankfurt.de

  • July 18, 2013
SLIDE 2

Global coherence: Discourse

  • Motivation & Theory

– Rhetorical Structure Theory

  • Building Blocks

– Discourse Segmentation: Text Tiling

  • Theory-based approaches

– Segmented Discourse Representation Theory

  • Annotation-based approaches

– Penn Discourse Treebank

  • Data-driven approaches
SLIDE 3

Discourse Phenomena

  • Discourse

– a series of communicative acts exchanged between individuals, conducted with the goal of changing the interlocutors' state of mind

  • exchange information, establish social roles, etc.
  • Discourse phenomena

(here)

– pertaining to the relation between text and its cognitive representation

  • ignore socio-linguistic and literary aspects
SLIDE 4

Discourse Phenomena

  • Natural languages are spoken or written as a collection of sentences

  • In general, a sentence cannot be understood in isolation:

– Today was Jack's birthday. Penny and Janet went to the store. They were going to get presents. Janet decided to get a kite. "Don't do that," said Penny. "Jack has a kite. He will make you take it back."

Rohit Kate (2010)

SLIDE 5

Discourse Phenomena

  • Global coherence: assess the logical structure of an entire discourse or parts of it
  • Important tasks in processing a discourse

– Discourse Segmentation
– Coherence Relations
– Anaphora Resolution

  • Ideally, deep understanding is needed to do well on these tasks, but so far shallow methods have been used

Rohit Kate (2010)

SLIDE 6

Phenomena: Discourse Relations

  • Semantic, logical or other dependencies between utterances in a discourse

– Indicated by cue words, which may be optional:

John fell. Max pushed him.
John fell, { because | after | and then } Max pushed him.

SLIDE 8

Phenomena: Discourse Structure

(1) PeterP came home. (2) HeP had a long conversation with JohnJ. (3) To have someone to talk to helped himP/J a lot.

(1) PeterP came home. (2) HeP had a long conversation with JohnJ. (2a) MaryM arrived later in the evening. (3) To have someone to talk to helped himP/??J a lot.

[Tree diagrams: (1)–(2)–(3) vs. (1)–(2)–(2a)–(3)]

SLIDE 9

Meaning in context

… why should we* care ?

Sound waves → Phonetics → Words → Syntactic processing → Parses → Semantic processing → Meaning → Discourse processing

  • This is a conceptual pipeline; humans or computers may process multiple stages simultaneously

Rohit Kate (2010)    * NLP folks

SLIDE 10

… why should we care ?

  • Anaphora Resolution and discourse relations

– John pushed Max. He fell.
John pushed Max. So, he fell. [cause]
– John pushed Max. He apologized.
John pushed Max. But he apologized. [contrast*]

* contrast with the implicit assumption that John pushed Max intentionally.

SLIDE 11

… why should we care ?

  • Text summarization and discourse structure
  • The more deeply embedded a discourse segment is, the less likely it is to be included in the summary (Marcu 1997)

  • Summarization systems may skip or merge segments connected by an Elaboration relation

– These provide supportive information only
SLIDE 12

… why should we care ?

Role of higher order relations: Discourse Structure provides information about the arguments to discourse connectives and thus indirectly of the relation between entities and/or the predication mentioned in those arguments. This higher order information can be the basis of a level of inference that goes beyond the level of entities and relations as they appear in individual clauses or sentences. Systems for IE, NLG, QA, and summarization either ignore connectives in a sentence or eliminate sentences containing connectives.  The approaches described here can make this higher order information available.

Joshi et al. (2006)

SLIDE 13

… why should we care ?

  • In the absence of extraordinary gains or losses, the “typical” correlation between earnings and sales is positive, as signaled here by non-contrastive while:

– Sales increased 11% to $2.5 billion from $2.25 billion while operating profit climbed 13% to $225.7 million from $199.8 million.

  • The correlation between earnings/profits and sales can sometimes be “atypical”, even inversely correlated, as signaled here by contrastive however:

– Sales in North America and the Far East were inflated by acquisitions, rising 62% to $278 million. Operating profit dropped 35%, however, to $3.8 million.

Joshi et al. (2006)

SLIDE 14

… why should we care ?

The first argument of a connective, such as however, need not always be in the preceding sentence.

  • N.V. DSM said net income in the third quarter jumped 63% as the

company had substantially lower extraordinary charges to account for a restructuring program. (… 9 sentences …) Sales, however, were little changed at 2.46 billion guilders, compared with 2.42 billion guilders.

=> Identifying arguments of discourse relations can therefore help systems for IE, NLG, QA, and summarization by providing higher order information.

Joshi et al. (2006)

SLIDE 15

Other Applications

  • Natural Language Generation

– Input: communicative goals and semantic representation – Output: text

  • Writing research / Essay Scoring

– How are coherent texts created ?

  • Question Answering

– Search in segments with Explanation relations

  • Semantic Parsing / Machine Reading

– a larger meaning representation of the whole discourse

Taboada & Stede (2009), Rohit Kate (2010)

SLIDE 16

Discourse Structure

  • The hierarchical structure of a discourse according to the coherence relations
  • Analogous to syntactic tree structure
  • A node in the tree represents locally coherent sentences: a discourse unit (not necessarily linear)

[Tree with relations: Occasion, Explanation, Parallel, Explanation]

John went to the bank to deposit his paycheck. He then took a train to Bill’s car dealership. He needed to buy a car. The company he works for now isn’t near any public transportation. He also wanted to talk to Bill about their softball league.

Rohit Kate (2010)

SLIDE 17

Discourse Structure

  • Several competing approaches

– Grosz & Sidner‘s (1986) Stack Model

  • texts have a hierarchical structure with coordination and subordination => tree

– Rhetorical Structure Theory (RST)

  • discourse structure is represented as a tree
  • different types of relations distinguished

– Segmented Discourse Representation Theory (SDRT)

  • grounded in formal (dynamic) semantics
  • combines an older model of quantifier scope (Discourse Representation Theory, DRT) with RST-style discourse relations

SLIDE 18

Rhetorical Structure Theory

  • Created as part of a project on Natural Language Generation at the

Information Sciences Institute (www.isi.edu)

  • Central publication

– Mann, William C. and Sandra A. Thompson. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8 (3), 243-281.

  • Recent overview

– Taboada, Maite and William C. Mann. (2006). Rhetorical Structure Theory: Looking back and moving ahead. Discourse Studies, 8 (3), 423-459.

  • For many more publications and applications, visit the bibliography on the RST web site

– http://www.sfu.ca/rst/
– http://www.sfu.ca/rst/05bibliographies/

Taboada & Stede (2009)

SLIDE 19

RST Principles

  • Coherent texts consist of minimal units, which are linked to each other, recursively, through rhetorical relations

– Rhetorical relations are also known, in other theories, as coherence or discourse relations

  • Coherent texts do not show gaps or non-sequiturs

– Therefore, there must be some relation holding among the different parts of the text

Taboada & Stede (2009)

SLIDE 20

RST Components

  • Units of discourse

– Texts can be segmented into minimal units, or spans

  • Nuclearity

– Some spans are more central to the text’s purpose (nuclei), whereas others are secondary (satellites)
– Based on hypotactic and paratactic relations in language

  • Relations among spans

– Spans are joined into discourse relations

  • Hierarchy/recursion

– Spans that are in a discourse relation may enter into new relations

Taboada & Stede (2009)

SLIDE 21

Coordination in RST

  • At the sub-sentential level (traditional coordinated clauses)

– Peel oranges, and slice crosswise.

  • But also across sentences

– 1. Peel oranges, 2. and slice crosswise. 3. Arrange in a bowl 4. and sprinkle with rum and coconut. 5. Chill until ready to serve.

Taboada & Stede (2009)

SLIDE 22

Subordination in RST

  • Sub-sentential Concession relation

  • Concession across sentences

– Nucleus (spans 2-3) made up of two spans in an Antithesis relation

Taboada & Stede (2009)

SLIDE 23

RST Relations

  • They hold between two non-overlapping text spans

  • Most of the relations hold between a nucleus and a satellite, although there are also multi-nuclear relations

  • A relation consists of:

1. Constraints on the Nucleus,
2. Constraints on the Satellite,
3. Constraints on the combination of Nucleus and Satellite,
4. The Effect.

Taboada & Stede (2009)

SLIDE 24

How are Discourse Relations declared?

  • Broadly, there are two ways of specifying discourse relations

  • Abstract specification

– Relations between spans are always inferred, and declared by choosing from a pre-defined set of abstract categories.
– Lexical elements can serve as partial, ambiguous evidence for inference.

  • Lexically grounded

– Relations can be grounded in lexical elements.
– Where lexical elements are absent, relations may be inferred.

Joshi et al. (2006)

SLIDE 25

Example: Evidence

  • Constraints on the Nucleus

– The reader may not believe N to a degree satisfactory to the writer

  • Constraints on the Satellite

– The reader believes S or will find it credible

  • Constraints on the combination of N+S

– The reader’s comprehending S increases their belief of N

  • Effect (the intention of the writer)

– The reader’s belief of N is increased

  • Assuming a written text and readers and writers; extensions of RST

to spoken language discussed later

  • Definitions of most common relations are available from the RST

web site (www.sfu.ca/rst)

Taboada & Stede (2009)

SLIDE 26

RST Relation types

  • Relations are of different types

– Subject matter: they relate the content of the text spans

  • Cause, Purpose, Condition, Summary

– Presentational: more rhetorical in nature. They are meant to achieve some effect on the reader

  • Motivation, Antithesis, Background, Evidence

Taboada & Stede (2009)

SLIDE 27

Other possible classifications

  • Relations that hold outside the text

– Condition, Cause, Result

  • vs. those that are only internal to the text

– Summary, Elaboration

  • Relations frequently marked by a discourse marker

– Concession (although, however); Condition (if, in case)

  • vs. relations that are rarely, or never, marked

– Background, Restatement, Interpretation

  • Preferred order of spans: nucleus before satellite

– Elaboration – usually first the nucleus (material being elaborated on) and then satellite (extra information)

  • vs. satellite-nucleus

– Concession – usually the satellite (the although-type clause or span) before the nucleus

Taboada & Stede (2009)

SLIDE 28

Relation names (in M&T 1988)

Circumstance
Solutionhood
Elaboration
Background
Enablement and Motivation: Enablement, Motivation
Evidence and Justify: Evidence, Justify
Relations of Cause: Volitional Cause, Non-Volitional Cause, Volitional Result, Non-Volitional Result, Purpose
Antithesis and Concession: Antithesis, Concession
Condition and Otherwise: Condition, Otherwise
Interpretation and Evaluation: Interpretation, Evaluation
Restatement and Summary: Restatement, Summary
Other Relations: Sequence, Contrast

Other classifications are possible, and longer and shorter lists have been proposed Taboada & Stede (2009)

SLIDE 29

Graphical representation

  • A horizontal line covers a span of text (possibly made up of further spans)

  • A vertical line signals the nucleus or nuclei

  • A curve represents a relation; the arrow points from the satellite towards the nucleus

Taboada & Stede (2009)

SLIDE 30

RST Resources

  • RST web page

– www.sfu.ca/rst

  • RST tool (for annotation / drawing diagrams)

– http://www.wagsoft.com/RSTTool/

Taboada & Stede (2009)

SLIDE 31

How to do an RST analysis

  • Given a segmentation S of the text into elementary discourse units (edus)

– edu size may vary; in RST, usually clauses

  • for each u in S and any of its neighbours u’ in S:

if there is a clear relation r holding between u and u’
then mark that relation r
else u might be at the boundary of a higher-level relation; look at relations holding between larger units (spans)

  • if a relation r was created between any u1, u2 in S

then update 𝑆 → 𝑆\{𝑢1, 𝑢2} ∪ {𝑢1∘2}, with the unit 𝑢1∘2 as the concatenation of 𝑢1 and 𝑢2

  • iterate until |S| = 1

Taboada & Stede (2009)
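The loop above can be sketched in Python. The hypothetical `find_relation` oracle stands in for the genuinely hard part, deciding whether a clear relation holds between two adjacent spans; everything else is bookkeeping over the working segmentation S:

```python
def rst_analysis(edus, find_relation):
    """Greedy bottom-up sketch of the RST procedure above.
    `edus` is the initial segmentation S (a list of strings);
    `find_relation(u1, u2)` is a hypothetical oracle returning a
    relation name or None for two adjacent spans."""
    spans = list(edus)            # working segmentation S
    trees = [(u,) for u in edus]  # one trivial subtree per EDU
    while len(spans) > 1:
        merged = False
        for i in range(len(spans) - 1):
            rel = find_relation(spans[i], spans[i + 1])
            if rel is not None:
                # S -> S \ {u1, u2} + {u1 o u2}: replace the pair by
                # its concatenation, recording the relation as a node
                trees[i:i + 2] = [(rel, trees[i], trees[i + 1])]
                spans[i:i + 2] = [spans[i] + " " + spans[i + 1]]
                merged = True
                break
        if not merged:
            break  # no clear relation found; the analysis stays partial
    return trees
```

With a toy oracle that always answers "Result", two EDUs collapse into a single relation node; with an oracle that never finds a relation, the segmentation is returned unchanged.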

SLIDE 34

RST issues

  • Annotation is possible …

… but not very reliable, slow and expensive

  • Definitions of units

– Vary from researcher to researcher, depending on the level of granularity needed

  • Relations inventory

– conflate different aspects of meaning and impose rigid constraints (tree structure) => multiple analyses possible

  • Problems in identifying relations

– Judgments are plausibility judgments. Two analysts might differ in their analyses

  • A theory purely of intentions

Taboada & Stede (2009)

Possible solutions include

  • alternative, more strictly formalized theories (SDRT)

  • alternative, simplified models for annotation (Penn Discourse Treebank, PDTB)

  • data-driven, weakly supervised approaches

But first: Discourse Segmentation. How do we identify the building blocks of discourse structure (annotation)?

SLIDE 35

Discourse Segmentation

  • Separating a document into a linear sequence of subtopics

– For example: scientific articles are segmented into Abstract, Introduction, Methods, Results, Conclusions
– This is often a simplification of the higher-level structure of a discourse, e.g., building blocks for hierarchical models of discourse

  • Applications of automatic discourse segmentation:

– Summarization: summarize each segment separately
– Information Retrieval or Information Extraction: apply to an appropriate segment

  • Related task: paragraph segmentation, for example of a speech transcript

Rohit Kate (2010)

SLIDE 36

Unsupervised Discourse Segmentation

  • Given raw text, segment it into multiple-paragraph subtopics

  • Unsupervised: no training data is given for the task

  • Cohesion-based approach: segment into subtopics in which sentences/paragraphs are cohesive with each other; a dip in cohesion marks subtopic boundaries

Rohit Kate (2010)


SLIDE 38

Cohesion

  • Cohesion: links between text units due to linguistic devices, e.g., similar expressions

  • Lexical cohesion: use of the same or similar words to link text units

– Today was Jack's birthday. Penny and Janet went to the store. They were going to get presents. Janet decided to get a kite. "Don't do that," said Penny. "Jack has a kite. He will make you take it back."

  • Non-lexical cohesion: for example, using the same gesture

Cohesion is not to be confused with coherence!

Coherence: Text units are related by meaning relations

But coherence may be indicated by cohesion

SLIDE 39

Cohesion-based Unsupervised Discourse Segmentation

  • TextTiling algorithm (Hearst, 1997)

– compare adjacent blocks of text
– look for shifts in vocabulary

  • Pre-processing: tokenization, stop-word removal, stemming

  • Divide the text into pseudo-sentences of equal length (say, 20 words)

Rohit Kate (2010)
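A minimal sketch of this pre-processing step. The stopword list and the suffix-stripping stemmer below are toy stand-ins (Hearst's implementation uses proper resources for both):

```python
import re

# toy stopword list, for illustration only
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
             "is", "are", "was", "were"}

def preprocess(text, w=20):
    """TextTiling-style pre-processing (sketch): lowercase and
    tokenize, drop stopwords, apply a crude suffix stemmer, then
    split the token stream into pseudo-sentences of w tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS]
    # crude stemmer: strip a few common suffixes (illustration only)
    stems = [re.sub(r"(ing|ed|es|s)$", "", t) or t for t in tokens]
    return [stems[i:i + w] for i in range(0, len(stems), w)]
```

For example, `preprocess("Cats are chasing dogs. Dogs chased cats.", w=3)` yields two pseudo-sentences of three stems each.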


SLIDE 41

TextTiling Algorithm contd.

  • Compute a lexical cohesion score at each gap between pseudo-sentences

  • Lexical cohesion score: similarity of the words before and after the gap (take, say, 10 pseudo-sentences before and 10 pseudo-sentences after)

  • Similarity: cosine similarity between the word vectors (high if words co-occur)

Rohit Kate (2010)
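The gap score can be sketched as follows, with each block given as a list of pseudo-sentences (token lists); `gap_score` is an illustrative name, not Hearst's:

```python
from collections import Counter
from math import sqrt

def gap_score(left_block, right_block):
    """Lexical cohesion score at a gap: cosine similarity between
    the word-count vectors of the pseudo-sentences on either side.
    Each block is a list of pseudo-sentences (lists of tokens)."""
    a = Counter(w for ps in left_block for w in ps)
    b = Counter(w for ps in right_block for w in ps)
    dot = sum(a[w] * b[w] for w in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Identical vocabulary on both sides gives a score near 1; disjoint vocabulary gives 0, the "dip" that signals a subtopic boundary.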


SLIDE 43
TextTiling Algorithm contd.

  • Plot the similarity and compute the depth scores of the “similarity valleys”: (a−b)+(c−b)

  • Assign a segment boundary if the depth score is larger than a threshold (e.g., one standard deviation deeper than the mean valley depth)

[Plot: a similarity valley with left peak a, bottom b, right peak c]

Rohit Kate (2010)
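A sketch of the depth-score computation, using the slide's example threshold (mean valley depth plus one standard deviation; Hearst's own cutoff is formulated slightly differently):

```python
from statistics import mean, pstdev

def depth_scores(scores):
    """Depth of the 'valley' at each gap: climb from score b to the
    nearest peaks a (left) and c (right); depth = (a - b) + (c - b)."""
    depths = []
    for i, b in enumerate(scores):
        left, j = b, i - 1
        while j >= 0 and scores[j] >= left:
            left, j = scores[j], j - 1
        right, j = b, i + 1
        while j < len(scores) and scores[j] >= right:
            right, j = scores[j], j + 1
        depths.append((left - b) + (right - b))
    return depths

def boundaries(scores):
    """Propose a boundary wherever the depth exceeds the slide's
    example threshold: one standard deviation above the mean depth."""
    depths = depth_scores(scores)
    cut = mean(depths) + pstdev(depths)
    return [i for i, d in enumerate(depths) if d > cut]
```

On a score curve like `[0.8, 0.7, 0.2, 0.7, 0.8, 0.75, 0.8]`, only the deep valley at index 2 survives the threshold.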


SLIDE 45

TextTiling Algorithm contd.

From (Hearst, 1994) Rohit Kate (2010)

SLIDE 48

Supervised Discourse Segmentation

  • Easy to get supervised data for some segmentation tasks

– e.g., paragraph segmentation
– useful to find paragraphs in speech recognition output

  • Model as a classification task: classify whether a sentence boundary is a paragraph boundary

– Use any classifier: SVM, Naïve Bayes, Maximum Entropy, etc.

  • Or model as a sequence labeling task: label each sentence boundary as “paragraph boundary” or “not a paragraph boundary”
Rohit Kate (2010)
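A sketch of a feature extractor for the classification view; the cue-phrase list and all feature names are illustrative, and the resulting dict could feed any of the classifiers named above:

```python
# illustrative cue phrases; real systems learn these by feature selection
CUE_PHRASES = {"coming up next", "joining us now", "good evening"}

def boundary_features(prev_sent, next_sent):
    """Feature dict for the boundary between two adjacent sentences;
    any classifier (SVM, Naive Bayes, MaxEnt, ...) can consume it."""
    prev_words = set(prev_sent.lower().split())
    next_words = set(next_sent.lower().split())
    overlap = len(prev_words & next_words)
    union = len(prev_words | next_words)
    first = next_sent.split()[0].lower() if next_sent.split() else ""
    return {
        "word_overlap": overlap,                       # cohesion feature
        "jaccard": overlap / union if union else 0.0,  # cohesion feature
        "cue_in_next": any(c in next_sent.lower() for c in CUE_PHRASES),
        "next_starts_pronoun": first in {"he", "she", "it", "they"},
    }
```

Low overlap and no pronoun at the start of the next sentence are weak evidence for a paragraph boundary; the classifier learns the actual weights.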

SLIDE 49

Supervised Discourse Segmentation

  • Features:

– Cohesion features: word overlap, word cosine similarity, anaphora, etc.
– Additional features: discourse markers or cue words

  • Discourse marker or cue phrase/word: a word or phrase that signals discourse structure

– For example, “good evening”, “joining us now” in broadcast news
– “Coming up next” at the end of a segment, “Company Incorporated” at the beginning of a segment, etc.
– Either hand-coded or automatically determined by feature selection

Rohit Kate (2010)

SLIDE 50

Discourse Segmentation Evaluation

  • Precision, recall and F-measure are not a good fit here because they are not sensitive to near misses => WindowDiff (Pevzner & Hearst, 2002)

– Slide a window of length k across the reference (correct) and the hypothesized segmentation and count the number of segmentation boundaries in each
– WindowDiff metric: average difference in the number of boundaries in the sliding window

Rohit Kate (2010)
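A sketch of the metric over boundary-indicator sequences (1 = a boundary follows that unit). This simplified form counts the windows in which the two segmentations disagree on the number of boundaries and normalizes by the number of windows; Pevzner & Hearst's exact formulation differs in minor details:

```python
def window_diff(ref, hyp, k):
    """WindowDiff sketch (after Pevzner & Hearst 2002) over boundary
    indicator sequences: ref[i] = 1 if a boundary follows unit i.
    Penalizes every window of size k in which reference and
    hypothesis disagree on the boundary count (requires len(ref) > k)."""
    assert len(ref) == len(hyp) and len(ref) > k
    n = len(ref)
    errors = sum(1 for i in range(n - k)
                 if sum(ref[i:i + k]) != sum(hyp[i:i + k]))
    return errors / (n - k)
```

A hypothesis that misses a boundary by one position is penalized only in the few windows straddling the miss, which is exactly the sensitivity to near misses that plain precision/recall lacks.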

SLIDE 51

Discourse Phenomena

  • Motivation & Theory

– Rhetorical Structure Theory

  • Building Blocks

– Discourse Segmentation: Text Tiling

  • Theory-based approaches

– Segmented Discourse Representation Theory

  • Annotation-based approaches

– Penn Discourse Treebank

  • Data-driven approaches
SLIDE 52

Theory-based Approaches

  • SDRT as an example

– Segmented Discourse Representation Theory (Asher 1993, Asher & Lascarides 2003)
– dynamic semantics (Discourse Representation Theory, Kamp 1982)
– extended with discourse relations (Hobbs 1978, Mann & Thompson 1987)
– hierarchical discourse structure (Polanyi 1985, Webber 1988)

SLIDE 53

Discourse Analysis with SDRT

Max pushed John.

π1: [ x y e1 n | Max(x), John(y), e1: push(x,y), e1 < n ]

π1: discourse segment (utterance)
x: variable (discourse referent) for Max
y: variable (discourse referent) for John
e1: variable (event) described by the utterance
n: reference time (present)
Max(x), John(y): unary predicates that represent noun attributes
e1: push(x,y): binary predicate that reflects the semantics of the verb
e1 < n: the event precedes the present time

parse and create segment

SLIDE 54

Discourse Analysis with SDRT

Max pushed John.

π1: [ x y e1 n | Max(x), John(y), e1: push(x,y), e1 < n ]

context: { π1 }

integrate the segment with the (previously empty) context

SLIDE 55

Discourse Analysis with SDRT

Max pushed John. He fell.

π1: [ x y e1 n | Max(x), John(y), e1: push(x,y), e1 < n ]
π2: [ z e2 n | e2: fall(z), e2 < n ]

context: { π1 }

process the next utterance and construct a new segment

SLIDE 56

Discourse Analysis with SDRT

Max pushed John. He fell.

π1: [ x y e1 n | Max(x), John(y), e1: push(x,y), e1 < n ]
π2: [ z e2 n | e2: fall(z), e2 < n, z = y ]

context: { π1, π2 }

update with the new segment
anaphora resolution: z = y
inferred discourse relations: Result(π1, π2), Narration(π1, π2)
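The state reached in this example can be mocked up as plain Python data, assuming the toy `Segment` class below (a stand-in for an SDRS box, not an SDRT implementation; all labels mirror the slide's π1/π2):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Toy stand-in for an SDRS segment: a label, its discourse
    referents, and its conditions (all as plain strings)."""
    label: str
    referents: list
    conditions: list

# pi1: "Max pushed John."
pi1 = Segment("pi1", ["x", "y", "e1", "n"],
              ["Max(x)", "John(y)", "e1: push(x,y)", "e1 < n"])

# pi2: "He fell." -- the pronoun introduces z, resolved to y (John)
pi2 = Segment("pi2", ["z", "e2"],
              ["e2: fall(z)", "e2 < n", "z = y"])

# the context holds both segments plus the inferred relations
context = {
    "segments": [pi1, pi2],
    "relations": [("Result", "pi1", "pi2"),
                  ("Narration", "pi1", "pi2")],
}
```

The update step of the analysis corresponds to appending `pi2`, adding the equation `z = y`, and recording the inferred relations.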

SLIDE 57

Discourse Analysis with SDRT

  • SDRT accounts for

– anaphoric reference – lexical disambiguation – bridging – presupposition – ellipsis – coherence

  • but only if discourse relations can be inferred
SLIDE 58

Inference of Discourse Relations

SDRT: defeasible (nonmonotonic) inference (“Glue logic”)

semantic constraints on the new segment
∧ structural constraints on potential attachment points
∧ semantic constraints on the potential attachment point
> discourse relation to be applied

where > marks defeasible inference and → marks monotone inference (e.g., if a discourse connective signals the relation unambiguously)

SLIDE 59

Inference of Discourse Relations

if

segment β can be attached to segment α in context t

and

the event described in α involves a pushing event with arguments x and y

and

the event described in β involves a falling event of argument y

then, normally,

the discourse relation between α and β is a Result

(⟨t, α, β⟩ ∧ [push(e,x,y)]Kα ∧ [fall(e,y)]Kβ) > Result(α, β)
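A toy rendering of such a defeasible rule, with segments represented as lists of predicate strings; the function name and encoding are illustrative, not SDRT's actual machinery:

```python
def infer_relation(alpha, beta):
    """Toy glue-logic default: if alpha contains a pushing event and
    beta a falling event, infer Result (defeasibly). A real SDRT
    implementation would also check shared arguments and let
    monotonic evidence (e.g., an explicit connective) override
    or block this default."""
    def has(conditions, pred):
        return any(c.startswith(pred + "(") for c in conditions)
    if has(alpha, "push") and has(beta, "fall"):
        return "Result"
    return None  # no default applies; the relation stays unresolved
```

The "normally" of the rule lives in the caller: a later, more specific rule or an explicit cue may retract the inferred Result, which is exactly what makes the inference nonmonotonic.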

SLIDE 64

Inference of Discourse Relations

  • “Glue logic”

– accesses

  • structural and propositional contents of the context
  • propositional contents of the new segment

– employs

  • generic pragmatic principles (e.g., Gricean)
  • specific pragmatic principles (e.g., shared world knowledge)
  • monotonic axioms (gather discourse clues from the logical form)
  • defeasible (non-monotonic) rules (infer discourse relations)

To operationalize SDRT as it is stated, we would need an exhaustive formal model of shared knowledge and formally defined rules to infer every possible discourse relation. In this form, these resources are not available. State of the art:

– underspecified discourse analysis
– discourse relations only for explicit cues
– approximate shared knowledge with lexical-semantic resources (FrameNet, etc.) (Bos 2008)

SLIDE 65

Boxer (Bos 2008)

  • Based on DRT, augmented with RST/SDRT-like relations

– Manually corrected training data: Groningen Meaning Bank (http://gmb.let.rug.nl)
– Demo & download: http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer
– RDF wrapper: Fred (http://wit.istc.cnr.it/stlab-tools/fred/)

  • Discourse relations only where explicitly signalled

SLIDE 66

Fred (Boxer)

Max pushed John. He fell.

(We don't get the coherence relation, and the anaphor isn't correctly resolved.)

http://wit.istc.cnr.it/stlab-tools/fred/

SLIDE 67

Fred (Boxer+WikiFier)

Max pushed John. He fell.

– with Named Entity Recognition plus DBpedia links

  • the latter are incorrect for the example.

http://wit.istc.cnr.it/stlab-tools/fred/

SLIDE 68

Discourse Phenomena

  • Motivation & Theory

– Rhetorical Structure Theory

  • Building Blocks

– Discourse Segmentation: Text Tiling

  • Theory-based approaches

– Segmented Discourse Representation Theory

  • Annotation-based approaches

– Penn Discourse Treebank

  • Data-driven approaches
SLIDE 69

Penn Discourse Treebank

  • Recently released corpus that is likely to lead to better systems for discourse processing

  • Has coherence relations encoded, associated with the discourse connectives

  • Linked to the Penn Treebank

– http://www.seas.upenn.edu/~pdtb/

  • Goals: reliable annotation, theory-neutrality

– lexical definition of discourse relations
– discourse relations only, no discourse structure

Rohit Kate (2010)

SLIDE 70

Lexical Definition of Discourse Relations

  • Discourse relations associated with “conjunctive elements” (discourse markers) (Halliday & Hasan 1976)

– Coordinating and subordinating conjunctions
– Conjunctive adjuncts (aka discourse adjuncts), including

  • adverbs such as but, so, next, accordingly, actually, instead, etc.
  • prepositional phrases (PPs) such as as a result, in addition, etc.
  • PPs with that or another referential item, such as in addition to that, in spite of that, in that case, etc.

  • Each such element conveys a cohesive relation between

– its matrix sentence and
– a presupposed predication from the surrounding discourse

=> represented as a string of text in the preceding text

Joshi et al. (2006)

SLIDE 71

No Discourse Structure, Discourse Relations only

  • Discourse relations are not associated with discourse structure because some theories explicitly reject any notion of structure in discourse

– “Whatever relation there is among the parts of a text – the sentences, the paragraphs, or turns in a dialogue – it is not the same as structure in the usual sense, the relation which links the parts of a sentence or a clause.” [Halliday & Hasan, 1976, p. 6]
– “Between sentences, there are no structural relations.” [Halliday & Hasan, 1976, p. 27]

Joshi et al. (2006)

SLIDE 72

Corpus and Annotation Representation

  • Wall Street Journal

– 2304 articles, ~1M words
– partial overlap with the RST Discourse Treebank (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T07)

  • Annotation

– the text spans of connectives and their arguments
– features encoding the semantic classification of connectives, and attribution of connectives and their arguments
– if a connective can be inferred, it is annotated as an “implicit” connective (PDTB 2)

Joshi et al. (2006)

SLIDE 73

Explicit Connectives

Explicit connectives are the lexical items that trigger discourse relations.

  • Subordinating conjunctions (e.g., when, because, although, etc.)
  • The federal government suspended sales of U.S. savings bonds because

Congress hasn't lifted the ceiling on government debt.

  • Coordinating conjunctions (e.g., and, or, so, nor, etc.)
  • The subject will be written into the plots of prime-time shows, and

viewers will be given a 900 number to call.

  • Discourse adverbials (e.g., then, however, as a result, etc.)
  • In the past, the socialist policies of the government strictly limited the

size of … industrial concerns to conserve resources and restrict the profits businessmen could make. As a result, industry operated out of small, expensive, highly inefficient industrial units.

  • Only 2 abstract object (AO) arguments, labeled Arg1 and Arg2
  • Arg2: clause with which connective is syntactically associated
  • Arg1: the other argument

Joshi et al. (2006)
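The three connective classes above can be illustrated with a minimal lexicon-based spotter. The lexicon is a tiny invented sample, not the full PDTB inventory of 100 types, and `find_connectives` is a hypothetical helper:

```python
# Toy lexicon of explicit connectives, a few illustrative entries per
# PDTB class (the full inventory has 100 types; these are examples only).
CONNECTIVES = {
    "because": "subordinating", "although": "subordinating", "when": "subordinating",
    "and": "coordinating", "or": "coordinating", "so": "coordinating",
    "then": "adverbial", "however": "adverbial", "as a result": "adverbial",
}

def find_connectives(sentence):
    """Return (connective, class) pairs occurring in the sentence.

    Longer connectives are checked first so that 'as a result' is
    matched as a whole; matching is word-bounded via space padding.
    """
    words = [w.strip(".,;:\"'") for w in sentence.lower().split()]
    text = " " + " ".join(words) + " "
    return [(c, CONNECTIVES[c])
            for c in sorted(CONNECTIVES, key=len, reverse=True)
            if " " + c + " " in text]
```

A real PDTB-style annotator would of course disambiguate discourse vs. non-discourse uses (e.g. temporal vs. discourse *when*); this sketch only locates candidate strings.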

slide-74
SLIDE 74

Implicit Connectives

When there is no Explicit connective present to relate adjacent sentences, it may be possible to infer a discourse relation between them due to adjacency.

  • Some have raised their cash positions to record levels. Implicit=because (causal) High cash positions help buffer a fund when the market falls.

  • The projects already under construction will increase Las Vegas's supply of hotel rooms by 11,795, or nearly 20%, to 75,500. Implicit=so (consequence) By a rule of thumb of 1.5 new jobs for each new hotel room, Clark County will have nearly 18,000 new jobs.

Such discourse relations are annotated by inserting an "Implicit connective" that "best" captures the relation.

Joshi et al. (2006)

slide-75
SLIDE 75

Arguments

  • Arg2 is the sentence/clause with which connective is syntactically associated.
  • Arg1 is the other argument.
  • No constraints on relative order. Discontinuous annotation is allowed.
  • Linear:
  • The federal government suspended sales of U.S. savings bonds

because Congress hasn't lifted the ceiling on government debt.

  • Interposed:
  • Most oil companies, when they set exploration and production

budgets for this year, forecast revenue of $15 for each barrel of crude produced.

  • The chief culprits, he says, are big companies and business groups

that buy huge amounts of land "not for their corporate use, but for resale at huge profit." … The Ministry of Finance, as a result, has proposed a series of measures that would restrict business investment in real estate even more tightly than restrictions aimed at individuals.

Joshi et al. (2006)

slide-76
SLIDE 76

Extent of Arguments

  • arguments of connectives can be sentential, sub-sentential,

multi-clausal or multi-sentential:

  • Legal controversies in America have a way of assuming a symbolic

significance far exceeding what is involved in the particular case. They speak volumes about the state of our society at a given moment. It has always been so. Implicit=for example (exemplification) In the 1920s, a young schoolteacher, John T. Scopes, volunteered to be a guinea pig in a test case sponsored by the American Civil Liberties Union to challenge a ban on the teaching of evolution imposed by the Tennessee Legislature. The result was a world-famous trial exposing profound cultural conflicts in American life between the "smart set," whose spokesman was H.L. Mencken, and the religious fundamentalists, whom Mencken derided as benighted primitives. Few now recall the actual outcome: Scopes was convicted and fined $100, and his conviction was reversed on appeal because the fine was excessive under Tennessee law.

Joshi et al. (2006)

slide-77
SLIDE 77

Location of Arg1

  • Same sentence as Arg2:
  • The federal government suspended sales of U.S. savings bonds because

Congress hasn't lifted the ceiling on government debt.

  • Sentence immediately previous to Arg2:
  • Why do local real-estate markets overreact to regional economic cycles?

Because real-estate purchases and leases are such major long-term commitments that most companies and individuals make these decisions only when confident of future economic stability and growth.

  • Previous sentence non-contiguous to Arg2:
  • Mr. Robinson … said Plant Genetic's success in creating genetically engineered male steriles doesn't automatically mean it would be simple to create hybrids in all crops. That's because pollination, while easy in corn because the carrier is wind, is more complex and involves insects as carriers in crops such as cotton. "It's one thing to say you can sterilize, and another to then successfully pollinate the plant," he said. Nevertheless, he said, he is negotiating with Plant Genetic to acquire the technology to try breeding hybrid cotton.

Joshi et al. (2006)

slide-78
SLIDE 78

Semantic Classification for Connectives

visualized with Protégé 4.1, http://sourceforge.net/p/olia/code/45/tree/trunk/owl/experimental/discourse/PDTB.owl

slide-79
SLIDE 79

Other Relations: AltLex, EntRel, NoRel

Implicit connectives cannot be inserted between adjacent sentences if one of the following three relations is found

– AltLex, EntRel, NoRel

  • AltLex: A discourse relation is inferred, but insertion of an Implicit

connective leads to redundancy because the relation is Alternatively Lexicalized by some non-connective expression:

  • Ms. Bartlett's previous work, which earned her an international

reputation in the non-horticultural art world, often took gardens as its nominal subject. AltLex = (consequence) Mayhap this metaphorical connection made the BPC Fine Arts Committee think she had a literal green thumb.

Joshi et al. (2006)

slide-80
SLIDE 80

Non-insertability of Implicit Connectives

  • EntRel: the coherence is due to an entity-based relation.
  • Hale Milgrim, 41 years old, senior vice president, marketing at Elecktra

Entertainment Inc., was named president of Capitol Records Inc., a unit of this entertainment concern. EntRel Mr. Milgrim succeeds David Berman, who resigned last month.

  • NoRel: Neither discourse nor entity-based relation is inferred.
  • Jacobs is an international engineering and construction concern. NoRel

Total capital investment at the site could be as much as $400 million, according to Intel.

=> EntRel and NoRel do not express discourse relations, hence no semantic classification is provided for them. AltLex is subcategorized like explicit and implicit connectives.

Joshi et al. (2006)

slide-81
SLIDE 81

Annotation Overview (PDTB 1.0): Explicit Connectives

  • All WSJ sections (25 sections; 2304 texts)
  • 100 distinct types
  • Subordinating conjunctions – 31 types
  • Coordinating conjunctions – 7 types
  • Discourse Adverbials – 62 types
  • 18,505 tokens

Joshi et al. (2006)

slide-82
SLIDE 82

Data Challenges

Data sparsity

High annotation costs and limited reliability limit the size of corpora

Annotation compatibility

Different annotation schemes for the same phenomenon are not necessarily comparable

slide-83
SLIDE 83

Data Challenges

Limited agreement

If your classifier performs better than the annotators, agreement metrics are uninterpretable.

Limited data overlap

Dependencies between discourse phenomena can only be studied if the same primary data is used
slide-84
SLIDE 84

Data Challenges

Limited agreement

If your classifier performs better than the annotators, agreement metrics are uninterpretable.

Limited data overlap

Dependencies between discourse phenomena can only be studied if the same primary data is used

More and better data may be available if information could be preprocessed to a larger extent

slide-85
SLIDE 85

Data-driven Approaches

  • Motivation & Theory

– Rhetorical Structure Theory

  • Building Blocks

– Discourse Segmentation: Text Tiling

  • Theory-based approaches

– Segmented Discourse Representation Theory

  • Annotation-based approaches

– Penn Discourse Treebank

  • Data-driven approaches
slide-86
SLIDE 86

A Data-driven Approach

  • Idea

Employ corpora without discourse annotation

(a) to evaluate models and theories of discourse, or (b) to create repositories of discourse information that may be applied in theory-based approaches or to support manual annotation.

slide-88
SLIDE 88

Inferring Discourse Relations in SDRT

if

segment β can be attached to segment α

and

the event described in α is a pushing event with arguments x and y

and

the event described in β is a falling event of argument y

then, normally,

the discourse relation between α and β is a Result

(⟨τ, α, β⟩ ∧ [Push(e, x, y)]Kα ∧ [Fall(e, y)]Kβ) > Result(α, β)

slide-89
SLIDE 89

Inferring Discourse Relations in SDRT

  • rules tailored towards specific event types

– not provided by any lexical-semantic resource I am aware of

– hard to construct manually

  • distributional hypothesis

– Discourse markers that are compatible with the „normal“ discourse relation for a pair of events should occur more frequently than incompatible discourse markers
– So, let's just count them …

slide-90
SLIDE 90

Data Structures

  • event pair

<event1, event2>

  • triple

<event1, relation word, event2>

– event1: event type of the external argument
– event2: event type of the internal argument
– relation word: 0 or a discourse marker*

  • e.g., <push, fall>, <push, then, fall>
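The two data structures above can be sketched with `collections.Counter` over an invented observation stream, with `None` standing in for the slide's 0 ("no relation word found"):

```python
from collections import Counter

# Invented stream of <event1, relation word, event2> observations,
# one per pair of adjacent sentences.
observations = [
    ("push", "then", "fall"),
    ("push", None,   "fall"),
    ("want", "so",   "buy"),
    ("push", "then", "fall"),
]

# event pairs <event1, event2> and triples <event1, relation word, event2>
pair_counts = Counter((e1, e2) for e1, _, e2 in observations)
triple_counts = Counter(observations)
```

The pair counts give the marginal frequency freq(&lt;x,y&gt;); the triple counts give the conditioned frequency freq(R|&lt;x,y&gt;) needed later for the significance tests.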
slide-91
SLIDE 91

Events

  • heuristic: event = lemma of main verb

– auxiliaries, modal verbs, etc. are stripped
– it would be interesting to develop more elaborate event representations

  • heuristic: event1 = event of preceding

sentence

– external argument is more likely to be the main event of the preceding utterance than anything else

  • more remote antecedent candidates are subject to

structural constraints

slide-92
SLIDE 92

Relation Words

  • adverbs, conjunctions, phrases, relative

clauses, etc.

  • purely syntactic definition

– to avoid preemptive restriction to a limited set of relation words
– the relation word is the string representation of a sentence-initial adverbial argument of the main event in the new segment, a sentence-initial conjunction, or (if neither is found) 0
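Lacking a dependency parse, the syntactic definition can be approximated by a toy sentence-initial lookup. The marker list here is invented, and the actual procedure reads the adverbial or conjunction off the parse (e.g. MaltParser output) rather than matching strings:

```python
# Invented sample of candidate markers; the real extraction is purely
# syntactic and not restricted to a fixed list.
MARKERS = ["as a result", "alternatively", "however", "then", "and", "but", "so"]

def relation_word(sentence):
    """Toy approximation: return a sentence-initial marker, else 0
    (the slides use 0 when no relation word is found)."""
    text = sentence.lower().lstrip()
    for m in sorted(MARKERS, key=len, reverse=True):  # longest match first
        if text.startswith(m + " ") or text.startswith(m + ","):
            return m
    return 0
```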

slide-93
SLIDE 93

Weighing the Evidence

  • Noisy data

– external argument heuristically determined

  • Coarse-grained approximation of events

– relevant level of detail of event description may not be covered

=> Rigid, theoretically well-founded pruning

– significance tests

  • χ² where applicable, t-test otherwise
slide-94
SLIDE 94

Significance Tests

  • Given a relation word R and an event pair <x,y>
  • How probable is it that the relative frequency of R under the condition <x,y> deviates by chance from the unconditioned relative frequency of R?

                 | R observed    | R not observed
<x,y>            | freq(R|<x,y>) | freq(<x,y>) − freq(R|<x,y>)
all event pairs  | freq(R)       | Σ_<a,b> freq(<a,b>) − freq(R)
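A minimal sketch of this pruning step, with invented counts. Note that the table's second row counts all event pairs, so the &lt;x,y&gt; row is subtracted out below to make the two rows disjoint, as the chi-square test assumes:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson's chi-square statistic for the 2x2 table [[a, b], [c, d]]
    (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Invented counts: R occurs 30 times in 100 instances of <x,y>, and
# 100 times in 900 instances of all OTHER event pairs.
a, b = 30, 70     # <x,y>: R observed / R not observed
c, d = 100, 800   # other pairs: R observed / R not observed

stat = chi_square_2x2(a, b, c, d)
keep_triple = stat > 3.841  # chi-square critical value for p < .05, df = 1
```

The slides fall back to a t-test when the chi-square approximation is not applicable (i.e. when expected cell counts are too small); that branch is omitted here.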

slide-95
SLIDE 95

Significance Tests

  • Given a relation word R and an event pair <x,y>
  • How probable is it that the relative frequency of R under the condition <x,y> deviates by chance from the unconditioned relative frequency of R?

  • If this probability is 5% or higher, remove the triple.
  • Remaining triples are highly significant (p < .05).
slide-96
SLIDE 96

Correlation

  • Given a relation word R and an event pair

<x,y>

  • Assume that the distribution of R for <x,y>

differs significantly from the distribution of R in general.

  • P(R|<x,y>) > P(R): positive correlation
  • P(R|<x,y>) < P(R): negative correlation
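The correlation sign can be read directly off maximum-likelihood estimates from the raw frequencies; `correlation_sign` is a hypothetical helper:

```python
def correlation_sign(freq_r_given_xy, freq_xy, freq_r, freq_all):
    """Compare P(R | <x,y>) against the marginal P(R), both estimated
    as relative frequencies; returns the direction of the correlation."""
    p_cond = freq_r_given_xy / freq_xy   # P(R | <x,y>)
    p_marg = freq_r / freq_all           # P(R)
    return "positive" if p_cond > p_marg else "negative"
```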
slide-97
SLIDE 97

Data

  • Huge corpora needed

– adjacent sentences only
– with some 1000 frequent verbs in a language, every event pair has a probability of 1:10^6
– relation words are optional and manifold, need several instantiations to establish significance

=> several million sentences needed

  • Syntax-defined relation words

=> syntax-annotated corpora

slide-98
SLIDE 98

Wacky corpora

(http://wacky.sslmit.unibo.it/doku.php)

  • PukWaC

– 2G-token dump of the .uk domain
– tagged and lemmatized with TreeTagger
– parsed with MaltParser

  • Wackypedia

– English Wikipedia (2009), 0.8G tokens
– same annotations

  • Consider 80% of both corpora

– PukWaC: 72.5M sentences
– Wackypedia: 33.2M sentences

slide-99
SLIDE 99

Evaluation

  • Goal

– Test whether, despite the simplifications, potentially usable results can be obtained with this methodology

  • Evaluation of the methodology, as preparation

for subsequent experiments

slide-100
SLIDE 100

Evaluation Criteria

  • Significance

– Are there significant correlations between event pairs and relation words?

  • Reproducibility

– Can these correlations be confirmed on independent data sets?

  • Interpretability

– Can these correlations be interpreted in terms of theoretically motivated discourse relations?

slide-101
SLIDE 101

Significance

  • Significance test incorporated in the pruning

step of the algorithm

slide-102
SLIDE 102

Reproducibility

  • consider PukWaC subcorpora of different size
  • identify common triples also found in Wackypedia
  • agreeing portion of common triples: the same (positive or negative) correlation in both corpora

slide-103
SLIDE 103

Interpretability

  • Theory- and annotation-independent test

– relation words with similar function should be distributionally similar
– unrelated relation words should be distributionally less similar

  • Expectation

– but is more like however [contrastive] but very different from then [temporal/causal]
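This test can be sketched as cosine similarity over relation-word profiles. The counts below are invented to mirror the expectation that *but* patterns with *however* (contrastive) rather than with *then* (temporal/causal):

```python
import math
from collections import Counter

# Invented profiles: how often each relation word occurred with each
# of three event pairs.
profiles = {
    "but":     Counter({("want", "buy"): 2, ("look", "leave"): 8, ("push", "fall"): 1}),
    "however": Counter({("want", "buy"): 3, ("look", "leave"): 7, ("push", "fall"): 1}),
    "then":    Counter({("want", "buy"): 9, ("look", "leave"): 1, ("push", "fall"): 9}),
}

def cosine(c1, c2):
    """Cosine similarity of two sparse count vectors over event pairs."""
    dot = sum(v * c2[k] for k, v in c1.items())
    norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
    return dot / (norm(c1) * norm(c2))
```

On real data, profiles would be built from the significant triples of the background knowledge base rather than from raw counts.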

slide-104
SLIDE 104

Interpretability

  • Measure agreement

– For a given event pair for which two relation words A and B are significant, they have the same (positive or negative) correlation

slide-105
SLIDE 105

Results

  • Pilot study

– Heavy simplifications

  • external argument identification
  • coarse-grained approximation for event types

– Nevertheless

  • significant,
  • reproducible, and
  • (at least partially) interpretable results

– Background Knowledge Base

  • set of triples
slide-106
SLIDE 106

Background Knowledge Base, fragment

[Figure: fragment of the background knowledge base derived from PukWaC – a graph of event pairs such as <want, buy>, <look, …>, <forget, …>, <start, …>, <visit, …>, linked by relation words like so, therefore, then, and, alternatively, for example, also]

slide-107
SLIDE 107

Background Knowledge Base, fragment

[Figure: same background knowledge base fragment (PukWaC) as on the previous slide]

To improve:

(1) Improve event modeling (visit a store vs. visit a hospital)
(2) Level of detail (negation: don't forget)
(3) Generalize over similar events to cope with data sparsity

slide-108
SLIDE 108

Related Research

  • Riaz & Girju (2010)

– extrapolate causal relations between utterances from co-occurrence statistics
– applicable to small data sets

  • Chambers & Jurafsky (2009), Kasch & Oates (2010)

– identify chains of events and their temporal structure

  • Both are restricted to selected discourse relations, but provide more information about these relations

slide-109
SLIDE 109

Applications: Support Annotation

  • Support annotation of implicit discourse

relations

– predict the „normal“ relation word for a given pair of events
– can be manually interpreted in terms of discourse relations

slide-110
SLIDE 110

Applications: Discourse Parsing

  • Given a mapping from relation words to a

taxonomy of discourse relations

– predict „normal“ discourse relations

  • Possible next steps

– Evaluation against discourse-annotated corpora
– Developing a more scalable and more detailed model of events

  • underspecified with respect to the lemma
  • information about possible arguments
slide-111
SLIDE 111

Applications: Machine Translation

  • Languages differ in their use of relation words.
  • MT quality may benefit from introducing new relation

words.

There is one exception to this principle …*
Es gibt allerdings eine Ausnahme von diesem Prinzip …** (the German translation adds the connective allerdings, "however")

* http://ec.europa.eu/enterprise/regulation/goods/mutrec_en.htm ** http://ec.europa.eu/enterprise/regulation/goods/mutrec_de.htm

slide-112
SLIDE 112

References

  • The slides are based on

– Christian Chiarcos (2012), Towards operationalizable models of discourse phenomena. Invited talk at the NL Seminar, ISI/USC
– Rohit Kate (2010), Discourse Processing
– Aravind Joshi, Rashmi Prasad, Bonnie Webber (2006), Discourse Annotation: Discourse Connectives and Discourse Relations
– Maite Taboada, Manfred Stede (2009), Introduction to RST – Rhetorical Structure Theory

  • Further Reading

– Jurafsky & Martin (2009), §21

slide-113
SLIDE 113

Further Reading

  • Foundations of Discourse Semantics

– Michael Halliday, Ruqaiya Hasan (1976). Cohesion in English. London: Longman.
  • RST

– William Mann, Sandra Thompson (1988). Rhetorical Structure Theory. Text 8(3), pp. 243-281.

  • SDRT

– Nicholas Asher (1993). Reference to Abstract Objects in Discourse. Kluwer Academic Publishers.
  • PDTB & Applications

– Bonnie Webber, Aravind Joshi (2012), Discourse Structure: Past, Present and Future, Proceedings of the ACL 2012 Workshop on Rediscovering 50 Years of Discoveries. pp. 42-54. Jeju, Korea

slide-114
SLIDE 114

Further Reading

  • Data-driven approaches

– causal relations

  • Riaz & Girju (2010), Another Look at Causality: Discovering Scenario-Specific Contingency Relationships with No Supervision, In Proc. 2010 IEEE Fourth International Conference on Semantic Computing, Pittsburgh, Pennsylvania

– temporal relations

  • Chambers & Jurafsky (2009), Unsupervised Learning of Narrative Schemas and their Participants, In Proc. ACL-IJCNLP 2009, Singapore, pp. 602-610

– any relation

  • Chiarcos (2012), Towards the Unsupervised Acquisition of Discourse Relations, In Proc. ACL 2012, Jeju, Korea, pp. 213-217