 
Advanced Natural Language Processing
Lecture 21
Discourse and its Structures (abbreviated version)
Bonnie Webber
9 November 2012

Webber ANLP Lecture 21 9 November 2012

1 What are discourse structures?

Discourse structures are the patterns one sees in multi-sentence (multi-clausal) texts.
Recognizing these patterns and what they convey is useful for deriving intended information from the text.
Researchers in Language Technology (LT) are beginning to be able to recognize and exploit these patterns for useful ends.

2 What kind of patterns?

• Topic patterns, each topic about a set of entities and what’s being said about them, as in textbooks, encyclopedias, and other expository text;
• Functional patterns, each element serving a particular purpose with respect to the discourse as a whole or some other segment of discourse, as in essays, legal arguments, and scientific research papers;
• Patterns of eventualities: events and states, and their spatio-temporal relations, both of which are an essential part of narratives.

At a lower level, discourse shows patterns of coherence relations (aka discourse relations) between abstract objects, which is what a discourse segment is interpreted as.

3 Elements of discourse structure

The slides in anlp21-full.pdf discuss all these types of patterns.
For lack of time, we focus here on coherence relations, since Assignment 3 focusses on a possible link between anaphor resolution and coherence relations.

4 Coherence Relations

Coherence relations are binary relations that hold between pairs of segments in a text, primarily by virtue of their meaning.
Lexicalized means that there is lexico-syntactic evidence for the existence of a coherence relation, though the evidence may be ambiguous with respect to sense.
The largest manually annotated corpus of lexicalized coherence relations is the Penn Discourse TreeBank (PDTB), annotated over the 1-million-word Penn Wall Street Journal Corpus.
N.B. Similar corpora are being developed for other languages (e.g., Turkish, Hindi, Chinese, Modern Standard Arabic) and genres (biomedical journal papers, dialogue).

5 Lexicalized Approach to Coherence Relations

Two primary sources of lexico-syntactic evidence for a coherence relation are:
• explicit connectives that express a relation between clauses (e.g. coordinating and subordinating conjunctions, discourse adverbials);
• implicit connectives between otherwise unmarked adjacent sentences, if one or more explicit connectives can be inferred that express the relation(s) between them.

The latter have the same status as the implicit relations between noun-noun modifiers in English:
(1) container ship crane operator courses
    (courses for operators of cranes for ships carrying containers)

6 PDTB Annotation of Coherence Relations

All coherence relations identified to date are binary, with two and only two arguments.
In the PDTB, the argument syntactically attached to the connective is called arg2, and the other, arg1.
(2) By most measures, the nation’s industrial sector is now growing very slowly – if at all. Factory payrolls fell in September. So did the Federal Reserve Board’s industrial-production index. Yet many economists aren’t predicting that the economy is about to slip into recession. [wsj 0036]
Annotators first annotated all explicit connectives in the PDTB and their two arguments.

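A minimal sketch of how one explicit relation could be stored, using example (2). The class and field names are hypothetical and are not the PDTB file format. The arg2 span follows from the definition above (it is the clause introduced by "Yet"); the arg1 span is an assumption, taken here to be the first sentence of the passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExplicitRelation:
    """One explicit coherence relation: a connective plus exactly two arguments."""
    connective: str  # the explicit connective, e.g. "Yet"
    arg1: str        # the argument NOT syntactically attached to the connective
    arg2: str        # the argument syntactically attached to the connective
    source: str      # file label as printed with the example


# Example (2). The arg1 span is an ASSUMPTION (first sentence of the passage);
# the arg2 span is the clause that the connective "Yet" introduces.
example_2 = ExplicitRelation(
    connective="Yet",
    arg1="By most measures, the nation's industrial sector is now growing very slowly – if at all.",
    arg2="many economists aren't predicting that the economy is about to slip into recession.",
    source="wsj 0036",
)

print(example_2.connective, "|", example_2.arg1, "|", example_2.arg2)
```

Keeping the two arguments as separate named fields mirrors the claim that every relation has two and only two arguments.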
8 PDTB Annotation of Coherence Relations

For coherence relations between adjacent sentences, if annotators could infer one or more connectives that express the relation(s) between the sentences, they inserted the connective(s), with all or part of the first sentence as arg1, and all, part of, or possibly more than the second sentence as arg2.
(3) Mr. Lane’s final purpose isn’t to glamorize the Artist’s vagabond existence. [Implicit = rather] He has a point he wants to make, and he makes it, with a great deal of force. [wsj 0039]
These are called implicit connectives.
If the second sentence contained any explicit inter-sentential connective, it was taken to hold between the pair, so no further annotation was added.

9 Other lexicalizations of discourse relations

If annotators felt the relation was already expressed, they were asked to annotate the lexico-syntactic evidence for it as an alternative lexicalization or AltLex:
(4) The two companies each produce market pulp, containerboard and white paper. That means goods could be manufactured closer to customers, saving shipping costs, he said. [wsj 0317]
(5) The new structure would be similar to a recapitalization in which holders get a special dividend yet retain a controlling ownership interest. The difference is that current holders wouldn’t retain majority ownership or control. [wsj 1531]

10 Other types of relations

When an adjacent sentence appeared to be related to its predecessor only through entity-based coherence, this was labelled EntRel.
(6) Hale Milgrim, 41 years old, senior vice president, marketing at Elecktra Entertainment Inc., was named president of Capitol Records Inc., a unit of this entertainment concern. [EntRel] Mr. Milgrim succeeds David Berman, who resigned last month. [wsj 0945]

11 No coherence relation

If no relation could be perceived between a successive pair of sentences, NoRel was explicitly marked between them.
(7) Dodge reported an 8% increase in construction contracts awarded in September. [NoRel] The government counts money as it is spent [wsj 0036]

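The five labels for a pair of adjacent sentences (Explicit, Implicit, AltLex, EntRel, NoRel) can be read as a decision cascade. The sketch below is one possible reading, not the annotation guidelines themselves: the function name and the four boolean inputs are hypothetical stand-ins for annotator judgments, and the order of the checks is an assumption.

```python
from enum import Enum


class RelationType(Enum):
    EXPLICIT = "Explicit"
    IMPLICIT = "Implicit"
    ALTLEX = "AltLex"
    ENTREL = "EntRel"
    NOREL = "NoRel"


def label_adjacent_pair(*, explicit_connective_in_second,
                        relation_already_expressed,
                        connective_can_be_inferred,
                        entity_based_only):
    """Map (hypothetical) annotator judgments to a relation label.

    The order of the checks is an ASSUMPTION made for this sketch.
    """
    if explicit_connective_in_second:
        # An explicit connective is taken to hold between the pair.
        return RelationType.EXPLICIT
    if relation_already_expressed:
        # The relation is lexicalized by some other expression.
        return RelationType.ALTLEX
    if connective_can_be_inferred:
        # Annotators insert the inferred connective.
        return RelationType.IMPLICIT
    if entity_based_only:
        return RelationType.ENTREL
    return RelationType.NOREL


# Example (3): no explicit connective, but "rather" can be inferred.
label_3 = label_adjacent_pair(
    explicit_connective_in_second=False,
    relation_already_expressed=False,
    connective_can_be_inferred=True,
    entity_based_only=False,
)
print(label_3.value)
```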
12 Total relation annotation in the PDTB

PDTB Relations    No. of tokens
Explicit                  18459
Implicit                  16224
AltLex                      624
EntRel                     5210
NoRel                       254
Total                     40600

Many strings annotated as AltLex in the PDTB 2.0 will be included as explicit connectives in the next version of the corpus.

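To see the relative weight of each relation type, the counts can be turned into shares. A small sketch, with a hypothetical helper name. Note that the five per-type counts as listed sum to 40771, not to the stated total of 40600; the sketch normalizes by the sum of the listed counts and does not try to resolve the difference.

```python
# Per-type token counts exactly as listed in the table.
COUNTS = {
    "Explicit": 18459,
    "Implicit": 16224,
    "AltLex": 624,
    "EntRel": 5210,
    "NoRel": 254,
}


def relation_shares(counts):
    """Return each relation type's share of the sum of the listed counts."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


listed_sum = sum(COUNTS.values())
shares = relation_shares(COUNTS)

print(f"sum of listed counts: {listed_sum}")
for label, share in shares.items():
    print(f"{label:<9}{COUNTS[label]:>6}  {share:6.1%}")
```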
13 PDTB Sense Hierarchy

14 PDTB Sense Hierarchy: Temporal

• Synchronous
• Asynchronous (precedence, succession)

15 PDTB Sense Hierarchy: Comparison

• Contrast (juxtaposition, opposition)
• Concession (expectation, contra-expectation)
• Pragmatic Contrast
• Pragmatic Concession

(2) By most measures, the nation’s industrial sector is now growing very slowly – if at all. ... Yet [Comp.Concession.contra-expectation] many economists aren’t predicting that the economy is about to slip into recession. [wsj 0036]

16 PDTB Sense Hierarchy: Contingency

• Cause (reason, result)
• Condition
• Pragmatic Cause (justification)
• Pragmatic Condition

(4) The two companies each produce market pulp, containerboard and white paper. [Cont.Cause.result] That means goods could be manufactured closer to customers, saving shipping costs, he said. [wsj 0317]

17 PDTB Sense Hierarchy: Expansion

• Conjunction
• Instantiation
• Restatement
• Alternative (conjunctive, disjunctive, chosen alternative)
• Exception
• List

(3) Mr. Lane’s final purpose isn’t to glamorize the Artist’s vagabond existence. [Exp.Alternative.chosen alternative] He has a point he wants to make, and he makes it, with a great deal of force. [wsj 0039]

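The class, type and subtype levels listed for the four sense classes can be written down as a nested mapping, and labels such as Comp.Concession.contra-expectation can then be checked against it. This is a sketch under stated assumptions: it contains only the types and subtypes named on these slides (the full hierarchy may contain more), the function name is hypothetical, and the abbreviation "Temp" is assumed by analogy with the "Comp", "Cont" and "Exp" abbreviations that appear in the examples.

```python
# class -> type -> list of subtypes, as named on the slides only.
SENSE_HIERARCHY = {
    "Temporal": {
        "Synchronous": [],
        "Asynchronous": ["precedence", "succession"],
    },
    "Comparison": {
        "Contrast": ["juxtaposition", "opposition"],
        "Concession": ["expectation", "contra-expectation"],
        "Pragmatic Contrast": [],
        "Pragmatic Concession": [],
    },
    "Contingency": {
        "Cause": ["reason", "result"],
        "Condition": [],
        "Pragmatic Cause": ["justification"],
        "Pragmatic Condition": [],
    },
    "Expansion": {
        "Conjunction": [],
        "Instantiation": [],
        "Restatement": [],
        "Alternative": ["conjunctive", "disjunctive", "chosen alternative"],
        "Exception": [],
        "List": [],
    },
}

# "Comp", "Cont", "Exp" occur in the examples; "Temp" is an ASSUMPTION.
CLASS_ABBREVIATIONS = {
    "Temp": "Temporal",
    "Comp": "Comparison",
    "Cont": "Contingency",
    "Exp": "Expansion",
}


def parse_sense(label):
    """Split 'Class.Type.subtype' and validate each level against the mapping."""
    parts = [p.strip() for p in label.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 levels, got {len(parts)}: {label!r}")
    sense_class = CLASS_ABBREVIATIONS.get(parts[0], parts[0])
    if sense_class not in SENSE_HIERARCHY:
        raise ValueError(f"unknown class: {parts[0]!r}")
    sense_type = parts[1] if len(parts) > 1 else None
    subtype = parts[2] if len(parts) > 2 else None
    if sense_type is not None and sense_type not in SENSE_HIERARCHY[sense_class]:
        raise ValueError(f"unknown type for {sense_class}: {sense_type!r}")
    if subtype is not None and subtype not in SENSE_HIERARCHY[sense_class][sense_type]:
        raise ValueError(f"unknown subtype for {sense_type}: {subtype!r}")
    return sense_class, sense_type, subtype


print(parse_sense("Comp.Concession.contra-expectation"))
```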
18 Automatically recognizing coherence relations

The task involves:
• Identifying the evidence for the discourse relation, i.e., evidence for the “discourse predicate”;
• Identifying the arguments related by that predicate;
• Identifying the sense of the relation.

[Elwell & Baldridge, 2008; Lin et al., 2010; Pitler & Nenkova, 2009; Prasad et al., 2008; Prasad, Joshi & Webber, 2010; Wellner & Pustejovsky, 2007]

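As a rough illustration of the first sub-task only, candidate explicit connectives can be found by matching a connective list against the text. This is a naive sketch, not the method of any of the cited papers: the function name and the short connective list are hypothetical, and string matching over-generates, since the same string can occur without functioning as a discourse connective (arguably the "So" in "So did the Federal Reserve Board’s industrial-production index" from example (2)).

```python
import re

# Short illustrative list; a real connective inventory is much larger.
CONNECTIVES = ["as a result", "in addition", "for instance",
               "moreover", "yet", "so", "thus", "unless"]


def find_connective_candidates(sentence, connectives=CONNECTIVES):
    """Return (start, end, matched_text) for every candidate connective.

    Longer connectives are tried first, so multi-word ones are preferred
    over any shorter connective starting at the same position.
    """
    ordered = sorted(connectives, key=len, reverse=True)
    pattern = "|".join(re.escape(c) for c in ordered)
    regex = re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0)) for m in regex.finditer(sentence)]


candidates = find_connective_candidates(
    "Yet many economists aren't predicting that the economy "
    "is about to slip into recession."
)
print(candidates)
```

A candidate found this way still has to be confirmed as a discourse use before its arguments and sense are identified.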
19 Classifying sense relations

Evidence for the sense relation between discourse elements may be explicit or only implicit.
Some explicit discourse connectives are unambiguous, or nearly so (in parentheses: tokens annotated with that sense / all tokens of the connective):

Conn          sense                    Conn         sense
accordingly   result (5/5)             in addition  conjunction (165/165)
additionally  conjunction (7/7)        moreover     conjunction (100/101)
afterward     precedence (11/11)       so           result (262/263)
as a result   result (78/78)           thus         result (112/112)
consequently  result (10/10)           till         precedence (3/3)
for instance  instantiation (98/98)    unless       disjunctive (94/95)

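For connectives like these, sense classification can start as a table lookup. The sketch below stores the table as data and returns a sense only when the connective is pure enough. The function name and the min_purity parameter are hypothetical, and reading each pair of numbers as (tokens with the listed sense, all tokens of the connective) is an assumption.

```python
# connective -> (sense, tokens with that sense, all tokens), from the table.
CONNECTIVE_SENSES = {
    "accordingly": ("result", 5, 5),
    "additionally": ("conjunction", 7, 7),
    "afterward": ("precedence", 11, 11),
    "as a result": ("result", 78, 78),
    "consequently": ("result", 10, 10),
    "for instance": ("instantiation", 98, 98),
    "in addition": ("conjunction", 165, 165),
    "moreover": ("conjunction", 100, 101),
    "so": ("result", 262, 263),
    "thus": ("result", 112, 112),
    "till": ("precedence", 3, 3),
    "unless": ("disjunctive", 94, 95),
}


def lookup_sense(connective, min_purity=1.0):
    """Return the listed sense if its share of tokens is at least min_purity.

    Returns None for connectives not in the table or not pure enough.
    """
    entry = CONNECTIVE_SENSES.get(connective.lower())
    if entry is None:
        return None
    sense, with_sense, total = entry
    return sense if with_sense / total >= min_purity else None


print(lookup_sense("thus"))                       # fully unambiguous in the table
print(lookup_sense("moreover"))                   # 100/101, rejected at purity 1.0
print(lookup_sense("moreover", min_purity=0.95))  # accepted at a lower threshold
```

Connectives outside the table, and relations with only implicit evidence, need a real classifier; the lookup covers the easy cases only.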