Overview Discourse segmentation Discourse coherence theories Penn Discourse Treebank 2.0 Unsupervised coherence Conclusion
Discourse structure and coherence
Christopher Potts, CS 224U: Natural language understanding, Mar 1
Figure 5: [plot of similarity scores (y-axis, roughly 0.6–0.7) against position (x-axis, 0–100), with numbered segment marks]
From Hearst (Xerox PARC), Computational Linguistics, 1997:

"TextTiling is a technique for subdividing texts into multi-paragraph units that represent passages, or subtopics. […] Most work in discourse processing, both theoretical and computational, has focused […] important for many discourse-processing tasks, such as anaphor resolution and dialogue […] the level of the paragraph. This article describes a paragraph-level model of discourse structure based on the notion of subtopic shift, and an algorithm for subdividing expository texts into multi-paragraph "passages" or subtopic segments. In this work, the structure of an expository text is characterized as a sequence of subtopical discussions that occur in the context of one or more main topic discussions. Consider a 21-paragraph science news article, called Stargazers, whose main topic is the existence of life on earth and other planets. Its contents can be described as consisting of the following subtopic discussions:

1–3    Intro: the search for life in space
4–5    The moon's chemical composition
6–8    How early earth-moon proximity shaped the moon
9–12   How the moon helped life evolve on earth
13     Improbability of the earth-moon system
14–16  Binary/trinary star systems make life unlikely
17–18  The low probability of nonbinary/trinary systems
19–20  Properties of earth's sun that facilitate life
21     Summary

Subtopic structure is sometimes marked in technical texts by headings and subheadings; […] basic in discourse. However, many expository texts consist of long sequences of paragraphs with very little structural demarcation, and for these a subtopical segmentation can be useful."

3333 Coyote Hill Rd, Palo Alto, CA 94304. E-mail: hearst@parc.xerox.com. © 1997 Association for Computational Linguistics
[Dotplots: occurrences of repeated words (also, figh…, buff…, bull, that) plotted by position in concatenated docs, positions 1–11 on both axes]
From Choi (2000), "Advances in Domain Independent Linear Text Segmentation":

"[The similarity between a pair of sentences x, y] is computed using the cosine measure, as shown in equation 1. This is applied to all sentence pairs to generate a similarity matrix.

    sim(x, y) = Σ_j f(j,x) · f(j,y) / √( Σ_j f(j,x)² · Σ_j f(j,y)² )    (1)

where f(j,x) is the frequency of word j in sentence x. Figure 1 shows an example of a similarity matrix. High similarity values are represented by bright pixels; the bottom-left and top-right pixels show the self-similarity for the first and last sentence, respectively. Notice the bright square regions along the diagonal: these regions represent cohesive text segments.

Each value in the similarity matrix is replaced by its rank in the local region. The rank is the number of neighbouring elements with a lower similarity value. Figure 2 shows a working example, using a 3×3 rank mask with output range {0, 8}. For segmentation, we used an 11×11 rank mask. The rank is expressed as a ratio r (equation 2) to circumvent normalisation problems (consider the cases when the rank mask is not contained in the image)."
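Equation 1's sentence-pair cosine can be implemented directly. A minimal sketch (whitespace tokenization is my simplification; the original system additionally applies stop-word removal and stemming):

```python
from collections import Counter
from math import sqrt

def sim(x: Counter, y: Counter) -> float:
    # Equation 1: cosine of the word-frequency vectors of sentences x and y.
    num = sum(x[w] * y[w] for w in x)
    den = sqrt(sum(f * f for f in x.values()) * sum(f * f for f in y.values()))
    return num / den if den else 0.0

def similarity_matrix(sentences):
    # Apply sim() to all sentence pairs, yielding a symmetric matrix.
    bags = [Counter(s.lower().split()) for s in sentences]
    return [[sim(a, b) for b in bags] for a in bags]
```

Cohesive runs of sentences then show up as high-valued square blocks along the matrix diagonal, as the excerpt describes.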
[Figure 2: a worked example in four steps, each showing a small similarity matrix alongside the corresponding rank matrix]
Figure 1: An example similarity matrix. (The contrast of the image has been adjusted to highlight the image features.) Figure 2: A working example of image ranking.

3.2 Ranking

"For short text segments, the absolute value of sim(x, y) is unreliable. An additional occurrence of a common word (reflected in the numerator) causes a disproportionate increase in sim(x, y) unless the denominator (related to segment length) is large. Thus, in the context of text segmentation, where a segment typically has fewer than 100 informative tokens, one can only use the metric to estimate the order of similarity between sentences, e.g. a is more similar to b than to c. Furthermore, language usage varies throughout a document: the introduction of a document is less cohesive than a section which is about a particular topic. Consequently, it is inappropriate to directly compare similarity values from different regions of the similarity matrix.

In non-parametric statistical analysis, one compares the rank of data sets when the qualitative behaviour is similar but the absolute quantities are unreliable. Our ranking scheme is an adaptation of that described in (O'Neil and Denos, 1992).

    r = (# of elements with a lower value) / (# of elements examined)    (2)

To demonstrate the effect of image ranking, the process was applied to the matrix in figure 1 to produce figure 3 (the ranking was applied to the original matrix, prior to contrast enhancement; the output image has not been enhanced). Notice the contrast has been improved significantly. Figure 3: The matrix in figure 1 after ranking. Figure 4 illustrates the more subtle effects of the ranking scheme: r(x) is the rank (1×11 mask) of f(x), a sine wave with decaying mean, amplitude, and frequency (equation 3)."
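The local ranking of equation 2 amounts to sliding a rank mask over the matrix, with the mask truncated at the edges to sidestep the normalisation problem the excerpt mentions. A sketch (11×11 default per the paper; strict-inequality tie handling is my assumption):

```python
def rank_matrix(sim, mask=11):
    # Replace each cell by equation 2: the fraction of neighbours inside a
    # mask x mask window that have a strictly lower similarity value.
    n, half = len(sim), mask // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lower = examined = 0
            # Clipping the window to the matrix keeps the denominator
            # honest when the mask is not fully contained in the image.
            for a in range(max(0, i - half), min(n, i + half + 1)):
                for b in range(max(0, j - half), min(n, j + half + 1)):
                    if (a, b) != (i, j):
                        examined += 1
                        lower += sim[a][b] < sim[i][j]
            out[i][j] = lower / examined if examined else 0.0
    return out
```

Because each output is a within-window fraction, values from different regions of the matrix become comparable, which is the whole point of the ranking step.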
CONTRAST — 3,881,588 examples
  [BOS … EOS] [BOS But … EOS]
  [BOS … ] [but … EOS]
  [BOS … ] [although … EOS]
  [BOS Although … ,] [ … EOS]

CAUSE-EXPLANATION-EVIDENCE — 889,946 examples
  [BOS … ] [because … EOS]
  [BOS Because … ,] [ … EOS]
  [BOS … EOS] [BOS Thus, … EOS]

CONDITION — 1,203,813 examples
  [BOS If … ,] [ … EOS]
  [BOS If … ] [then … EOS]
  [BOS … ] [if … EOS]

ELABORATION — 1,836,227 examples
  [BOS … EOS] [BOS … for example … EOS]
  [BOS … ] [which … ,]

NO-RELATION-SAME-TEXT — 1,000,000 examples
  Randomly extract two sentences that are more than 3 sentences apart in a given text.

NO-RELATION-DIFFERENT-TEXTS — 1,000,000 examples
  Randomly extract two sentences from two different documents.
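Cue-phrase patterns like these can be mined in a few lines. A toy sketch (the cue list and function name are mine; the actual system used many more patterns over a web-scale corpus, and deletes the cue so the classifier must learn from the remaining content words):

```python
def extract_examples(sentence_pairs):
    """Label adjacent sentence pairs with a discourse relation from a
    cue phrase, then delete the cue (a simplified sketch of the idea)."""
    examples = []
    for s1, s2 in sentence_pairs:
        w1, w2 = s1.split(), s2.split()
        if w2 and w2[0] == "But":
            # [BOS ... EOS] [BOS But ... EOS]
            examples.append(("CONTRAST", s1, " ".join(w2[1:])))
        elif w2 and w2[0].lower() == "because":
            # [BOS ... ] [because ... EOS]
            examples.append(("CAUSE-EXPLANATION-EVIDENCE", s1, " ".join(w2[1:])))
        elif w1[:1] == ["If"]:
            # [BOS If ... ,] [ ... EOS]
            examples.append(("CONDITION", s1.replace("If ", "", 1), s2))
    return examples
```

Pairs matching no pattern are simply skipped; the NO-RELATION classes are instead sampled randomly, as the slide describes.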
P((w_i, w_j) | r) = count(w_i, w_j, r) / Σ_{(w_i, w_j) ∈ W × W} count(w_i, w_j, r)
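These word-pair counts feed a naive-Bayes relation classifier: the most likely relation maximizes P(r) · Π P((w_i, w_j) | r) over all word pairs drawn from the Cartesian product of the two spans. A minimal sketch (add-one smoothing and a uniform class prior are my assumptions):

```python
from collections import defaultdict
from itertools import product
from math import log

class WordPairClassifier:
    """Naive Bayes over word pairs: argmax_r Π_{(wi,wj) in W1 x W2}
    P((wi,wj) | r), estimated from count(wi, wj, r) with add-one
    smoothing; a uniform prior over relations is assumed."""

    def __init__(self):
        self.pair_counts = defaultdict(lambda: defaultdict(int))
        self.rel_totals = defaultdict(int)

    def train(self, examples):
        # examples: iterable of (relation, words_of_span1, words_of_span2)
        for r, w1, w2 in examples:
            for pair in product(w1, w2):
                self.pair_counts[r][pair] += 1
                self.rel_totals[r] += 1

    def classify(self, w1, w2):
        def score(r):
            # Log-likelihood with add-one smoothing.
            total = self.rel_totals[r] + len(self.pair_counts[r]) + 1
            return sum(log((self.pair_counts[r][p] + 1) / total)
                       for p in product(w1, w2))
        return max(self.rel_totals, key=score)
```

Despite its simplicity, this is essentially the model whose pairwise accuracies are reported in the tables below: all the signal comes from which word pairs co-occur across the two spans under each relation.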
Pairwise classification accuracies (%), upper triangle (one CONTRAST value was not recovered):

                   CEV   COND   ELAB   NO-REL-SAME-TEXT   NO-REL-DIFF-TEXTS
CONTRAST                  74     82          64                  64
CEV                       76     93          75                  74
COND                             89          69                  71
ELAB                                         76                  75
NO-REL-SAME-TEXT                                                 64

                   CEV   COND   ELAB   NO-REL-SAME-TEXT   NO-REL-DIFF-TEXTS
CONTRAST                  58     78          64                  72
CEV                       69     82          64                  68
COND                             78          63                  65
ELAB                                         78                  78
NO-REL-SAME-TEXT                                                 66
# test cases: CONTR 238, CEV 307, COND 125, ELAB 1761

          CEV      COND     ELAB
CONTR     63 56    80 65    64 88
CEV                87 71    76 85
COND                        87 93