Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations
Mingda Chen
Joint work with Zewei Chu and Kevin Gimpel
Prior work on evaluation benchmarks
Focus on capabilities of
[Figure: text spans 1, 2, 3 with relation labels NN-Comparison and NS-Attribution; visible text fragment: "rose a provisional 0.6% in September from August"]
“N” represents “nucleus”, containing basic information for the relation.
“S” represents “satellite”, containing additional information about the nucleus.
PDTB-E
1. In any case, the brokerage firms are clearly moving faster to create new ads than they did in the fall of 1987.
2. But it remains to be seen whether their ads will be any more effective.
Label: Comparison.Contrast

PDTB-I
1. “A lot of investor confidence comes from the fact that they can speak to us,” he says.
2. … crucial.”
Label: Contingency.Cause
$x_{\text{left}}$, $x_{\text{right}}$
True position
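The "True position" label belongs to the Sentence Position (SP) task. The slide does not spell out how examples are built, so the following is only a minimal sketch under a stated assumption: one sentence from a short window of consecutive sentences is moved to the front, and the label is the index it originally occupied. The function names `make_sp_example` and `restore`, and the placeholder sentences, are hypothetical.

```python
import random


def make_sp_example(sentences, rng):
    """Move one randomly chosen sentence to the front.

    Returns the reordered list and the original index of the moved
    sentence (the "true position"). ASSUMPTION: this construction is an
    illustration, not taken from the slides.
    """
    if len(sentences) < 2:
        raise ValueError("need at least two sentences")
    true_position = rng.randrange(len(sentences))
    moved = sentences[true_position]
    rest = sentences[:true_position] + sentences[true_position + 1:]
    return [moved] + rest, true_position


def restore(shuffled, true_position):
    """Invert make_sp_example: put the first sentence back where it came from."""
    moved, rest = shuffled[0], shuffled[1:]
    return rest[:true_position] + [moved] + rest[true_position:]


# Hypothetical placeholder sentences standing in for a real paragraph.
window = ["s0", "s1", "s2", "s3", "s4"]
shuffled, label = make_sp_example(window, random.Random(0))
```

Under this assumption a model would see `shuffled` and be asked to predict `label`; `restore` is included only to check that the construction is reversible.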
1. The Broadway production took place on May 1, 1947, at the Ethel Barrymore Theatre.
2. The Metropolitan Opera presented it once, on July 31, 1965.
3. After years on the job, Ramsay has found himself one of the division's few real experts.
4. Despite his attempts to get her attention for sufficient time to ask his question, Lucy is occupied with interminable conversations on the telephone.
5. Between her calls, when Lucy leaves the room, Ben even takes the risk of trying to cut the telephone cord, though his attempt is unsuccessful.
6. Not wanting to miss his train, Ben leaves without asking Lucy for her hand in marriage.
indicates models that are trained to encode neighboring sentence information.
[Chart, vertical axis from 35 to 65]

Model         PDTB-E  PDTB-I  RST-DT  SP    DC
Skip-thought  39.6    38.7    59.7    44.6  54.9
InferSent     37.1    38.0    53.2    43.6  56.3
DisSent       41.6    39.9    57.8    44.9  54.8
ELMo          41.5    41.5    58.8    46.4  59.4
BERT-Large    44.1    43.8    58.8    49.9  60.5
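To make the chart easier to read, the values can be tabulated and summarized in a few lines. This is a small sketch, assuming the extracted numbers are listed model by model in the legend order (Skip-thought, InferSent, DisSent, ELMo, BERT-Large) across the five tasks; the helper names `best_per_task` and `mean_score` are illustrative.

```python
TASKS = ["PDTB-E", "PDTB-I", "RST-DT", "SP", "DC"]

# Values copied from the chart; the model-by-model grouping is an assumption.
SCORES = {
    "Skip-thought": [39.6, 38.7, 59.7, 44.6, 54.9],
    "InferSent": [37.1, 38.0, 53.2, 43.6, 56.3],
    "DisSent": [41.6, 39.9, 57.8, 44.9, 54.8],
    "ELMo": [41.5, 41.5, 58.8, 46.4, 59.4],
    "BERT-Large": [44.1, 43.8, 58.8, 49.9, 60.5],
}


def best_per_task(scores, tasks):
    """Map each task to the sorted list of models with the top score (ties kept)."""
    best = {}
    for i, task in enumerate(tasks):
        top = max(row[i] for row in scores.values())
        best[task] = sorted(m for m, row in scores.items() if row[i] == top)
    return best


def mean_score(scores):
    """Unweighted mean over the five tasks for each model."""
    return {m: sum(row) / len(row) for m, row in scores.items()}


best = best_per_task(SCORES, TASKS)
means = mean_score(SCORES)
for task in TASKS:
    print(f"{task}: {', '.join(best[task])}")
```

Under that grouping, BERT-Large has the highest score on every task except RST-DT, where Skip-thought is highest.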
SentEval DiscoEval
           ELMo  BERT-Base
SentEval   0.8   5.0
DiscoEval  1.3   8.9
            Sentence Position  Discourse Coherence
Human       77.3               87.0
BERT-Large  49.9               60.5

            Wiki  arXiv  ROC   Wiki  Ubuntu
Human       84.0  76.0   94.0  98.0  74.0
BERT-Large  43.0  56.0   50.9  64.9  56.1
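The size of the human-to-model gap follows directly from the first pair of rows. A short sketch of the arithmetic, using only the two summary numbers per task from the slide (the dictionary names are illustrative):

```python
HUMAN = {"Sentence Position": 77.3, "Discourse Coherence": 87.0}
BERT_LARGE = {"Sentence Position": 49.9, "Discourse Coherence": 60.5}

# Difference between human and BERT-Large scores, in points.
gaps = {task: round(HUMAN[task] - BERT_LARGE[task], 1) for task in HUMAN}

for task, gap in gaps.items():
    print(f"{task}: human minus BERT-Large = {gap:.1f}")
```

Both gaps exceed 26 points.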
Nesting Level (NL)
Section and Document Title (SDT)
Sentence and Paragraph Position (SPP)
[Chart, vertical axis from 35 to 65]

Model      PDTB-E  PDTB-I  RST-DT  SP    DC
Baseline   36.9    38.0    57.0    44.1  61.2
+SDT       37.0    37.7    56.2    43.9  60.0
+SPP       37.1    37.7    57.1    45.6  60.8
+NL        37.2    37.8    56.4    44.7  61.2
+SPP+NL    37.9    39.3    56.7    45.7  60.9
+SDT+SPP   37.3    36.9    56.2    44.4  60.5
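The effect of each added learning criterion is easier to judge as a difference from the baseline. The sketch below assumes the extracted chart values are listed configuration by configuration in the legend order (Baseline, +SDT, +SPP, +NL, +SPP+NL, +SDT+SPP); the function name `deltas_vs_baseline` is illustrative.

```python
TASKS = ["PDTB-E", "PDTB-I", "RST-DT", "SP", "DC"]

# Values copied from the chart; the per-configuration grouping is an assumption.
RESULTS = {
    "Baseline": [36.9, 38.0, 57.0, 44.1, 61.2],
    "+SDT": [37.0, 37.7, 56.2, 43.9, 60.0],
    "+SPP": [37.1, 37.7, 57.1, 45.6, 60.8],
    "+NL": [37.2, 37.8, 56.4, 44.7, 61.2],
    "+SPP+NL": [37.9, 39.3, 56.7, 45.7, 60.9],
    "+SDT+SPP": [37.3, 36.9, 56.2, 44.4, 60.5],
}


def deltas_vs_baseline(results, baseline="Baseline"):
    """Per-task score difference of each configuration relative to the baseline."""
    base = results[baseline]
    return {
        name: [round(v - b, 1) for v, b in zip(row, base)]
        for name, row in results.items()
        if name != baseline
    }


deltas = deltas_vs_baseline(RESULTS)
for name, row in deltas.items():
    cells = ", ".join(f"{t} {d:+.1f}" for t, d in zip(TASKS, row))
    print(f"{name}: {cells}")
```

Read this way, the changes are small in both directions: every configuration is at or above the baseline on PDTB-E, while none improves on the baseline for DC.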