Multi-source projection of coreference chains
Yulia Grishina and Manfred Stede (PowerPoint presentation)


SLIDE 1

Multi-source projection of coreference chains

Yulia Grishina and Manfred Stede

Applied Computational Linguistics FSP Cognitive Sciences University of Potsdam / Germany

SLIDE 2

Outline

(I) idea
(II) strategies
(III) results
(IV) error analysis
(V) outcomes

SLIDE 3

(1) Idea & Methodology


SLIDE 4

Annotation projection

  • automatically transfer annotations from source to target
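The projection step can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the paper's implementation: spans are inclusive token-index ranges, and the word alignment is a set of (source, target) index pairs.

```python
def project_spans(source_spans, alignment):
    """Map annotated source spans to target spans via a word alignment.

    source_spans: list of (start, end) token indices (inclusive) on the source side
    alignment: set of (src_index, tgt_index) pairs
    """
    projected = []
    for start, end in source_spans:
        # collect all target tokens aligned to any token inside the span
        targets = [t for s, t in alignment if start <= s <= end]
        if targets:  # keep the span only if at least one token is aligned
            projected.append((min(targets), max(targets)))
    return projected

# EN "A fat lady" (tokens 0-2) aligned one-to-one to DE "Eine dicke Dame"
align = {(0, 0), (1, 1), (2, 2), (3, 3)}
print(project_spans([(0, 2)], align))  # [(0, 2)]
```

Unaligned spans are simply dropped here; how the original system handles partially aligned mentions is one of the sources of the projection errors discussed later.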


SLIDE 7

New: multi-src projection

  • (Yarowsky et al., 2001): multiple translations of the Bible
  • (Agic et al., 2016): POS tags
  • (Rasooli and Collins, 2015; Johannsen et al., 2016): dependency trees
  • ... coreference?
SLIDE 8

The parallel corpus

  • 38 parallel texts
  • 3 languages: English, German, Russian
  • 3 text genres: newswire¹, narratives², medicine instruction leaflets³ (only EN-DE)

¹ multilingual newswire agency Project Syndicate (www.project-syndicate.org)
² short narratives for second language acquisition: Daisy stories (http://www.lonweb.org)
³ EMEA subcorpus of the OPUS collection of parallel corpora (Tiedemann, 2009)

SLIDE 9

The parallel corpus

  • sentence-aligned
  • extracted sentences aligned in the three languages (reduced sentence count by 5% and coreference chains by 6% compared to (Grishina & Stede, 2015))

  • word alignment using GIZA++ (Och & Ney, 2003)
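Symmetrized word alignments such as those produced by GIZA++ are commonly stored one sentence pair per line in the Pharaoh `i-j` format; a small reader for that representation (an assumption on our part, since the slides do not specify the file format) could look like this:

```python
def parse_pharaoh(line):
    """Parse a Pharaoh-format alignment line, e.g. "0-0 1-2 2-1",
    into a set of (source_index, target_index) token pairs."""
    pairs = set()
    for token in line.split():
        src, tgt = token.split("-")
        pairs.add((int(src), int(tgt)))
    return pairs

print(sorted(parse_pharaoh("0-0 1-2 2-1")))  # [(0, 0), (1, 2), (2, 1)]
```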
SLIDE 10

Annotation

  • common coreference annotation guidelines
  • uniform annotations in 3 languages
  • identity relation
  • see (Grishina & Stede, 2016)


SLIDE 11

Annotation guidelines

  • NP coreference: full NPs, proper names, pronouns
  • no generic NPs annotated
  • no singletons annotated
SLIDE 12

The parallel corpus

              Newswire          Narratives        Total
              EN    DE    RU    EN    DE    RU    EN    DE    RU
Tokens        5903  6268  5763  2619  2642  2343  8522  8910  8106
Sentences     239   252   239   190   186   192   429   438   431
REs           558   589   606   470   497   479   1028  1086  1085
Chains        124   140   140   45    45    48    169   185   188
REs/Chains    4.5   4.2   4.3   10.4  11.0  10.0  6.1   5.9   5.8

(Grishina and Stede, 2015), (Grishina, 2016)

SLIDE 13

(2) Strategies


SLIDE 14

Multi-src projection: cases

L1:  [a1]A  a2     [a3]A  ...  ak
L2:  b1     [b2]B  [b3]B  ...  bm
L3:  c1     c2     c3     c4   ...  cn

(rows: languages; bracketed mentions are annotated, subscript letters mark coreference chains)

SLIDE 15

Multi-src projection: trivial case

L1:  [a1]A  a2     [a3]A  ...  ak
L2:  b1     [b2]B  [b3]B  ...  bm
L3:  [c1]A  c2     [c3]A  c4   ...  cn

SLIDE 16

Multi-src projection: trivial case

L1:  [a1]A  a2     [a3]A  ...  ak
L2:  b1     [b2]B  [b3]B  ...  bm
L3:  [c1]AB c2     [c3]AB c4   ...  cn

SLIDE 17

Multi-src projection: trivial case

L1:  [a1]A  a2     [a3]A  ...  ak
L2:  b1     [b2]B  [b3]B  ...  bm
L3:  [c1]AB c2     [c3]AB c4   ...  cn

identical chains

SLIDE 18

Multi-src projection: simple case

L1:  [a1]A  a2     [a3]A  ...  ak
L2:  b1     [b2]B  [b3]B  ...  bm
L3:  [c1]A  c2     [c3]A  c4   ...  cn

SLIDE 19

Multi-src projection: simple case

L1:  [a1]A  a2     [a3]A  ...   ak
L2:  b1     [b2]B  [b3]B  ...   bm
L3:  [c1]A  [c2]B  [c3]A  [c4]B ...  cn

SLIDE 20

Multi-src projection: simple case

L1:  [a1]A  a2     [a3]A  ...   ak
L2:  b1     [b2]B  [b3]B  ...   bm
L3:  [c1]A  [c2]B  [c3]A  [c4]B ...  cn

disjoint chains

SLIDE 21

Multi-src projection: typical case

L1:  [a1]A  a2     [a3]A  ...  ak
L2:  b1     [b2]B  [b3]B  ...  bm
L3:  [c1]A  c2     [c3]A  c4   ...  cn

SLIDE 22

Multi-src projection: typical case

L1:  [a1]A  a2     [a3]A  ...  ak
L2:  b1     [b2]B  [b3]B  ...  bm
L3:  [c1]A  [c2]B  [c3]?  c4   ...  cn

SLIDE 23

Multi-src projection: typical case

L1:  [a1]A  a2     [a3]A  ...  ak
L2:  b1     [b2]B  [b3]B  ...  bm
L3:  [c1]A  [c2]B  [c3]?  c4   ...  cn

A or B?

  • overlapping chains
SLIDE 24

Strategies

  • add: disjoint chains from one language are added to the other languages
  • concatenate: overlapping chains are merged together
  • intersect: intersection of mentions for overlapping chains
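The three strategies can be sketched as follows. This is a minimal illustration under our own assumptions: each projected chain is a set of target-side mention ids, a chain overlapping several counterparts is combined with all of them, and for "add" the first source's chains are kept as the base. The slides do not specify the original system's tie-breaking, so treat this as a sketch rather than the paper's implementation.

```python
def merge_chains(chains_a, chains_b, strategy):
    """Combine coreference chains projected from two source languages.

    chains_a, chains_b: lists of sets of target-side mention ids.
    """
    if strategy == "add":
        # keep A's chains; bring in only B's chains disjoint from all of A's
        return chains_a + [cb for cb in chains_b
                           if not any(cb & ca for ca in chains_a)]
    if strategy == "concatenate":
        # union every overlapping pair into one chain, keep disjoint chains too
        merged = []
        for cb in chains_b:
            for ca in chains_a:
                if cb & ca:
                    cb = cb | ca
            merged.append(cb)
        leftover = [ca for ca in chains_a
                    if not any(ca & cb for cb in chains_b)]
        return merged + leftover
    if strategy == "intersect":
        # keep only the mentions that overlapping chains agree on
        return [ca & cb for ca in chains_a for cb in chains_b if ca & cb]
    raise ValueError(strategy)

a = [{1, 2, 3}, {7, 8}]    # chains projected from source A
b = [{2, 3, 4}, {10, 11}]  # chains projected from source B
print(merge_chains(a, b, "intersect"))  # [{2, 3}]
```

Note how "intersect" keeps only agreed-upon mentions, which matches the high-Precision, low-Recall pattern in the result tables below.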

SLIDE 25

A real example

EN: [A fat lady] [who] wore a fur around [her] neck came in. [She] said that [she] needs [Daisy’s] help and does not know what to do.
DE: [Eine dicke Dame mit einer Pelzstola] kam rein. [Sie] hat gesagt, dass [sie] [Daisys] Hilfe braucht und dass [sie] nicht weiß, was [sie] tun soll.
RU: Вошла [полная дама, носившая мех вокруг шеи]. [Она] сказала, что [ей] необходима помощь [Дэйзи] и что [она] не знает, что [ей] делать.

SLIDE 26

A real example

EN: [A fat lady] [who] wore a fur around [her] neck came in. [She] said that [she] needs [Daisy’s] help and does not know what to do.
DE: [[Eine dicke Dame] mit einer Pelzstola] kam rein. [Sie] hat gesagt, dass [sie] [Daisys] Hilfe braucht und dass [sie] nicht weiß, was [sie] tun soll.
RU: Вошла [полная дама, носившая мех вокруг шеи]. [Она] сказала, что [ей] необходима помощь [Дэйзи] и что [она] не знает, что [ей] делать.


SLIDE 28

(3) Results


SLIDE 29

Results

              EN,RU->DE  +ment    EN,DE->RU  +ment
add           46.6       52.6     56.9       57.3
concatenate   49.6       57.0     58.6       59.0
intersect     35.7       40.3     40.7       40.8

SLIDE 30

Results

              EN,RU->DE  +ment  diff    EN,DE->RU  +ment  diff
add           46.6       52.6   +6.0    56.9       57.3   +0.4
concatenate   49.6       57.0   +7.4    58.6       59.0   +0.4
intersect     35.7       40.3   +4.6    40.7       40.8   +0.1

SLIDE 31

Results: baselines

                P     R     F1
EN-DE           55.3  43.8  48.7
RU-DE           40.9  26.7  31.9
EN,RU-DE-con    53.3  46.5  49.6
EN,RU-DE-int    63.0  25.7  35.7
EN-RU           68.0  51.6  58.5
DE-RU           54.4  28.9  37.3
EN,DE-RU-con    67.2  52.2  58.6
EN,DE-RU-int    78.0  28.1  40.7
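F1 in these tables is the harmonic mean of Precision and Recall, so each row can be roughly sanity-checked; small deviations come from the slide values being rounded to one decimal:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# EN-DE baseline row: P=55.3, R=43.8; the table reports F1=48.7
print(round(f1(55.3, 43.8), 1))  # 48.9 (off by 0.2 because the inputs are rounded)
```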

SLIDE 32

Results: baselines + ment

                P     R     F1
EN-DE           63.2  50.0  55.7
RU-DE           41.7  27.0  32.3
EN,RU-DE-con    62.3  52.7  57.0
EN,RU-DE-int    71.8  29.1  40.3
EN-RU           68.4  52.4  58.8
DE-RU           54.9  29.0  37.6
EN,DE-RU-con    67.7  52.5  59.0
EN,DE-RU-int    79.1  28.1  40.8

SLIDE 33

(4) Error analysis


SLIDE 34

Projected markables by type

[Bar chart: number of projected markables by type (NPs, NEs, pronouns) for German and Russian]

SLIDE 35

Markable accuracy by type

[Bar chart: markable accuracy by type (NPs, NEs, pronouns) for DE, DE+ment, RU, RU+ment]

SLIDE 36

Markable accuracy by type

[Bar chart: markable accuracy by type (NPs, NEs, pronouns) for DE, DE+ment, RU, RU+ment; minimum 53.4, maximum 95.2]

SLIDE 37

Markable accuracy by # of tokens

[Chart: markable accuracy by number of tokens, for German and Russian]

SLIDE 38

(5) Outcomes


SLIDE 39

Outcomes

  • comparable results for both languages: the highest Precision of 78.0/79.1 for German/Russian and the highest Recall of 52.7 for both;
  • outperforms single-source projection in terms of Precision and Recall; overall results are only slightly higher;
  • different directions of projection are not equally good.

SLIDE 40

Conclusions

  • implemented multi-source projection for coreference for the first time and tested several strategies
  • it outperforms single-source projection in Precision and Recall and achieves slightly better overall scores
  • NPs are more challenging for projection than pronouns; automatic mention extraction supports mention recovery for German

SLIDE 41

Future work

  • experimenting with more sophisticated strategies based on this study
  • projection with more than two source languages
  • projection of automatic annotations & system training

SLIDE 42

Thank you!