SLIDE 1

Wiktionary Synonymy networks Improving Wiktionary’s network Conclusion and future work

Wiktionary and NLP: Improving synonymy networks

ACL-IJCNLP

Singapore, 7 Aug 2009

Emmanuel Navarro, IRIT, CNRS & Univ. of Toulouse
Franck Sajous, CLLE-ERSS, CNRS & Univ. of Toulouse
Bruno Gaume, CLLE-ERSS & IRIT, CNRS & Univ. of Toulouse
Laurent Prévot, LPL, CNRS & Univ. of Provence
ShuKai Hsieh, English Department, NTNU, Taiwan
Tzu-Yi Kuo, Graduate Institute of Linguistics, NTU, Taiwan
Pierre Magistry, TIGP, CLCLP, Academia Sinica, GIL, NTU, Taiwan
Chu-Ren Huang, Dept. of Chinese and Bilingual Studies, Hong Kong Poly U., Hong Kong

Franck Sajous ACL-IJCNLP - 7 Aug 2009 Wiktionary and NLP: Improving synonymy networks 1/23

SLIDE 2

Goals

  • giving a method for improving synonymy networks;
  • applying it to Wiktionary;
  • in the meantime, investigating the possibilities of:
      • using Wiktionary as a resource for NLP;
      • using NLP for improving Wiktionary.

SLIDE 3

Summary

1. Wiktionary
2. Synonymy networks
     • Wiktionary graph
     • Gold standards
     • Comparison
3. Improving Wiktionary’s network
     • Exploiting its Small World structure
     • Using translation links

SLIDE 5

Wiktionary as a lexical resource

Lexical resources
  • NLP requires lexical resources
  • English: Princeton WordNet
  • Some other languages (e.g. French): unsatisfactory and/or non-free
  • Some others: purely under-resourced

Wiktionary
  • multilingual
  • freely available
  → a perfect candidate?

SLIDE 15

Wiktionary: (very) short description

Collaborative editing
  • Not led by experts
  • OK, we know, but it’s worth taking a chance while remaining aware of it

Article content
  • etymology
  • parts of speech
  • definitions, examples
  • translations
  • synonyms/antonyms
  • hypernyms/hyponyms

That is the ‘regular’ case, but... content and layout are heterogeneous across languages and even within a given language

SLIDE 26

Extracting Wiktionary’s graph of synonymy

Modeling
  • one English entry’s POS → one vertex
  • synonymy links among same POS
  • wordsenses flattened
  • edges symmetrized
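The modeling choices above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `entries` list and the `build_synonymy_graph` function are hypothetical, and the toy data is not taken from an actual Wiktionary dump. Each (word, POS) pair becomes one vertex, sense numbers are ignored, links stay within one POS, and every edge is stored in both directions.

```python
from collections import defaultdict

# Hypothetical toy input: (headword, POS, sense number, synonyms listed for that sense).
entries = [
    ("boot", "N", 1, ["buskin", "mukluk"]),
    ("boot", "N", 2, ["kick"]),
    ("kick", "V", 1, ["boot"]),
]


def build_synonymy_graph(entries):
    """Return {(word, pos): set of neighbouring (word, pos) vertices}."""
    graph = defaultdict(set)
    # The sense number is ignored on purpose: wordsenses are flattened.
    for word, pos, _sense, synonyms in entries:
        source = (word, pos)
        graph[source]  # create the vertex even if it has no synonym
        for synonym in synonyms:
            target = (synonym, pos)  # links only among the same POS
            if target == source:
                continue
            graph[source].add(target)
            graph[target].add(source)  # symmetrize the edge
    return dict(graph)


graph = build_synonymy_graph(entries)
```

Because the sense number is dropped, the two noun senses of the toy entry "boot" end up on a single vertex whose neighbours mix both senses, which is exactly the information loss that flattening implies.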

SLIDE 43

Wordsenses flattened...

Why? There is no underlying linguistic theory, but:
  • constraints stem from the resource
  • wordsenses may appear in the definitions and not in the semantic relations
  • wordsenses of the target vertices?

buskin (N)
  1. A half-boot
  2. A type of boot worn by the ancient Athenian tragic actors

mukluk (N)
  1. A soft boot made of reindeer skin or sealskin and worn by Inuit.

kick (N)
  1. A hit or strike with the leg or foot
  2. The action of swinging a foot or leg
  3. Something that tickles the fancy
  4. (Internet) The removal of a person from an online activity
  5. (figuratively) Any bucking motion of an object that lacks legs or feet

Another reason: one of our gold standards (Dicosyn) has its wordsenses flattened

SLIDE 52

Extracting WordNet’s synonymy network

WordNet
  • synonymy between wordsenses
  • relations already symmetric
  • same POS in a given synset

Modeling
  • vertices: words
  • edges between all words in a given synset
  • + using hyponymy with leaf synsets containing single words
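The synset-to-graph step can be sketched as follows. This is a minimal illustration, not the authors' code: the `synsets` list is hypothetical toy data for a single POS, and the additional hyponymy step for single-word leaf synsets is not covered. Every pair of words sharing a synset gets an edge, so each synset becomes a clique.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy synsets of one POS, each given as a list of words.
synsets = [
    ["throw", "hurl", "cast"],
    ["throw", "shed"],
    ["mukluk"],
]


def synsets_to_graph(synsets):
    """Return {word: set of words sharing at least one synset with it}."""
    graph = defaultdict(set)
    for synset in synsets:
        words = sorted(set(synset))
        for word in words:
            graph[word]  # keep words of single-word synsets as isolated vertices
        for a, b in combinations(words, 2):
            graph[a].add(b)  # edge between all words of the synset,
            graph[b].add(a)  # stored in both directions
    return dict(graph)


wn_graph = synsets_to_graph(synsets)
```

In this toy example "throw" is linked to "hurl", "cast" and "shed", while "hurl" and "shed" stay unconnected because they never share a synset.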

SLIDE 53

Extracting Dicosyn synonymy network

Dicosyn
  • compilation of synonymy relations extracted from 7 dictionaries (Bailly, Benac, Du Chazaud, Guizot, Lafaye, Larousse and Robert);
  • produced at ATILF, corrected at CRISCO lab: http://elsap1.unicaen.fr/dicosyn.html
  • wordsenses are flattened;
  • network already built;
  • just needs to be symmetrized.

SLIDE 62

Small Worlds (SW)

Lexical resources are (often) SW:
  • globally sparse in edges, locally dense (high clustering);
  • average path length is short;
  • heavy-tailed degree distribution.

Studying the graphs’ properties shows that Wiktionary, WordNet and Dicosyn are SW
  → we can take advantage of SW’s characteristics!

E.g.: to throw (WordNet)
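The three small-world indicators listed above can be measured on any graph stored as an adjacency dictionary. The sketch below is a plain-Python illustration on a hypothetical four-vertex graph; the function names are assumptions introduced here, and the average path length is taken over reachable ordered pairs only.

```python
from collections import Counter, deque
from itertools import combinations


def clustering_coefficient(graph, v):
    """Fraction of pairs of neighbours of v that are themselves linked."""
    neighbours = graph[v]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph):
    return sum(clustering_coefficient(graph, v) for v in graph) / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over reachable ordered pairs (BFS from each vertex)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def degree_distribution(graph):
    """Map each degree to the number of vertices having that degree."""
    return Counter(len(neighbours) for neighbours in graph.values())


# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_clustering(toy), average_path_length(toy), degree_distribution(toy))
```

On a real synonymy network one would compare these values with those of a random graph of the same size and density: a much higher clustering with a similarly short path length is the usual small-world signature.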

SLIDE 67

Wiktionary FR/Dicosyn

Lexical coverage/Synonymy network

(P: precision, R: recall)

Words       Wikt.   DicoSyn   Shared   P     R
N           18017   29372     10393    58%   35%
A            5411    9452      3076    57%   33%
V            3897    9147      2966    76%   32%

Relations   Wikt.   DicoSyn   Shared   P     R
N            3510   44501      3510    69%    8%
A            1300   17404      1677    78%    7%
V             899   23968      1267    71%    4%

Example: nouns synonymy network

Franck Sajous ACL-IJCNLP - 7 Aug 2009 Wiktionary and NLP: Improving synonymy networks 12/23

slide-68
SLIDE 68

Wiktionary EN/WordNet

Lexical coverage/Synonymy network

Words       Wikt.   WordNet   Shared     P     R
N           22075    117798    14120   64%   12%
A            8437     21479     5874   70%   27%
V            6368     11529     5157   81%   45%

Relations   Wikt.   WordNet   Shared     P     R
N            6453     18440     2763   43%   15%
A            3139     12792     1314   42%   10%
V            2667     18725      993   37%    5%

Example: Nouns synonymy network
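The P and R columns of the word-coverage rows are consistent with P = Shared / Wikt. and R = Shared / gold standard. The following is a minimal Python sketch under that assumed reading; the function name `coverage` is illustrative and does not come from the talk. It recomputes the percentages from the word counts listed for both language pairs.

```python
def coverage(wikt: int, gold: int, shared: int) -> tuple[int, int]:
    """Return (precision, recall) as rounded percentages.

    Assumed definitions: precision = shared / wikt, recall = shared / gold.
    """
    precision = round(100 * shared / wikt)
    recall = round(100 * shared / gold)
    return precision, recall


# Word counts (Wikt., gold standard, shared) copied from the coverage tables.
french_words = {
    "N": (18017, 29372, 10393),
    "A": (5411, 9452, 3076),
    "V": (3897, 9147, 2966),
}
english_words = {
    "N": (22075, 117798, 14120),
    "A": (8437, 21479, 5874),
    "V": (6368, 11529, 5157),
}

for label, rows in (("FR/DicoSyn", french_words), ("EN/WordNet", english_words)):
    for pos, counts in rows.items():
        p, r = coverage(*counts)
        print(f"{label} {pos}: P={p}% R={r}%")
```

Under this reading, precision measures how much of Wiktionary's vocabulary is also in the gold standard, and recall how much of the gold standard Wiktionary covers.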

slide-77
SLIDE 77

Comments...

Gold standards, precision & recall: a rough comparison, but:

how meaningful are these measures (in our case)?
↔ how “perfect” are our gold standards?

examples:

  • ‘to horn’, in WN, absent from Wikt.en.N in our experiment, appeared in Wikt. in 2009
  • ‘to poo’ (childish), ‘to prefetch’, ‘to google’ (technical neologisms) are in Wikt and not in WN → noise?
  • Wikt really misses some of WN’s relations: ‘to act’↔‘to play’
  • but are relations present in Wikt and absent from WN necessarily errors? ‘to reduce’↔‘to decrease’, ‘to cook’↔‘to microwave’ (all words appear in WN) → noise?

→ (we assume that) with time, recall will grow
→ is it possible to (automatically) measure precision?

slide-78
SLIDE 78

1 Wiktionary

2 Synonymy networks: Wiktionary graph, Gold standards, Comparison

3 Improving Wiktionary’s network: Exploiting its Small World structure, Using translation links

slide-85
SLIDE 85

Neighbours method

Intuition of transitivity: “Neighbours of my neighbours should be in my neighbourhood”

→ a neighbour of my neighbours which is not in my neighbourhood should be a good neighbour candidate.
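The transitivity intuition can be illustrated with a short Python sketch. The toy graph and the function name `neighbour_candidates` are hypothetical, and ranking candidates by the number of shared neighbours is an assumption made for illustration; the slides do not state how candidates are ordered.

```python
from collections import Counter


def neighbour_candidates(graph: dict[str, set[str]], word: str) -> list[tuple[str, int]]:
    """Return two-hop vertices that are not yet neighbours of `word`.

    Each candidate is scored by how many neighbours it shares with `word`
    (an assumed ranking, not taken from the slides).
    """
    direct = graph.get(word, set())
    counts: Counter[str] = Counter()
    for neighbour in direct:
        for second in graph.get(neighbour, set()):
            # Skip the word itself and anything already in its neighbourhood.
            if second != word and second not in direct:
                counts[second] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy undirected synonymy graph (hypothetical data).
toy = {
    "big": {"large", "huge"},
    "large": {"big", "huge", "vast"},
    "huge": {"big", "large", "vast", "giant"},
    "vast": {"large", "huge"},
    "giant": {"huge"},
}

print(neighbour_candidates(toy, "big"))
```

Here "vast" is reachable through both neighbours of "big", so it ranks above "giant", which is reachable through only one.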

slide-86
SLIDE 86

Prox method

Prox (Gaume et al.) A stochastic method designed to study Hierarchical SW Metrics: for any 2 vertices (u, v), computes “the probability that a randomly wandering particle starting from u stands in v after k steps.”

  • E.g. from Word1

             1     2     3     4     5     6     7     8     9
Initial   1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
Step 1    0.25  0.25  0.25  0.25  0.00  0.00  0.00  0.00  0.00
Step 2    0.22  0.22  0.15  0.17  0.11  0.04  0.04  0.01  0.00
Step 3    0.16  0.19  0.17  0.15  0.11  0.08  0.06  0.00  0.02
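The quoted definition describes a k-step random walk on the graph. The sketch below is a plain Python illustration of that idea, not the authors' implementation. The toy graph is hypothetical, and the self-loop added to every vertex is an assumption inferred from the example, where Word1 keeps probability 0.25 after one step.

```python
def random_walk(adjacency: dict[int, set[int]], start: int, steps: int) -> dict[int, float]:
    """Distribution of a random walker after `steps` steps, starting at `start`."""
    # Assumption: every vertex gets a self-loop, so the walker may stay in place.
    neighbours = {v: set(ns) | {v} for v, ns in adjacency.items()}
    dist = {v: 0.0 for v in neighbours}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in neighbours}
        for v, prob in dist.items():
            if prob == 0.0:
                continue
            # Move uniformly to any neighbour (including the vertex itself).
            share = prob / len(neighbours[v])
            for u in neighbours[v]:
                nxt[u] += share
        dist = nxt
    return dist


# Hypothetical undirected toy graph; vertex 1 has three neighbours.
toy_graph = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2, 5},
    4: {1},
    5: {3},
}

for k in range(4):
    row = random_walk(toy_graph, start=1, steps=k)
    print(k, [round(row[v], 2) for v in sorted(row)])

# Non-neighbours of the start vertex, ordered by probability after 3 steps,
# are the candidate links in this sketch.
after3 = random_walk(toy_graph, start=1, steps=3)
candidates = sorted(
    (v for v in after3 if v != 1 and v not in toy_graph[1]),
    key=lambda v: -after3[v],
)
print(candidates)
```

With three neighbours plus the assumed self-loop, the first step from vertex 1 spreads the probability evenly as 0.25 over four vertices, matching the shape of the "Step 1" row.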

slide-92
SLIDE 92

Results

[Figure, fr.V: three panels (P, R, F) with an x-axis running from 2000 to 14000; curves: prox3, neigh, random]

Comments

  • Prox method provides (ordered) relevant links, e.g. ‘to absolve’↔‘to forgive’, absent from WN
  • false positives may be interesting to consider:
    ‘to uncover’↔‘to peel’ (hypernymy)
    ‘to skin’↔‘to peel’ (‘inter-domain synonymy’)
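Curves of this kind can be produced by adding ranked candidate edges one at a time and re-scoring the network against a gold standard. The sketch below is only an illustration under stated assumptions: the edge sets are hypothetical, F is taken to be the usual harmonic mean of P and R, and the exact evaluation protocol of the talk (for instance any restriction to shared vocabulary) is not reproduced.

```python
def edge(a: str, b: str) -> frozenset[str]:
    """Undirected edge, so that (a, b) and (b, a) compare equal."""
    return frozenset((a, b))


def prf(network: set[frozenset[str]], gold: set[frozenset[str]]) -> tuple[float, float, float]:
    """Precision, recall and F-score of `network` measured against `gold`."""
    shared = len(network & gold)
    p = shared / len(network) if network else 0.0
    r = shared / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical gold standard, starting network and ranked candidate edges.
gold = {edge("a", "b"), edge("b", "c"), edge("c", "d"), edge("a", "d")}
network = {edge("a", "b"), edge("a", "c")}
ranked_candidates = [edge("b", "c"), edge("b", "d")]

curve = []
for candidate in ranked_candidates:
    network = network | {candidate}  # add the next best-ranked edge
    curve.append(prf(network, gold))

for size, (p, r, f) in enumerate(curve, start=3):
    print(f"{size} edges: P={p:.2f} R={r:.2f} F={f:.2f}")
```

A good ranking adds correct edges first, so precision should degrade slowly while recall rises; a random ranking serves as the baseline.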

slide-95
SLIDE 95

Translation links method

Intuition

2 words sharing many translations in different languages are likely to be synonymous

Method

  • let $T_w$ be the set of a word $w$’s translations
  • for every pair of words $(w, w')$: $\mathrm{Jaccard}(w, w') = \dfrac{|T_w \cap T_{w'}|}{|T_w \cup T_{w'}|}$
  • incrementally add relations, according to the Jaccard rank, up to a given threshold
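The three steps of the method can be sketched in Python as follows. The translation sets are hypothetical toy data, the names `jaccard` and `rank_pairs` are illustrative, and treating the threshold as a minimum Jaccard score is an assumption (the slide does not say whether the threshold applies to the score or to the number of added relations).

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |a ∩ b| / |a ∪ b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(translations: dict[str, set[str]], threshold: float) -> list[tuple[float, str, str]]:
    """Score every word pair and keep those at or above `threshold`, best first."""
    scored = []
    for w1, w2 in combinations(sorted(translations), 2):
        score = jaccard(translations[w1], translations[w2])
        if score >= threshold:
            scored.append((score, w1, w2))
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored


# Hypothetical translation sets, written as "language:word".
toy_translations = {
    "begin": {"fr:commencer", "de:anfangen", "es:empezar"},
    "start": {"fr:commencer", "de:anfangen", "es:comenzar"},
    "eat": {"fr:manger", "de:essen"},
}

for score, w1, w2 in rank_pairs(toy_translations, threshold=0.3):
    print(f"{w1} <-> {w2}: {score:.2f}")
```

Comparing all pairs is quadratic in the vocabulary size; on a real dump one would probably only compare words that share at least one translation.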

slide-98
SLIDE 98

Results

[Figure 2 (French Verb): three panels (P, R, F) with an x-axis running from 2000 to 12000; curves: Jaccard, random]

Comments

  • adding first 1000 edges (+55%) → loss of only 2% precision
  • added links are not the same as with Prox method

Idea

  • use translations method to densify the graph
  • then use the clusters’ structure (Prox)

slide-101
SLIDE 101

Conclusion

Hypotheses are confirmed

  • many missing links should be added among members of the same cluster
  • words sharing many translations are likely to be synonymous

Our methods work... but there is room for improvement

→ combining both methods should give better results

Direct application

  • support for collaborative editing → module to be included in Wiktionary’s framework?
  • a list of synonyms, ordered by relevance, may be provided to the contributor

slide-103
SLIDE 103

Future work

Diachronic study

study how wiktionaries evolve → foresee contributors’ NLP needs

  • e.g. when to apply the methods presented here

Invariants and variability

study of the (in)variability of semantic pairings (Wiktionary as a multilingual synonymy network)

  • e.g. house/family, child/fruit, feel/know

slide-104
SLIDE 104

Thank you! Questions?