Wiktionary and NLP: Improving synonymy networks

ACL-IJCNLP, Singapore, 7 Aug 2009
Emmanuel Navarro (IRIT, CNRS & Univ. of Toulouse), Franck Sajous
Outline: Wiktionary; Synonymy networks; Improving Wiktionary's network; Conclusion and future work


Extracting Wiktionary's graph of synonymy
Modeling: one English entry's POS → one vertex; synonymy links among words of the same POS; wordsenses flattened; edges symmetrized.
Franck Sajous ACL-IJCNLP - 7 Aug 2009 Wiktionary and NLP: Improving synonymy networks 7/23
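The modeling choices above can be sketched in a few lines of Python. This is a minimal sketch that assumes the synonymy links have already been parsed into hypothetical (word, POS, sense id, synonym) tuples. The tuple layout, the function name `build_synonymy_graph` and the toy links are illustrative only. They are not the authors' extractor or real Wiktionary data.

```python
from collections import defaultdict

def build_synonymy_graph(links):
    """Build an undirected synonymy graph from Wiktionary-style links.

    `links` is an iterable of hypothetical (word, pos, sense_id, synonym)
    tuples. Each (word, pos) pair becomes one vertex; the sense_id is
    dropped (wordsenses flattened); a link only joins words of the same
    POS; every edge is stored in both directions (symmetrized).
    """
    graph = defaultdict(set)
    for word, pos, _sense_id, synonym in links:
        source = (word, pos)
        target = (synonym, pos)  # synonymy links stay within one POS
        if source == target:
            continue  # ignore self-links
        graph[source].add(target)
        graph[target].add(source)  # symmetrization
    return dict(graph)

# Made-up toy links: two senses of "kick" collapse onto one vertex.
links = [
    ("kick", "N", 1, "hit"),
    ("kick", "N", 3, "thrill"),
    ("hit", "N", 1, "strike"),
]
g = build_synonymy_graph(links)
print(sorted(g[("hit", "N")]))  # [('kick', 'N'), ('strike', 'N')]
```

The "hit" vertex gains the edge to "kick" even though only "kick" lists it, which is the effect of symmetrizing.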

Wordsenses flattened... Why? (slide 8/23)
No underlying linguistic theory, but the constraints stem from the resource:
- wordsenses may appear in the definitions but not in the semantic relations;
- which wordsenses of the target vertices are meant?
Example target entries:
  buskin (N): 1. A half-boot. 2. A type of boot worn by the ancient Athenian tragic actors.
  mukluk (N): 1. A soft boot made of reindeer skin or sealskin and worn by Inuit.
  kick (N): 1. A hit or strike with the leg or foot. 2. The action of swinging a foot or leg. 3. Something that tickles the fancy. 4. (Internet) The removal of a person from an online activity. 5. (figuratively) Any bucking motion of an object that lacks legs or feet.
Another reason: one of our gold standards (Dicosyn) has its wordsenses flattened.

Extracting WordNet's synonymy network (slide 9/23)
WordNet: synonymy holds between wordsenses; relations are already symmetric; all words of a given synset share the same POS.
Modeling: vertices are words; edges link all words in a given synset; + using hyponymy with leaf synsets containing single words.
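The "edges between all words in a given synset" step turns each synset into a clique. The sketch below shows that step only. The hyponymy extension for leaf synsets is not covered. Synsets are given here as hypothetical (POS, lemmas) pairs with made-up toy data. In practice they could come from a WordNet reader such as NLTK's `wordnet.all_synsets()` together with `Synset.lemma_names()`.

```python
from itertools import combinations

def synsets_to_edges(synsets):
    """Return undirected edges linking every pair of words sharing a synset.

    `synsets` is an iterable of (pos, lemmas) pairs. Vertices are
    (word, pos), so words are only linked within one POS. Edges are
    frozensets, so (a, b) and (b, a) collapse into one undirected edge,
    and a pair shared by several synsets is counted once.
    """
    edges = set()
    for pos, lemmas in synsets:
        vertices = sorted({(lemma, pos) for lemma in lemmas})
        for u, v in combinations(vertices, 2):
            edges.add(frozenset((u, v)))
    return edges

# Made-up toy synsets for two senses of "throw".
toy = [("V", ["throw", "hurl", "fling"]), ("V", ["throw", "project"])]
edges = synsets_to_edges(toy)
print(len(edges))  # 4: three edges from the first synset, one from the second
```

Because WordNet relations are already symmetric, the undirected frozenset edges need no separate symmetrization pass.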

Extracting the Dicosyn synonymy network (slide 10/23)
Dicosyn: a compilation of synonymy relations extracted from 7 dictionaries (Bailly, Benac, Du Chazaud, Guizot, Lafaye, Larousse and Robert), produced at ATILF and corrected at the CRISCO lab: http://elsap1.unicaen.fr/dicosyn.html
Wordsenses are flattened; the network is already built and just needs to be symmetrized.

Small Worlds (SW) (slide 11/23)
Lexical resources are (often) SW:
- globally sparse in edges, locally dense (high clustering);
- short average path length;
- heavy-tailed degree distribution.
E.g.: to throw (WordNet)
Studying the graphs' properties shows that Wiktionary, WordNet and Dicosyn are SW → we can take advantage of SW characteristics!
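The first two small-world properties can be checked on an extracted graph by comparing its edge density with its average clustering coefficient, and by computing its average shortest-path length. The sketch below is pure Python on a hypothetical adjacency-set dict holding a made-up toy graph. On real graphs, a library such as NetworkX (`nx.average_clustering`, `nx.average_shortest_path_length`) would be the usual choice. This sketch averages path lengths over reachable pairs only, so it also runs on disconnected graphs. The degree-distribution property is not covered.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(graph):
    """Average local clustering coefficient.

    `graph` maps each vertex to the set of its neighbours (undirected).
    Vertices with fewer than 2 neighbours contribute 0.
    """
    total = 0.0
    for v, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0

def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0

def density(graph):
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

# Made-up toy graph: a triangle a-b-c plus a pendant vertex d on c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(round(clustering_coefficient(graph), 3),
      round(average_path_length(graph), 3),
      round(density(graph), 3))  # 0.583 1.333 0.667
```

A toy graph this small is dense, so the comparison is only meaningful on real lexical networks. There, SW graphs show a clustering coefficient far above their density.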

Wiktionary FR / Dicosyn: lexical coverage and synonymy network (slide 12/23)
Example: nouns synonymy network
(P: precision, R: recall, with Dicosyn as the reference)

Words
POS   Wikt.   DicoSyn   Shared   P     R
N     18017   29372     10393    58%   35%
A      5411    9452      3076    57%   33%
V      3897    9147      2966    76%   32%

Relations
POS   Wikt.   DicoSyn   Shared   P     R
N      3510   44501      3510    69%    8%
A      1677   17404      1300    78%    7%
V      1267   23968       899    71%    4%
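In the Words table, P and R follow the usual set definitions with the gold standard as reference: P = Shared / Wikt. and R = Shared / gold. For French nouns, for example, 10393 / 18017 ≈ 58% and 10393 / 29372 ≈ 35%. The sketch below computes the same measures for vertex sets or edge sets. F is taken as the harmonic mean of P and R, which is an assumption because the slides do not define F. The function name and the toy edge sets are made up for illustration.

```python
def precision_recall_f(tested, gold):
    """Compare a tested set (vertices or edges) with a gold-standard set.

    Returns (P, R, F): P = |shared| / |tested|, R = |shared| / |gold|,
    F = harmonic mean of P and R (assumed; 0.0 when P + R is 0).
    """
    shared = len(set(tested) & set(gold))
    p = shared / len(tested) if tested else 0.0
    r = shared / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Made-up toy edge sets; frozensets make the edges undirected.
wikt = {frozenset(e) for e in [("reduce", "decrease"), ("cook", "microwave"), ("begin", "start")]}
gold = {frozenset(e) for e in [("begin", "start"), ("act", "play")]}
p, r, f = precision_recall_f(wikt, gold)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.33 0.5 0.4
```

The two unmatched Wiktionary edges lower P even if they are correct synonyms. This is the gold-standard issue raised on the comments slide.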

Wiktionary EN / WordNet: lexical coverage and synonymy network (slide 13/23)
Example: nouns synonymy network

Words
POS   Wikt.   WordNet   Shared   P     R
N     22075   117798    14120    64%   12%
A      8437    21479     5874    70%   27%
V      6368    11529     5157    81%   45%

Relations
POS   Wikt.   WordNet   Shared   P     R
N      6453   18440      2763    43%   15%
A      3139   12792      1314    42%   10%
V      2667   18725       993    37%    5%

Comments... Gold standards, precision & recall (slide 14/23)
A rough comparison, but: how meaningful are these measures (in our case)? ↔ How "perfect" are our gold standards?
Examples:
- 'to horn' is in WN but was absent from Wikt.en.N in our experiment; it appeared in Wikt. in 2009.
- 'to poo' (childish), 'to prefetch', 'to google' (technical neologisms) are in Wikt. and not in WN → noise?
- Wikt. really misses some of WN's relations: 'to act ↔ to play'.
- But are relations from Wikt. that are absent from WN necessarily errors? 'to reduce ↔ to decrease', 'to cook ↔ to microwave' (all words appear in WN) → noise?
→ (We assume that) with time, recall will grow.
→ Is it possible to (automatically) measure precision?

Outline (slide 15/23)
1 Wiktionary
2 Synonymy networks: Wiktionary graph; Gold standards; Comparison
3 Improving Wiktionary's network: Exploiting its Small World structure; Using translation links
Conclusion and future work

Neighbours method (slide 16/23)
Intuition of transitivity: "Neighbours of my neighbours should be in my neighbourhood."
→ A neighbour of my neighbours that is not in my neighbourhood should be a good neighbour candidate.
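The intuition above can be sketched as candidate generation at distance 2. For a vertex u, the sketch collects every neighbour of a neighbour that is neither u nor already adjacent to u. Ranking candidates by how many neighbours they share with u is an assumption made here for illustration, since the slide does not say how candidates are scored. The function name and toy graph are hypothetical.

```python
from collections import Counter

def neighbour_candidates(graph, u):
    """Non-adjacent vertices at distance 2 from u, most shared neighbours first.

    `graph` maps each vertex to the set of its neighbours (undirected).
    Returns a list of (candidate, number_of_shared_neighbours) pairs.
    """
    counts = Counter()
    for v in graph[u]:
        for w in graph[v]:
            if w != u and w not in graph[u]:
                counts[w] += 1  # w is reachable from u through neighbour v
    # Descending shared-neighbour count, then alphabetical for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Made-up toy synonymy graph.
graph = {
    "throw": {"hurl", "fling"},
    "hurl": {"throw", "pitch", "fling"},
    "fling": {"throw", "hurl", "pitch", "toss"},
    "pitch": {"hurl", "fling"},
    "toss": {"fling"},
}
print(neighbour_candidates(graph, "throw"))  # [('pitch', 2), ('toss', 1)]
```

Each proposed pair (u, candidate) can then be scored against a gold standard with the precision and recall measures used earlier.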

  64. Prox method. Prox (Gaume et al.) is a stochastic method designed to study Hierarchical Small Worlds. Metric: for any two vertices (u, v), it computes “the probability that a randomly wandering particle starting from u stands in v after k steps.” E.g., starting from Word1:

      Word          1     2     3     4     5     6     7     8     9
      Initial    1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
      Step 1     0.25  0.25  0.25  0.25  0.00  0.00  0.00  0.00  0.00
      Step 2     0.22  0.22  0.15  0.17  0.11  0.04  0.04  0.01  0.00
      Step 3     0.16  0.19  0.17  0.15  0.11  0.08  0.06  0.00  0.02
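The walk can be sketched in a few lines. This is a minimal sketch under stated assumptions, not Gaume et al.'s code: each vertex is given a loop to itself, so the walker moves uniformly to a neighbour or stays put (consistent with the Step 1 row above, where the mass is split evenly, 0.25 each, over Word1 and Words 2 to 4). `prox_candidates`, which ranks a word's non-neighbours by their probability after k steps, is a hypothetical helper; reading the `prox3` label in the results as k = 3 is also an assumption.

```python
def prox(graph: dict[str, set[str]], start: str, k: int) -> dict[str, float]:
    """Distribution of a random walker that starts at `start`, after k steps.

    Assumption: every vertex also has a loop to itself, so at each step the
    walker moves uniformly to one of its neighbours or stays where it is.
    """
    p = {start: 1.0}
    for _ in range(k):
        nxt: dict[str, float] = {}
        for u, mass in p.items():
            targets = graph[u] | {u}  # neighbours plus the assumed self-loop
            share = mass / len(targets)
            for v in targets:
                nxt[v] = nxt.get(v, 0.0) + share
        p = nxt
    return p


def prox_candidates(graph: dict[str, set[str]], u: str, k: int = 3) -> list[tuple[str, float]]:
    """Hypothetical ranking: non-neighbours of u, most probable first."""
    scores = prox(graph, u, k)
    return sorted(
        ((v, s) for v, s in scores.items() if v != u and v not in graph[u]),
        key=lambda item: (-item[1], item[0]),
    )
```

On a star where `w1` is linked to `w2`, `w3` and `w4`, `prox(star, "w1", 1)` gives 0.25 to each of the four vertices, matching the shape of the Step 1 row.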

  69. Results (fr.V)
      [Figure: precision (P), recall (R) and F-score (F) for French verbs against the number of added edges (0 to 14,000), comparing prox3, neigh and random.]
      Comments: the Prox method provides (ordered) relevant links, e.g. ‘to absolve’ ↔ ‘to forgive’, absent from WN. Some false positives may be interesting to consider: ‘to uncover’ ↔ ‘to peel’ (hypernymy) and ‘to skin’ ↔ ‘to peel’ (‘inter-domain synonymy’).
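The P, R and F curves compare the links of the enlarged graph with a gold standard. A minimal sketch of that comparison, assuming edges are compared as unordered word pairs and F is the harmonic mean of P and R; the helper names are hypothetical, and the slides do not show the exact evaluation protocol (for instance, how the two vocabularies are aligned).

```python
def edge(u: str, v: str) -> frozenset[str]:
    """An undirected edge represented as an unordered pair."""
    return frozenset((u, v))


def precision_recall_f(
    predicted: set[frozenset[str]], gold: set[frozenset[str]]
) -> tuple[float, float, float]:
    """Precision, recall and F-score of predicted edges against gold edges."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Because edges are unordered pairs, `edge("a", "b")` and `edge("b", "a")` count as the same link.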

  71. Translation links method. Intuition: two words sharing many translations in different languages are likely to be synonymous. Method: let $T_w$ be the set of a word $w$'s translations; for every pair of words $(w, w')$, compute
      $$\mathrm{Jaccard}(w, w') = \frac{|T_w \cap T_{w'}|}{|T_w \cup T_{w'}|}$$
      then incrementally add relations in order of Jaccard rank (highest first), up to a given threshold.
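The method can be sketched as follows. This is a minimal sketch, not the authors' code: translations are assumed to be stored as language-tagged strings (e.g. "en:forgive") so identical spellings in different languages are not conflated, the threshold is read as a maximum number of added edges, and `translation_candidates` is a hypothetical name. An inverted index restricts scoring to pairs that share at least one translation, since every other pair has a Jaccard score of 0.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def translation_candidates(
    translations: dict[str, set[str]],
    graph: dict[str, set[str]],
    max_edges: int,
) -> list[tuple[str, str, float]]:
    """Rank new synonymy edges by the Jaccard score of the words' translation sets."""
    # Inverted index: translation -> words that have it.
    index: dict[str, set[str]] = defaultdict(set)
    for word, trans in translations.items():
        for t in trans:
            index[t].add(word)
    # Only pairs sharing a translation can have a non-zero score.
    pairs = {pair for words in index.values() for pair in combinations(sorted(words), 2)}
    scored = [
        (u, v, jaccard(translations[u], translations[v]))
        for u, v in pairs
        if v not in graph.get(u, set())  # skip pairs already linked as synonyms
    ]
    scored.sort(key=lambda item: (-item[2], item[0], item[1]))
    return scored[:max_edges]
```

Pairs already linked in the graph are skipped, so only new relations are proposed.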

  74. Results (figure 2, French verbs)
      [Figure: precision (P), recall (R) and F-score (F) against the number of added edges (0 to 12,000), comparing Jaccard and random.]
      Comments: adding the first 1,000 edges (+55%) → a loss of only 2% precision; the added links are not the same as those found with the Prox method.
      Idea: use the translations method to densify the graph, then exploit the clusters' structure (Prox).
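The idea of densifying with translations and then exploiting the cluster structure can be sketched as a two-stage pipeline. This is a hypothetical composition for illustration, not a method evaluated on these slides: `densify`, `prox_candidates`, the number of translation edges, k = 3, the self-loop on each vertex during the walk, and summing Prox in both directions to get a symmetric pair score are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def densify(graph: dict[str, set[str]], translations: dict[str, set[str]], n_edges: int) -> dict[str, set[str]]:
    """Stage 1 (hypothetical): add the n_edges best translation-based pairs."""
    index: dict[str, set[str]] = defaultdict(set)
    for word, trans in translations.items():
        for t in trans:
            index[t].add(word)
    pairs = {p for words in index.values() for p in combinations(sorted(words), 2)}
    scored = sorted(
        ((jaccard(translations[u], translations[v]), u, v)
         for u, v in pairs if v not in graph.get(u, set())),
        key=lambda item: (-item[0], item[1], item[2]),
    )
    dense = {u: set(vs) for u, vs in graph.items()}  # copy; input left untouched
    for _, u, v in scored[:n_edges]:
        dense.setdefault(u, set()).add(v)
        dense.setdefault(v, set()).add(u)  # keep the graph symmetric
    return dense


def prox(graph: dict[str, set[str]], start: str, k: int) -> dict[str, float]:
    """Random-walk distribution after k steps, with an assumed self-loop on each vertex."""
    p = {start: 1.0}
    for _ in range(k):
        nxt: dict[str, float] = defaultdict(float)
        for u, mass in p.items():
            targets = graph.get(u, set()) | {u}
            for v in targets:
                nxt[v] += mass / len(targets)
        p = dict(nxt)
    return p


def prox_candidates(graph: dict[str, set[str]], k: int = 3) -> list[tuple[tuple[str, str], float]]:
    """Stage 2 (hypothetical): rank non-adjacent pairs by Prox(u, v) + Prox(v, u)."""
    scores: dict[tuple[str, str], float] = defaultdict(float)
    for u in graph:
        for v, s in prox(graph, u, k).items():
            if v != u and v not in graph[u]:
                scores[tuple(sorted((u, v)))] += s
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With two separate links a/b and c/d, a translation shared by b and c bridges the two components, after which Prox can propose links such as (a, c) and (b, d).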

  77. Conclusion. The hypotheses are confirmed: many missing links should be added among members of the same cluster, and words sharing many translations are likely to be synonymous. Our methods work, but there is room for improvement → combining both methods should give better results.
