
Automatic Selection of Context Configurations for Improved Class-Specific Word Representations

Ivan Vulić, Roy Schwartz, Ari Rappoport, Roi Reichart and Anna Korhonen. CoNLL 2017; Vancouver; August 3, 2017


  1. Automatic Selection of Context Configurations for Improved Class-Specific Word Representations
     Ivan Vulić, Roy Schwartz, Ari Rappoport, Roi Reichart and Anna Korhonen
     CoNLL 2017; Vancouver; August 3, 2017

  2. Background. Distributional Semantics: What is a Context?
     Example sentence: "The nice people rode their horses bravely and rapidly"
     (Diagram: the sentence annotated with dependency arcs; labels shown include det, amod, nsubj, obj, adv, cc and conj. The coordination / symmetric pattern view shows only the conj arc.)
     ◮ Bag-of-words: simplest approach
         ◮ Noisy
     ◮ Dependency links: more accurate contexts [Lin, 1998, Levy and Goldberg, 2014]
         ◮ Are all dependency links useful for representing words?
         ◮ Different dependency links represent different word classes
     ◮ Coordinations / symmetric patterns: more accurate and more efficient [Schwartz et al., 2015, Schwartz et al., 2016]
         ◮ But... valuable information gets lost
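To make the contrast between context types concrete, here is a minimal sketch that extracts bag-of-words contexts and "X and Y" coordination contexts from the example sentence. The window size of 2, the function names and the restriction to the single pattern "X and Y" are assumptions made for illustration; they are not taken from the slides.

```python
from typing import List, Tuple

SENTENCE = "The nice people rode their horses bravely and rapidly".split()


def bow_contexts(tokens: List[str], index: int, window: int = 2) -> List[str]:
    """Return every word within `window` positions of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


def coordination_pairs(tokens: List[str], conjunction: str = "and") -> List[Tuple[str, str]]:
    """Return (X, Y) and (Y, X) for every 'X <conjunction> Y' occurrence."""
    pairs = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] == conjunction:
            x, y = tokens[i - 1], tokens[i + 1]
            pairs.extend([(x, y), (y, x)])
    return pairs


horses = SENTENCE.index("horses")
print(bow_contexts(SENTENCE, horses))  # ['rode', 'their', 'bravely', 'and']
print(coordination_pairs(SENTENCE))    # [('bravely', 'rapidly'), ('rapidly', 'bravely')]
```

The bag-of-words window gives "horses" four neighbours of mixed usefulness, while the coordination view yields one clean pair and nothing at all for "horses", which illustrates both the "noisy" and the "valuable information gets lost" bullets.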

  7. Main Contributions
     ◮ Detect which fine-grained context types are useful for different word classes
     ◮ Traverse the large space of context configurations efficiently to find the best context configuration
     ◮ Transfer the configurations learned for one task and one language to other tasks and languages without re-training

  8. Context Types: (Universal) Labeled Dependency Edges
     ◮ (discovers, scientist_nsubj)
     ◮ (discovers, stars_dobj)
     ◮ (discovers, telescope_nmod)
     ◮ (stars, discovers_dobj-1)
     ◮ ...

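The (word, context) pairs listed on the slide can be generated mechanically from dependency arcs. The sketch below assumes a hypothetical toy parse given as (head, dependent, label) triples chosen to reproduce the slide's pairs; the `allowed` argument, which restricts output to one context configuration, is likewise an illustrative name. The `-1` suffix marks the inverse direction of an edge, as in (stars, discovers_dobj-1).

```python
from typing import List, Optional, Set, Tuple

# Hypothetical toy parse as (head, dependent, label) arcs.
ARCS = [
    ("discovers", "scientist", "nsubj"),
    ("discovers", "stars", "dobj"),
    ("discovers", "telescope", "nmod"),
]


def dependency_contexts(
    arcs: List[Tuple[str, str, str]],
    allowed: Optional[Set[str]] = None,
) -> List[Tuple[str, str]]:
    """Turn arcs into (word, context) training pairs.

    Each arc yields one pair for the head and one inverse pair for the
    dependent. If `allowed` is given, arcs with other labels are skipped.
    """
    pairs = []
    for head, dependent, label in arcs:
        if allowed is not None and label not in allowed:
            continue
        pairs.append((head, f"{dependent}_{label}"))
        pairs.append((dependent, f"{head}_{label}-1"))
    return pairs


print(dependency_contexts(ARCS, allowed={"dobj"}))
# [('discovers', 'stars_dobj'), ('stars', 'discovers_dobj-1')]
```

Under this view a context configuration is just a set of labels passed as `allowed`, so a smaller configuration also means fewer training pairs.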

  10. Cross-Lingual Context Transfer?

  11. Results: Individual Labels
      [Figure: bar chart of Spearman's ρ (y-axis from 0 to 0.6) for individual labels (prep, comp, conj, obj, amod, adv, nummod), with separate bars for adjectives, nouns and verbs]

  12. Too Many Context Configurations
      Adjectives: amod, prep, acl, conjlr, conjll
      Verbs: conjlr, obj, comp, adv, conjll
      Nouns: amod, prep, comp, subj, obj, appos, acl, nmod, conjlr, conjll
      ◮ Traversing a potentially huge space of context configurations may be intractable
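A quick count shows why exhaustive traversal becomes expensive. A pool of K labels has 2^K - 1 non-empty configurations, and each one presumably needs its own training and evaluation run. The sketch below enumerates them for four placeholder labels; the helper name and the pool size of 10 used in the last line are illustrative assumptions.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def configurations(labels: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty subset of `labels`, largest subsets first."""
    for size in range(len(labels), 0, -1):
        yield from combinations(labels, size)


labels = ["l1", "l2", "l3", "l4"]
all_configs = list(configurations(labels))
print(len(all_configs))  # 15, i.e. 1 + 4 + 6 + 4 = 2**4 - 1
print(2 ** 10 - 1)       # 1023 configurations for a hypothetical pool of 10 labels
```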

  13. Searching for Context Configurations: An Adapted Beam-Search Algorithm
      The slides build up a lattice of label subsets, starting from the full pool and removing one label per level:
        Level 0: {l1, l2, l3, l4}
        Level 1: {l2, l3, l4}  {l1, l3, l4}  {l1, l2, l4}  {l1, l2, l3}
        Level 2: {l2, l3}  {l2, l4}  {l3, l4}  {l1, l2}  {l1, l3}  {l1, l4}
        Level 3: {l1}  {l2}  {l3}  {l4}
      f(x): dev set evaluation
      A configuration is compared with its parent; the slide contrasts the two outcomes f(l1, l2, l3, l4) < f(l2, l3, l4) and f(l1, l2, l3, l4) > f(l2, l3, l4).
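The slides do not spell out the algorithm's details, so the following is only one plausible reading of the lattice, not the authors' exact procedure. It assumes that a child configuration survives only if it scores at least as well as its parent, that at most `beam_width` survivors are kept per level, and that the search stops when no child survives. The `score` function stands in for f(x); the toy weights below are invented purely so the example runs.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Config = FrozenSet[str]


def beam_search(
    pool: Iterable[str],
    score: Callable[[Config], float],
    beam_width: int = 2,
) -> Tuple[Config, float]:
    """Search label subsets top-down, removing one label per level."""
    start = frozenset(pool)
    cache: Dict[Config, float] = {start: score(start)}
    best = start
    beam = [start]
    while beam:
        candidates = set()
        for parent in beam:
            if len(parent) == 1:
                continue  # never generate the empty configuration
            for label in parent:
                child = parent - {label}
                if child not in cache:
                    cache[child] = score(child)
                # Assumed pruning rule: keep a child only if dropping
                # the label did not hurt the dev-set score.
                if cache[child] >= cache[parent]:
                    candidates.add(child)
        # Keep the top `beam_width` survivors; ties are broken by label names.
        beam = sorted(candidates, key=lambda c: (-cache[c], sorted(c)))[:beam_width]
        if beam and cache[beam[0]] > cache[best]:
            best = beam[0]
    return best, cache[best]


# Hypothetical stand-in for dev-set evaluation: l2 and l3 help, l1 and l4 hurt.
TOY_WEIGHTS = {"l1": -0.05, "l2": 0.20, "l3": 0.15, "l4": -0.02}


def toy_score(config: Config) -> float:
    return 0.3 + sum(TOY_WEIGHTS[label] for label in config)


best_config, best_score = beam_search(["l1", "l2", "l3", "l4"], toy_score)
print(sorted(best_config), round(best_score, 2))  # ['l2', 'l3'] 0.65
```

Caching scores matters in practice because the same child can be reached from several parents, and each evaluation would otherwise mean training a model again.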

  18. Experimental Setup
      ◮ Model: Skip-gram with negative sampling [Mikolov et al., 2013]
      ◮ Training data: Polyglot Wikipedia
      ◮ Evaluation: SimLex-999 word similarity dataset [Hill et al., 2015]
          ◮ 666 noun pairs, 222 verb pairs, 111 adjective pairs
          ◮ 2-fold cross validation
      ◮ Evaluation measure: Spearman's ρ
      ◮ Baselines: A variety of standard context types
          ◮ Bag-of-words (w/ and w/o positions); all dependency links, coordination dependency links, symmetric patterns
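For readers unfamiliar with the evaluation measure, the sketch below computes Spearman's ρ as the Pearson correlation of rank-transformed scores, with tied values sharing their average rank. It uses only the standard library; the four human ratings and model similarities are made-up numbers, not SimLex-999 data.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


human = [9.0, 7.5, 3.0, 1.0]   # hypothetical gold similarity ratings
model = [0.8, 0.9, 0.4, 0.1]   # hypothetical cosine similarities
print(round(spearman(human, model), 2))  # 0.8
```

Because only ranks matter, the model's similarity scale (cosine values here) does not need to match the scale of the human ratings.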

  19. Results: Context Configurations

  20. Selected Contexts are Efficient
      [Figure: bar chart of training time in minutes (y-axis from 0 to 200) for BoW, BoW+, Coord., SP, Dep. All, BEST A, BEST N and BEST V]

  21. Transfer Results
      ◮ TOEFL
          ◮ 5% improvement over strongest baseline on verbs and nouns
      ◮ Other languages
          ◮ 0.02 to 0.08 ρ improvement on Italian and German across all three word classes
          ◮ DE and IT SimLex-999 [Leviant and Reichart, 2015]

  22. Take-Home Messages
      ◮ Different word classes require different (finer-grained) context configurations
      ◮ An automatic framework for computationally tractable selection of optimal context configurations
      ◮ Design based on Universal Dependencies: context configurations transferable to other tasks and languages without retraining
      ◮ Future work → finer-grained contexts, other word classes, more sophisticated search algorithms, other representation models, context weighting, ...


  24. References
      Hill, F., Reichart, R., and Korhonen, A. (2015). SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics.
      Leviant, I. and Reichart, R. (2015). Judgment language matters: Multilingual vector space models for judgment language aware lexical semantics. arXiv:1508.00106.
      Levy, O. and Goldberg, Y. (2014). Dependency-based word embeddings. In Proc. of ACL.
      Lin, D. (1998). Automatic retrieval and clustering of similar words. In Proc. of ACL.
      Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv:1301.3781.
      Schwartz, R., Reichart, R., and Rappoport, A. (2015). Symmetric pattern based word embeddings for improved word similarity prediction. In Proc. of CoNLL.
      Schwartz, R., Reichart, R., and Rappoport, A. (2016). Symmetric patterns and coordinations: Fast and enhanced representations of verbs and adjectives. In Proc. of NAACL.
