slide-1
SLIDE 1

Automatic Selection of Context Configurations for Improved Class-Specific Word Representations

Ivan Vulić, Roy Schwartz, Ari Rappoport, Roi Reichart and Anna Korhonen

CoNLL 2017; Vancouver; August 3, 2017

1 / 13

slide-2
SLIDE 2

Background

Distributional Semantics: What is a Context?

The nice people rode their horses bravely and rapidly

2 / 13

slide-3
SLIDE 3

Background

Distributional Semantics: What is a Context?

Bag-of-words

The nice people rode their horses bravely and rapidly

◮ Bag-of-words: simplest approach

◮ Noisy

2 / 13

slide-4
SLIDE 4

Background

Distributional Semantics: What is a Context?

Dependency links

[Lin, 1998, Levy and Goldberg, 2014]

The nice people rode their horses bravely and rapidly

[Figure: dependency arcs over the example sentence, labeled amod, obj, det, cc, conj]

◮ Bag-of-words: simplest approach

◮ Noisy

◮ Dependency links: more accurate contexts

◮ Are all dependency links useful for representing words?

◮ Different dependency links represent different word classes

2 / 13

slide-5
SLIDE 5

Background

Distributional Semantics: What is a Context?

Coordinations / Symmetric Patterns

[Schwartz et al., 2015, Schwartz et al., 2016]

The nice people rode their horses bravely and rapidly

conj

◮ Bag-of-words: simplest approach

◮ Noisy

◮ Dependency links: more accurate contexts

◮ Are all dependency links useful for representing words?

◮ Different dependency links represent different word classes

◮ Coordinations / symmetric patterns: more accurate and more efficient

2 / 13

slide-6
SLIDE 6

Background

Distributional Semantics: What is a Context?

Coordinations / Symmetric Patterns

[Schwartz et al., 2015, Schwartz et al., 2016]

The nice people rode their horses bravely and rapidly

[Figure: dependency arcs over the example sentence, labeled amod, obj, nsubj, adv, conj]

◮ Bag-of-words: simplest approach

◮ Noisy

◮ Dependency links: more accurate contexts

◮ Are all dependency links useful for representing words?

◮ Different dependency links represent different word classes

◮ Coordinations / symmetric patterns: more accurate and more efficient

◮ But... valuable information gets lost

2 / 13

slide-7
SLIDE 7

Main Contributions

◮ Detect which fine-grained context types are useful for different word classes

◮ Traverse the large space of context configurations efficiently to find the best context configuration

◮ Transfer the configurations learned for one task and one language to other tasks and languages without re-training

3 / 13

slide-8
SLIDE 8

Context Types

(Universal) Labeled Dependency Edges

◮ (discovers, scientist_nsubj)

◮ (discovers, stars_dobj)

◮ (discovers, telescope_nmod)

◮ (stars, discovers_dobj-1)

◮ . . .
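The (word, context) pairs above can be derived from a dependency parse roughly as follows. This is a minimal sketch under stated assumptions: the `Arc` tuple, the function `extract_contexts`, and the toy arcs are illustrative names introduced here, not the authors' code. Each labeled edge yields one pair for the head and one inverse pair (suffix `-1`) for the dependent, and an optional label set restricts extraction to one context configuration.

```python
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple


class Arc(NamedTuple):
    """One labeled dependency edge (hypothetical container for this sketch)."""
    head: str
    dependent: str
    label: str


def extract_contexts(
    arcs: Iterable[Arc],
    allowed_labels: Optional[Set[str]] = None,
) -> List[Tuple[str, str]]:
    """Turn labeled edges into (word, context) training pairs.

    If allowed_labels is given, edges with other labels are skipped,
    which mimics training with a restricted context configuration.
    """
    pairs: List[Tuple[str, str]] = []
    for arc in arcs:
        if allowed_labels is not None and arc.label not in allowed_labels:
            continue
        # Head word sees the dependent, tagged with the edge label.
        pairs.append((arc.head, f"{arc.dependent}_{arc.label}"))
        # Dependent sees the head through the inverse relation.
        pairs.append((arc.dependent, f"{arc.head}_{arc.label}-1"))
    return pairs


# Toy arcs matching the examples on the slide (assumed parse).
toy_arcs = [
    Arc("discovers", "scientist", "nsubj"),
    Arc("discovers", "stars", "dobj"),
    Arc("discovers", "telescope", "nmod"),
]

all_pairs = extract_contexts(toy_arcs)
dobj_only = extract_contexts(toy_arcs, allowed_labels={"dobj"})
```

Passing a smaller `allowed_labels` set shrinks the training data, which is one reason a well-chosen configuration can also train faster.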

4 / 13

slide-10
SLIDE 10

Cross Lingual Context Transfer?

5 / 13

slide-11
SLIDE 11

Results: Individual Labels

[Figure: Spearman’s ρ (axis ticks 0.2, 0.4, 0.6) for individual dependency labels (conj, obj, prep, amod, comp, adv, nummod), shown for adjectives, nouns and verbs]

6 / 13

slide-12
SLIDE 12

Too many Context Configurations

Adjectives: amod, conjlr, conjll

Verbs: prep, acl, obj, comp, adv, conjlr, conjll

Nouns: amod, prep, comp, subj, obj, appos, acl, nmod, conjlr, conjll

◮ Exhaustively traversing a potentially huge space of context configurations may be intractable
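To make the size of the search space concrete, the short calculation below counts the non-empty label subsets for each pool, as read from this slide (3, 7 and 10 labels). It is only an illustration: the dictionary `pools` and the function `count_configurations` are names introduced here, and the assumption is that every candidate configuration would have to be trained and scored separately.

```python
# Label pools as read from the slide (assumed segmentation of the table).
pools = {
    "adjectives": ["amod", "conjlr", "conjll"],
    "verbs": ["prep", "acl", "obj", "comp", "adv", "conjlr", "conjll"],
    "nouns": ["amod", "prep", "comp", "subj", "obj", "appos",
              "acl", "nmod", "conjlr", "conjll"],
}


def count_configurations(pool):
    """Number of non-empty subsets of a label pool: 2^n - 1."""
    return 2 ** len(pool) - 1


counts = {word_class: count_configurations(pool)
          for word_class, pool in pools.items()}
```

The count doubles with every label added to a pool, so exhaustive evaluation quickly becomes expensive.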

7 / 13

slide-13
SLIDE 13

Searching for Context Configurations

An Adapted Beam-Search Algorithm

l1, l2, l3, l4

8 / 13

slide-14
SLIDE 14

Searching for Context Configurations

An Adapted Beam-Search Algorithm

f(x): dev set evaluation

l1, l2, l3, l4

l2, l3, l4 | l1, l3, l4 | l1, l2, l4 | l1, l2, l3

f(l1, l2, l3, l4) < f(l2, l3, l4)

f(l1, l2, l3, l4) > f(l2, l3, l4)

8 / 13

slide-15
SLIDE 15

Searching for Context Configurations

An Adapted Beam-Search Algorithm

l1, l2, l3, l4

l2, l3, l4 | l1, l3, l4 | l1, l2, l4 | l1, l2, l3

l2, l3 | l2, l4 | l3, l4 | l1, l2 | l1, l3 | l1, l4

8 / 13

slide-16
SLIDE 16

Searching for Context Configurations

An Adapted Beam-Search Algorithm

l1, l2, l3, l4

l2, l3, l4 | l1, l3, l4 | l1, l2, l4 | l1, l2, l3

l2, l3 | l2, l4 | l3, l4 | l1, l2 | l1, l3 | l1, l4

l1 | l2 | l3 | l4

8 / 13
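The top-down search over this lattice can be sketched as a generic beam search. This is a hedged illustration, not the authors' exact procedure: `beam_search_configs`, `beam_width`, and `toy_score` are names introduced here, the pruning rule (a child is kept only if it scores at least as well as its parent) is one reading of the comparisons shown on the slides, and `toy_score` is a made-up stand-in for the dev set evaluation f.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Config = FrozenSet[str]


def beam_search_configs(
    pool: Iterable[str],
    score: Callable[[Config], float],
    beam_width: int = 2,
) -> Tuple[Config, float]:
    """Start from the full pool and repeatedly drop one label at a time."""
    full: Config = frozenset(pool)
    scores: Dict[Config, float] = {full: score(full)}
    beam: List[Config] = [full]
    best = full
    while beam:
        candidates: Dict[Config, float] = {}
        for parent in beam:
            if len(parent) == 1:
                continue  # a single label cannot be reduced further
            for label in parent:
                child = parent - {label}
                if child not in scores:
                    scores[child] = score(child)  # evaluate each config once
                # Assumed pruning rule: keep a child only if it does not
                # score worse than the parent it was derived from.
                if scores[child] >= scores[parent]:
                    candidates[child] = scores[child]
        # Keep the highest-scoring candidates; ties broken by label names.
        beam = sorted(candidates,
                      key=lambda c: (-candidates[c], sorted(c)))[:beam_width]
        for config in beam:
            if scores[config] > scores[best]:
                best = config
    return best, scores[best]


def toy_score(config: Config) -> float:
    """Made-up score: l2 and l3 help, every other label adds noise."""
    useful = {"l2", "l3"}
    return 0.2 * len(config & useful) - 0.05 * len(config - useful)


best_config, best_score = beam_search_configs(["l1", "l2", "l3", "l4"], toy_score)
```

On this toy score the search drops the two noisy labels and stops once every further removal lowers the score, after evaluating only part of the lattice.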

slide-18
SLIDE 18

Experimental Setup

◮ Model: Skip-gram with negative sampling [Mikolov et al., 2013]

◮ Training data: Polyglot Wikipedia

◮ Evaluation: SimLex-999 word similarity dataset [Hill et al., 2015]

◮ 666 noun pairs, 222 verb pairs, 111 adjective pairs

◮ 2-fold cross validation

◮ Evaluation measure: Spearman’s ρ

◮ Baselines: A variety of standard context types

◮ Bag-of-words (w/ and w/o positions); all dependency links, coordination dependency links, symmetric patterns
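The evaluation measure named on this slide, Spearman’s ρ, is the Pearson correlation between the ranks of two score lists. The pure-Python sketch below shows the computation; the function names and the four example scores are invented for illustration and are not taken from SimLex-999 or from any trained model.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: human similarity ratings vs. model cosine similarities.
human_ratings = [9.0, 7.5, 3.0, 1.2]
model_scores = [0.8, 0.6, 0.7, 0.1]
rho = spearman_rho(human_ratings, model_scores)
```

Because only ranks matter, the measure is insensitive to the scale of the model's similarity scores.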

9 / 13

slide-19
SLIDE 19

Results: Context Configurations

10 / 13

slide-20
SLIDE 20

Selected Contexts are Efficient

[Figure: training time in minutes (axis ticks 100, 200) for BoW, BoW+, Coord., SP, Dep.All, BESTA, BESTN, BESTV]

11 / 13

slide-21
SLIDE 21

Transfer Results

◮ TOEFL

◮ 5% improvement over strongest baseline on verbs and nouns

◮ Other languages

◮ 0.02 to 0.08 ρ improvement on Italian and German across all three word classes

◮ DE and IT SimLex-999 [Leviant and Reichart, 2015]

12 / 13

slide-22
SLIDE 22

Take-Home Messages

◮ Different word classes require different (finer-grained) context configurations

◮ An automatic framework for computationally tractable selection of optimal context configurations

◮ Design based on Universal Dependencies: context configurations transferable to other tasks and languages without retraining

◮ Future work → finer-grained contexts, other word classes, more sophisticated search algorithms, other representation models, context weighting, ...

13 / 13

Thank you!

slide-24
SLIDE 24

References I

Hill, F., Reichart, R., and Korhonen, A. (2015). SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics.

Leviant, I. and Reichart, R. (2015). Judgment language matters: Multilingual vector space models for judgment language aware lexical semantics. arXiv:1508.00106.

Levy, O. and Goldberg, Y. (2014). Dependency-based word embeddings. In Proc. of ACL.

Lin, D. (1998). Automatic retrieval and clustering of similar words. In Proc. of ACL.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv:1301.3781.

Schwartz, R., Reichart, R., and Rappoport, A. (2015). Symmetric pattern based word embeddings for improved word similarity prediction. In Proc. of CoNLL.

1 / 2

slide-25
SLIDE 25

References II

Schwartz, R., Reichart, R., and Rappoport, A. (2016). Symmetric patterns and coordinations: Fast and enhanced representations of verbs and adjectives. In Proc. of NAACL.

2 / 2