Word, Sense and Contextualized Embeddings: Vector Representations of Meaning in NLP


SLIDE 1

Word, Sense and Contextualized Embeddings: Vector Representations of Meaning in NLP

Jose Camacho-Collados

Cardiff University, 18 March 2019

SLIDE 2

Outline

❖ Background
  ➢ Vector Space Models (word embeddings)
  ➢ Lexical resources
❖ Sense representations
  ➢ Knowledge-based: NASARI, SW2V
  ➢ Contextualized: ELMo, BERT
❖ Applications

SLIDE 3

Word vector space models

Words are represented as vectors: semantically similar words are close in the vector space.

SLIDE 4

Neural networks learn word vector representations from text corpora -> word embeddings
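Concretely, this is the word2vec family of models; a minimal sketch (toy corpus; gensim assumed as the implementation):

```python
# Minimal sketch: learning word embeddings from a (toy) corpus with gensim.
from gensim.models import Word2Vec

corpus = [
    ["he", "withdrew", "money", "from", "the", "bank"],
    ["the", "bank", "remained", "closed", "yesterday"],
    ["a", "nice", "spot", "by", "the", "bank", "of", "the", "river"],
]

# skip-gram (sg=1), 100-dimensional vectors; min_count=1 only because the corpus is tiny
model = Word2Vec(corpus, vector_size=100, window=3, sg=1, min_count=1)

# semantically similar words should end up close in the vector space
print(model.wv.most_similar("bank", topn=3))
```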

SLIDE 5

Why word embeddings?

Embedded vector representations:

  • are compact and fast to compute
  • preserve important relational information between words (actually, meanings)
  • are geared towards general use

SLIDE 6

Applications for word representations

  • Syntactic parsing (Weiss et al. 2015)
  • Named Entity Recognition (Guo et al. 2014)
  • Question Answering (Bordes et al. 2014)
  • Machine Translation (Zou et al. 2013)
  • Sentiment Analysis (Socher et al. 2013)

… and many more!

SLIDE 7

AI goal: language understanding

SLIDE 8

Limitations of word embeddings

  • Word representations cannot capture ambiguity. For instance, bank.

SLIDES 9–11

Problem 1: word representations cannot capture ambiguity

SLIDE 12

Word representations and the triangular inequality

Example from Neelakantan et al. (2014): a single vector for plant, with pollen and refinery as neighbors.

SLIDE 13

Word representations and the triangular inequality

Example from Neelakantan et al. (2014): two sense vectors, plant1 (near pollen) and plant2 (near refinery).
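The figure's point can be made concrete: if similarity comes from distance in a single space, the triangle inequality forces d(pollen, refinery) <= d(pollen, plant) + d(plant, refinery), so one vector for the ambiguous word plant drags two unrelated words together. A toy numeric sketch (all values invented):

```python
# Toy illustration (invented 2-D vectors) of the triangle-inequality problem.
import numpy as np

def dist(a, b):
    return np.linalg.norm(a - b)

plant    = np.array([0.5, 0.5])   # one vector squeezed between two senses
pollen   = np.array([1.0, 0.0])   # relates to the living-organism sense
refinery = np.array([0.0, 1.0])   # relates to the industrial sense

print(dist(pollen, plant), dist(refinery, plant))  # both ~0.71: plant is "close" to each
print(dist(pollen, refinery))                      # <= their sum, so at most ~1.41

# With one sense vector per meaning, the coupling disappears:
plant1, plant2 = np.array([0.9, 0.1]), np.array([0.1, 0.9])
print(dist(pollen, plant1), dist(refinery, plant2))  # each sense near its own neighborhood
```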

SLIDE 14

Limitations of word representations

  • They cannot capture ambiguity. For instance, bank.
  • They neglect rare senses and infrequent words.
  • Word representations do not exploit knowledge from existing lexical resources.

SLIDE 15

Motivation: Model senses instead

If only words:

He withdrew money from the bank.

SLIDES 16–17

Motivation: Model senses instead

If only words:

He withdrew money from the bank.

bank#1 bank#2

SLIDE 18

NASARI: a Novel Approach to a Semantically-Aware Representation of Items

http://lcl.uniroma1.it/nasari/

SLIDE 19

Key goal: obtain sense representations

SLIDE 20

Key goal: obtain sense representations

We want to create a separate representation for each entry of a given word.

SLIDE 21

Idea

Encyclopedic knowledge + Lexicographic knowledge (WordNet)

SLIDE 22

Idea

Encyclopedic knowledge + Lexicographic knowledge (WordNet) + Information from text corpora

SLIDE 23

WordNet

SLIDE 24

WordNet

Main unit: synset (concept)

  • electronic device: {television, telly, television set, tv, tube, tv set, idiot box, boob tube, goggle box}
  • the middle of the day: {noon, twelve noon, high noon, midday, noonday, noontide}

Each synset groups the word senses that share one meaning.
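Synsets like the two above can be inspected directly with NLTK's WordNet interface; a minimal sketch (assumes nltk with the wordnet corpus downloaded):

```python
# Minimal sketch: listing the synsets (and glosses) a word belongs to.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("television"):
    print(synset.name(), "->", synset.definition())

# a synset groups the word senses (lemmas) that share one meaning
noon = wn.synsets("midday")[0]
print(noon.lemma_names())  # e.g. ['noon', 'twelve_noon', 'high_noon', 'midday', ...]
```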

SLIDE 25

WordNet semantic relations

Centered on the synset {plant, flora, plant life}: "(botany) a living organism lacking the power of locomotion"

  • Hypernymy (is-a): {organism, being} ("a living thing that has (or can develop) the ability to act or function independently")
  • Hyponymy (has-kind): {houseplant} ("any of a variety of plants grown indoors for decorative purposes")
  • Meronymy (part-of): {hood, cap} ("a protective covering that is part of a plant")
  • Domain: {botany} ("the branch of biology that studies plants")
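The same interface exposes these relations; a minimal sketch centered on the {plant, flora, plant life} synset (the identifier plant.n.02 is an assumption about the installed WordNet version):

```python
# Minimal sketch: navigating WordNet relations from one synset.
from nltk.corpus import wordnet as wn

plant = wn.synset("plant.n.02")   # assumed id for "(botany) a living organism ..."
print(plant.hypernyms())          # is-a, e.g. [Synset('organism.n.01')]
print(plant.hyponyms()[:5])       # has-kind, e.g. houseplant
print(plant.part_meronyms())      # part-of, e.g. the hood/cap covering
```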

SLIDE 26

Knowledge-based Representations (WordNet)

  • X. Chen, Z. Liu, M. Sun: A Unified Model for Word Sense Representation and Disambiguation (EMNLP 2014)
  • S. Rothe and H. Schütze: AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes (ACL 2015)
  • M. Faruqui, J. Dodge, S. K. Jauhar, C. Dyer, E. Hovy, N. A. Smith: Retrofitting Word Vectors to Semantic Lexicons (NAACL 2015)
  • S. K. Jauhar, C. Dyer, E. Hovy: Ontologically Grounded Multi-sense Representation Learning for Semantic Vector Space Models (NAACL 2015)
  • M. T. Pilehvar and N. Collier: De-Conflated Semantic Representations (EMNLP 2016)

SLIDE 27

Wikipedia

SLIDE 28

Wikipedia

High coverage of named entities and specialized concepts from different domains.

SLIDES 29–30

Wikipedia hyperlinks

SLIDE 31

Thanks to an automatic mapping algorithm, BabelNet integrates Wikipedia and WordNet, among other resources (Wiktionary, OmegaWiki, Wikidata…).

Key feature: multilinguality (271 languages).

SLIDE 32

BabelNet

Concepts and entities

SLIDE 33

BabelNet

It follows the same structure as WordNet: synsets are the main units.

SLIDE 34

BabelNet

In this case, synsets are multilingual.

SLIDE 35

NASARI (Camacho-Collados et al., AIJ 2016)

Goal: build vector representations for multilingual BabelNet synsets.

How? We exploit the Wikipedia semantic network and the WordNet taxonomy to construct a subcorpus (contextual information) for any given BabelNet synset.

SLIDE 36

Pipeline

Process of obtaining contextual information for a BabelNet synset, exploiting the BabelNet taxonomy and Wikipedia as a semantic network.

SLIDES 37–38

Three types of vector representations

  • Lexical (dimensions are words)
  • Unified (dimensions are multilingual BabelNet synsets)
  • Embedded (latent dimensions)

SLIDE 39

Human-interpretable dimensions

plant (living organism): organism#1, table#3, tree#1, leaf#1, soil#2, carpet#2, food#2, garden#2, dictionary#3, refinery#1
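Such a vector can be pictured as a sparse mapping from interpretable dimensions to weights; the weights below are invented for illustration:

```python
# Toy sketch of a lexical (human-interpretable) vector: dimensions are
# word senses, values are relevance weights (all numbers invented).
plant_living_organism = {
    "organism#1": 0.92, "tree#1": 0.81, "leaf#1": 0.77,
    "soil#2": 0.64, "garden#2": 0.58, "food#2": 0.41,
}
tree = {"organism#1": 0.88, "tree#1": 0.95, "leaf#1": 0.83, "soil#2": 0.50}

def sparse_dot(u, v):
    # similarity computed directly over shared, interpretable dimensions
    return sum(w * v[d] for d, w in u.items() if d in v)

print(sparse_dot(plant_living_organism, tree))
```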

SLIDES 40–41

Three types of vector representations

  • Lexical (dimensions are words)
  • Unified (dimensions are multilingual BabelNet synsets)
  • Embedded: low-dimensional vectors exploiting word embeddings obtained from text corpora

Word and synset embeddings share the same vector space!

SLIDE 42

Embedded vector representation

Closest senses
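Because the embedded vectors share the space of word embeddings, "closest senses" are just nearest neighbors. A minimal sketch (the file name and sense-key format are hypothetical; assumes the embedded vectors from http://lcl.uniroma1.it/nasari/ saved in word2vec text format):

```python
# Hypothetical sketch: nearest-neighbor queries over NASARI embedded vectors.
from gensim.models import KeyedVectors

# file name and key format are assumptions, not the official release layout
vectors = KeyedVectors.load_word2vec_format("nasari_embedded_english.txt", binary=False)
print(vectors.most_similar("plant_(living_organism)", topn=5))
```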

SLIDES 43–44

SW2V (Mancini et al., CoNLL 2017)

A word is the surface form of a sense: we can exploit this intrinsic relationship for jointly training word and sense embeddings.

How? By updating the representations of the word and its associated senses interchangeably.

SLIDES 45–52

SW2V: Idea

Given as input a corpus and a semantic network:

1. Use the semantic network to link each word to its associated senses in context: He withdrew money from the bank.
2. Use a neural network where the updates of word and sense embeddings are linked, exploiting virtual connections.

[Figure: the context words He, withdrew, money, the, from and the target word bank; the training error is propagated to both word and sense embeddings through the virtual connections.]

SLIDE 53

SW2V: Idea

Given as input a corpus and a semantic network:

1. Use the semantic network to link each word to its associated senses in context.
2. Use a neural network where the updates of word and sense embeddings are linked, exploiting virtual connections.

In this way it is possible to learn word and sense/synset embeddings jointly in a single training run.

SLIDE 54

Full architecture of W2V (Mikolov et al., 2013)

E = -log p(w_t | W_t)

SLIDE 55

Full architecture of SW2V

Words and associated senses are used both as input and output (W_t: context words; S_t: their associated senses).

E = -log p(w_t | W_t, S_t) - Σ_{s ∈ S_t} log p(s | W_t, S_t)
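A toy numpy sketch of this joint objective (my own simplification, not the released SW2V code): the hidden state averages context word and sense embeddings, and the loss adds one term for the target word and one per associated sense:

```python
# Toy sketch (random toy parameters) of an SW2V-style joint objective.
import numpy as np

V_w, V_s, d = 100, 50, 16              # word vocab, sense vocab, dimension
W_in  = np.random.randn(V_w, d) * .1   # input word embeddings
S_in  = np.random.randn(V_s, d) * .1   # input sense embeddings
W_out = np.random.randn(V_w, d) * .1   # output word embeddings
S_out = np.random.randn(V_s, d) * .1   # output sense embeddings

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sw2v_loss(ctx_words, ctx_senses, target_word, target_senses):
    # hidden state: average of context word AND sense embeddings
    h = np.vstack([W_in[ctx_words], S_in[ctx_senses]]).mean(axis=0)
    # E = -log p(w_t | W_t, S_t) - sum_{s in S_t} log p(s | W_t, S_t)
    loss = -np.log(softmax(W_out @ h)[target_word])
    for s in target_senses:
        loss -= np.log(softmax(S_out @ h)[s])
    return loss

print(sw2v_loss([1, 2, 3], [0, 4], target_word=7, target_senses=[5]))
```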

SLIDE 56

Word and sense connectivity: example 1

Ten closest word and sense embeddings to the sense company (military unit)

SLIDE 57

Word and sense connectivity: example 2

Ten closest word and sense embeddings to the sense school (group of fish)

SLIDE 58

Applications of knowledge-based sense representations

  • Taxonomy Learning (Espinosa-Anke et al., EMNLP 2016)
  • Open Information Extraction (Delli Bovi et al., EMNLP 2015)
  • Lexical entailment (Nickel & Kiela, NIPS 2017)
  • Word Sense Disambiguation (Rothe & Schütze, ACL 2015)
  • Sentiment analysis (Flekova & Gurevych, ACL 2016)
  • Lexical substitution (Cocos et al., SENSE 2017)
  • Computer vision (Young et al., ICRA 2017)

...

SLIDE 59

Applications

❖ Domain labeling/adaptation
❖ Word Sense Disambiguation
❖ Downstream NLP applications (e.g. text classification)

SLIDE 60

Domain labeling (Camacho-Collados and Navigli, EACL 2017)

Annotate each concept/entity with its corresponding domain of knowledge. To this end, we use the Wikipedia featured articles page, which includes 34 domains and a number of Wikipedia pages associated with each domain (Biology, Geography, Mathematics, Music, etc.).

SLIDE 61

Domain labeling

Wikipedia featured articles

SLIDE 62

Domain labeling

How to associate a concept with a domain?

1. Learn a NASARI vector for the concatenation of all Wikipedia pages associated with a given domain.
2. Exploit the semantic similarity between knowledge-based vectors and graph properties of the lexical resources.
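Step 2 can be pictured as nearest-domain classification by cosine similarity; a toy sketch (vectors invented, threshold assumed, graph properties omitted):

```python
# Toy sketch: assign a concept the domain with the most similar vector.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

domain_vectors = {            # assumed precomputed, one per domain (step 1)
    "Biology":     np.array([0.9, 0.1, 0.0]),
    "Mathematics": np.array([0.0, 0.2, 0.9]),
}

def label_domain(concept_vector, threshold=0.5):
    domain, score = max(
        ((d, cosine(concept_vector, v)) for d, v in domain_vectors.items()),
        key=lambda pair: pair[1],
    )
    return domain if score >= threshold else None  # abstain below threshold

print(label_domain(np.array([0.8, 0.2, 0.1])))  # -> Biology
```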

SLIDE 63

BabelDomains (Camacho-Collados and Navigli, EACL 2017)

As a result: a unified resource with information about domains of knowledge.

BabelDomains is available for BabelNet, Wikipedia and WordNet at http://lcl.uniroma1.it/babeldomains, and is already integrated into BabelNet (online interface and API).

SLIDE 64

BabelDomains

Example domains: Physics and astronomy, Computing, Media

SLIDE 65

Domain filtering for supervised distributional hypernym discovery
(Espinosa-Anke et al., EMNLP 2016; Camacho-Collados and Navigli, EACL 2017)

Task: given a term, predict its hypernym(s), e.g. apple is a -> fruit.
Model: distributional supervised system based on the transformation matrix of Mikolov et al. (2013).
Idea: training data filtered by domain of knowledge.
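The transformation-matrix model can be sketched as a least-squares linear map from term vectors to hypernym vectors (toy data; domain filtering would simply restrict the training pairs):

```python
# Toy sketch of the Mikolov et al. (2013) transformation-matrix idea:
# learn a linear map M sending term vectors to hypernym vectors.
import numpy as np

d = 50
rng = np.random.default_rng(0)
X = rng.standard_normal((200, d))          # term embeddings (training pairs)
M_true = rng.standard_normal((d, d)) * .1
Y = X @ M_true                             # their hypernyms' embeddings

# closed-form least squares: M = argmin ||X M - Y||^2
M, *_ = np.linalg.lstsq(X, Y, rcond=None)

# at test time: map a term vector, then nearest-neighbor search
apple = rng.standard_normal(d)             # stand-in for the "apple" vector
predicted_hypernym_vec = apple @ M         # search candidates near this point
```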

SLIDE 66

Domain filtering for supervised distributional hypernym discovery

Results on the hypernym discovery task for five domains (domain-filtered vs. non-filtered training data).

Conclusion: filtering training data by domain proves to be clearly beneficial.

SLIDES 67–69

Word Sense Disambiguation

Kobe, which is one of Japan's largest cities, [...]

[Figure: candidate senses of "Kobe"; the incorrect sense is marked with an X and the city sense is selected.]

SLIDE 70

Word Sense Disambiguation (Camacho-Collados et al., AIJ 2016)

Basic idea: select the sense which is semantically closest to the semantic representation of the whole document (global context).
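A toy sketch of this idea (vectors invented): represent the document, e.g. as the centroid of its word vectors, and pick the candidate sense with the highest cosine similarity:

```python
# Toy sketch: WSD as similarity between sense vectors and a document vector.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def disambiguate(candidate_senses, document_vector):
    # candidate_senses: {sense_id: sense_vector} for the target word
    return max(candidate_senses,
               key=lambda s: cosine(candidate_senses[s], document_vector))

senses = {"bank#finance": np.array([0.9, 0.1]),
          "bank#river":   np.array([0.1, 0.9])}
doc = np.array([0.2, 0.8])            # e.g. centroid of the document's word vectors
print(disambiguate(senses, doc))      # -> bank#river
```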

SLIDE 71

Word Sense Disambiguation on textual definitions
(Camacho-Collados et al., LREC 2016; LREV 2018)

Combination of a graph-based disambiguation system (Babelfy) with NASARI to disambiguate the concepts and named entities of over 35M definitions in 256 languages.

Sense-annotated corpus freely available at http://lcl.uniroma1.it/disambiguated-glosses/

SLIDE 72

Context-rich WSD

Interchanging the positions of the king and a rook. -> castling (chess)

SLIDE 73

Context-rich WSD

Interchanging the positions of the king and a rook. -> castling (chess)

Additional English context:
  • Castling is a move in the game of chess involving a player's king and either of the player's original rooks.
  • A move in which the king moves two squares towards a rook, and the rook moves to the other side of the king.

SLIDES 74–75

Context-rich WSD

Interchanging the positions of the king and a rook. -> castling (chess)

Multilingual context (definitions of castling across languages):
  • English: Castling is a move in the game of chess involving a player's king and either of the player's original rooks.
  • English: A move in which the king moves two squares towards a rook, and the rook moves to the other side of the king.
  • French: Manœuvre du jeu d'échecs
  • German: Spielzug im Schach, bei dem König und Turm einer Farbe bewegt werden
  • Spanish: El enroque es un movimiento especial en el juego de ajedrez que involucra al rey y a una de las torres del jugador.
  • Czech: Rošáda je zvláštní tah v šachu, při kterém táhne zároveň král a věž.
  • Turkish: Rok İngilizce'de kaleye rook denmektedir.
  • Norwegian: Rokade er et spesialtrekk i sjakk.
  • Greek: Το ροκέ είναι μια ειδική κίνηση στο σκάκι που συμμετέχουν ο βασιλιάς και ένας από τους δυο πύργους.

SLIDE 76

Context-rich WSD exploiting parallel corpora (Delli Bovi et al., ACL 2017)

Applying the same method to provide high-quality sense annotations from parallel corpora (Europarl): 120M+ sense annotations for 21 languages. http://lcl.uniroma1.it/eurosense/

Extrinsic evaluation: improved performance of a standard supervised WSD system using this automatically sense-annotated corpus.

SLIDES 77–81

Towards a seamless integration of senses in downstream NLP applications (Pilehvar et al., ACL 2017)

Question: What if we apply WSD and inject sense embeddings into a standard neural classifier?

Problems:

  • WSD is not perfect -> Solution: high-confidence disambiguation
  • WordNet lacks coverage -> Solution: use of Wikipedia
slide-82
SLIDE 82

82

Tasks: Topic categorization and sentiment analysis (polarity detection)

Topic categorization: Given a text, assign it a topic (e.g. politics, sports, etc.). Polarity detection: Predict the sentiment of the sentence/review as either positive or negative.

slide-83
SLIDE 83

83

Classification model

Standard CNN classifier inspired by Kim (2014)
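A minimal PyTorch sketch of a Kim (2014)-style CNN text classifier (hyperparameters assumed; the embedding layer could be initialized with word or sense embeddings):

```python
# Sketch: parallel convolutions over the embedded sequence,
# max-over-time pooling, then a linear layer (Kim, 2014 style).
import torch
import torch.nn as nn

class KimCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_filters=100,
                 widths=(3, 4, 5), n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)  # could load pre-trained vectors
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, w) for w in widths])
        self.fc = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, tokens):                         # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)           # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))       # (batch, n_classes)

logits = KimCNN(vocab_size=10_000)(torch.randint(0, 10_000, (8, 50)))
```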

SLIDES 84–85

Sense-based vs. word-based: conclusions

Sense-based is better than word-based... when the input text is large enough.

SLIDE 86

Why does the input text size matter?

  • Word sense disambiguation works better in larger texts (Moro et al. 2014; Raganato et al. 2017)
  • Disambiguation increases sparsity
slide-87
SLIDE 87

87

Contextualized word embeddings ELMo/BERT

Peters et al. (NAACL 2018) Devlin et al. (NAACL 2019)

slide-88
SLIDE 88

88

Contextualized word embeddings ELMo/BERT

SLIDES 89–91

Contextualized word embeddings: ELMo/BERT

Like word embeddings, they are learned by leveraging language models over massive amounts of text.

New: each word vector depends on the context; it is dynamic.

Important improvements in many NLP tasks.

SLIDES 92–94

Contextualized word embeddings: ELMo/BERT (examples)

He withdrew money from the bank. -> 0.25, 0.32, -0.1, …
The bank remained closed yesterday. -> 0.22, 0.30, -0.08, … (similar to the vector above)
We found a nice spot by the bank of the river. -> -0.8, 0.01, 0.3, …
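A minimal sketch of extracting such contextualized vectors (assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; any layer could be used, here the last):

```python
# Sketch: compare BERT's contextual vectors for "bank" across sentences.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]      # (seq_len, 768)
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids("bank"))
    return hidden[idx]

a = bank_vector("He withdrew money from the bank.")
b = bank_vector("The bank remained closed yesterday.")
c = bank_vector("We found a nice spot by the bank of the river.")

cos = torch.nn.functional.cosine_similarity
print(cos(a, b, dim=0), cos(a, c, dim=0))  # the financial pair should score higher
```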

SLIDES 95–96

How well do these models capture "meaning"?

Good enough for many applications, but with room for improvement. No noticeable improvements in:

➢ Winograd Schema Challenge (requires commonsense reasoning): BERT ~65% vs. humans ~95%
➢ Word-in-Context Challenge (requires abstracting the notion of sense): BERT ~65% vs. humans ~85%

SLIDE 97

For more information on meaning representations (embeddings):

❖ ACL 2016 Tutorial on "Semantic representations of word senses and concepts": http://josecamachocollados.com/slides/Slides_ACL16Tutorial_SemanticRepresentation.pdf
❖ EACL 2017 workshop on "Sense, Concept and Entity Representations and their Applications": https://sites.google.com/site/senseworkshop2017/
❖ NAACL 2018 Tutorial on "Interplay between lexical resources and NLP": https://bitbucket.org/luisespinosa/lr-nlp/
❖ "From Word to Sense Embeddings: A Survey on Vector Representations of Meaning" (JAIR 2018): https://www.jair.org/index.php/jair/article/view/11259

SLIDE 98

Thank you! Questions please!

camachocolladosj@cardiff.ac.uk
@CamachoCollados
josecamachocollados.com