An Approximate Perspective on Word Prediction in Context: Ontological Semantics meets BERT
Kanishka Misra and Julia Taylor Rayz Purdue University NAFIPS 2020
Virtually, from West Lafayette, IN, USA
Misra and Rayz, 2020
The process of training a neural network on large text corpora, usually with a language modelling objective.
Participants predict blank words in a sentence by relying on the context surrounding the blank.
Figure labels: trainable parameters; hidden state (representations useful for NL tasks)
For a sequence of length T: word
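For reference, a minimal formulation under the assumption that the standard language modelling objective is meant (the notation below is an assumption, not taken from the slide). A left-to-right language model factorizes the probability of a sequence of length T word by word, while a masked model such as BERT (Devlin et al., 2019) predicts a word at position t from the context on both sides:

```latex
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
\qquad \text{(left-to-right)}

P(w_t \mid w_1, \dots, w_{t-1}, w_{t+1}, \dots, w_T)
\qquad \text{(masked position } t \text{)}
```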
(Figure from Vaswani et al., 2017)
Devlin et al., 2019
P(fragile | (1)) > P(fragile | (2))
(Ettinger, 2020; Petroni et al., 2019; Misra et al., 2020)
(Ettinger, 2020; Kassner and Schütze, 2020)
To what extent does BERT understand Natural Language?
P(football|context) > P(chess|context) [~75% accuracy]
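A comparison of this kind is typically scored as the fraction of test items in which the expected word receives a higher probability than the alternative. The sketch below illustrates that scoring, assuming each item is reduced to a pair of probabilities; the function name `pairwise_accuracy` and the numbers are hypothetical and are not results from the study.

```python
def pairwise_accuracy(items):
    """Fraction of items where the expected word outscores the alternative.

    items: list of (p_expected, p_alternative) probability pairs.
    """
    if not items:
        raise ValueError("need at least one item")
    hits = sum(1 for p_expected, p_alternative in items if p_expected > p_alternative)
    return hits / len(items)


# Hypothetical (made-up) probabilities for three contexts.
items = [
    (0.40, 0.10),  # expected word preferred
    (0.05, 0.20),  # alternative preferred
    (0.30, 0.02),  # expected word preferred
]
accuracy = pairwise_accuracy(items)
print(f"pairwise accuracy: {accuracy:.2f}")
```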
Figure labels: Ontology; morphology, phonology, syntax, lexicon; Onomasticon; Commonsense Repository
Taylor, Raskin, Hempelmann (2010); Hempelmann, Raskin, Taylor (2010); Raskin, Hempelmann, Taylor (2010)
INGEST-1
  AGENT:
    sem: ANIMAL
    relaxable-to: SOCIAL-OBJECT
  THEME:
    sem: FOOD, BEVERAGE
    relaxable-to: ANIMAL, PLANT
    not: HUMAN
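The facets in this frame can be read as a layered check on a candidate filler: excluded concepts first, then the preferred `sem` restriction, then the `relaxable-to` fallback. The sketch below illustrates that reading under stated assumptions: the `IS_A` hierarchy is a tiny hypothetical stand-in for the ontology, the function names are invented for illustration, and the check order is one plausible choice. It returns the name of the matching facet rather than a membership value μ, whose calculation is not given here.

```python
# Hypothetical toy is-a hierarchy (child -> parent); not the real ontology.
IS_A = {
    "HUMAN": "ANIMAL",
    "DOG": "ANIMAL",
    "APPLE": "FOOD",
    "WATER": "BEVERAGE",
    "FERN": "PLANT",
    "COMPANY": "SOCIAL-OBJECT",
    "ROCK": "PHYSICAL-OBJECT",
}

# The INGEST-1 frame from the slide, encoded as nested dicts of concept sets.
INGEST_1 = {
    "AGENT": {
        "sem": {"ANIMAL"},
        "relaxable-to": {"SOCIAL-OBJECT"},
        "not": set(),
    },
    "THEME": {
        "sem": {"FOOD", "BEVERAGE"},
        "relaxable-to": {"ANIMAL", "PLANT"},
        "not": {"HUMAN"},
    },
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors in IS_A."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def restriction_level(frame, role, concept):
    """Name the facet of frame[role] that the concept satisfies."""
    lineage = set(ancestors(concept))
    facets = frame[role]
    if lineage & facets["not"]:
        return "excluded"
    if lineage & facets["sem"]:
        return "sem"
    if lineage & facets["relaxable-to"]:
        return "relaxable-to"
    return "unlisted"


for filler in ("APPLE", "DOG", "HUMAN", "ROCK"):
    print(filler, restriction_level(INGEST_1, "THEME", filler))
```

Checking `not` before `sem` matters in this toy hierarchy: HUMAN descends from ANIMAL, so without the exclusion it would pass the relaxed THEME restriction.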
Taylor and Raskin (2010, 2011, 2016)
Calculation of μ: Taylor and Raskin (2010, 2011, 2016); Taylor, Raskin and Hempelmann (2011)
Figure labels: x; y, z (descendent virtual-nodes)

WASH:
  THEME:
    default: NONE
    rel-to: physical-object

WASH:
  INSTRUMENT: laundry-detergent
  THEME:
    default: clothes
    rel-to: physical-object

WASH:
  INSTRUMENT: soap
  THEME:
    default: NONE
    rel-to: physical-object
(the new curtains)
BRUSH:
  AGENT: HUMAN
    GENDER: FEMALE
  THEME: [MASK]
  INSTRUMENT: NONE

1. Act of cleaning [brush your teeth]
2. Rub with brush [I brushed my clothes]
3. Remove with brush [brush dirt off the jacket]
4. Touch something lightly [her cheeks brushed against the wind]
5. ...
Rank  Token     Probability
1     teeth     0.8915
2     hair      0.1073
3     face      0.0002
4     ponytail  0.0002
5     dress     0.0001
She quickly got dressed and brushed her [MASK] with a comb.
She quickly got dressed and brushed her [MASK] with a toothbrush.
BRUSH:
  AGENT: HUMAN
    GENDER: FEMALE
  THEME: [MASK]
  INSTRUMENT: COMB

BRUSH:
  AGENT: HUMAN
    GENDER: FEMALE
  THEME: [MASK]
  INSTRUMENT: TOOTHBRUSH

Figure labels: BRUSH; B’1; B’2
Figure labels: BRUSH; BRUSH-WITH-INSTRUMENT
With a comb:

Rank  Token     Probability
1     hair      0.8704
2     teeth     0.1059
3     face      0.0210
12    ponytail  <0.0001
27    dress     <0.0001

With a toothbrush:

Rank  Token     Probability
1     teeth     0.9922
2     hair      0.0052
3     face      0.0019
31    ponytail  <0.0001
98    dress     <<0.0001
BRUSH-WITH-INSTRUMENT
○ Needs large-scale empirical testing by collecting events and their defaults.
○ Hypothesis: softmax isn’t set up to learn multiple labels per sample.
○ Especially when limited instances of the same event are encountered in training.
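One way to see the intuition behind the softmax hypothesis: softmax normalizes scores into a distribution that sums to 1, so two fillers that are equally acceptable in a context have to split the probability mass rather than both scoring near 1. The sketch below uses made-up scores, not BERT outputs; `softmax` and `scores` are illustrative names.

```python
import math


def softmax(scores):
    """Map a dict of token scores to probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical scores: two fillers equally well supported, one weakly supported.
scores = {"teeth": 5.0, "hair": 5.0, "dress": 0.0}
probs = softmax(scores)

# Rank tokens by probability, as in the tables above.
ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
for rank, (token, p) in enumerate(ranked, start=1):
    print(rank, token, round(p, 4))
```

Neither of the two equally supported tokens can exceed 0.5 here, which is the sense in which a single softmax output treats acceptable alternatives as competitors.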