Transformers Pre-trained Language Models
LING572 Advanced Statistical Methods for NLP March 10, 2020
Announcements
Thanks for being here! Please be active on Zoom chat! That's the only form of interaction; I won't be able to…
Paper link (but see Annotated and Illustrated Transformer)
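The paper's core operation, scaled dot-product attention — Attention(Q, K, V) = softmax(QKᵀ/√d_k)V — can be sketched in a few lines of numpy (a toy single-head version with no masking, not the full multi-head implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                                       # weighted sum of values

# toy self-attention: 3 positions, d_k = 4, Q = K = V = X
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)  # shape (3, 4)
```

Each output position is a convex combination of the value vectors, with weights given by the softmaxed query–key similarities.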
Single layer, applied to each position
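This per-position sublayer is the position-wise feed-forward network, FFN(x) = max(0, xW₁ + b₁)W₂ + b₂, with the same weights shared across every position. A minimal numpy sketch (toy dimensions; the paper uses d_model = 512, d_ff = 2048):

```python
import numpy as np

def position_wise_ffn(X, W1, b1, W2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2, applied identically at each position."""
    return np.maximum(0, X @ W1 + b1) @ W2 + b2

# toy sizes: 5 positions, d_model = 8, d_ff = 32
d_model, d_ff, seq_len = 8, 32, 5
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
X = rng.normal(size=(seq_len, d_model))
Y = position_wise_ffn(X, W1, b1, W2, b2)  # one output vector per position
```

Because the weights are shared, applying the network to any single position in isolation gives the same result as that row of the batched output.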
source
less generalization.
More on why this is important later
source
link
CVPR ‘09
source
“We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets. … Astonishingly, we report consistent superior results compared to the highly tuned state-of-the-art systems in all the visual classification tasks on various datasets”
CVPR ’17 paper
Pre-trained ResNet
https://twitter.com/rgblong/status/916062474545319938?lang=en
Peters et al. (2018)
Nearest neighbors for "play":

Source: GloVe
  play → playing, game, games, played, players, plays, player, Play, football, multiplayer

Source: biLM (contextual)
  "Chico Ruiz made a spectacular play on Alusik's grounder…" → "Kieffer, the only junior in the group, was commended for his ability to hit in the clutch, as well as his all-round excellent play."
  "Olivia De Havilland signed to do a Broadway play for Garson…" → "…they were actors who had been handed fat roles in a successful play, and had talent enough to fill the roles competently, with nice understatement."

Peters et al. (2018)
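The contrast above is mechanical: a static embedding assigns one vector per word type, so cosine nearest neighbors of "play" mix its sports and theater senses. A toy sketch with hand-made 2-d vectors (illustrative only, not real GloVe embeddings):

```python
import numpy as np

def nearest_neighbors(word, embeddings, k=3):
    """Rank the vocabulary by cosine similarity to `word`'s single static vector."""
    v = embeddings[word]
    sims = {
        w: (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
        for w, u in embeddings.items() if w != word
    }
    return sorted(sims, key=sims.get, reverse=True)[:k]

# toy vectors: "play" sits between its two senses
emb = {
    "play":    np.array([1.0, 1.0]),   # one vector for both senses
    "game":    np.array([1.0, 0.2]),   # sports sense
    "theatre": np.array([0.2, 1.0]),   # drama sense
    "banana":  np.array([-1.0, 0.5]),  # unrelated
}
nn = nearest_neighbors("play", emb, k=2)  # both senses show up together
```

A contextual model like the biLM instead produces a different vector for each token occurrence, so the neighbors depend on the sentence.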
SQuAD = Stanford Question Answering Dataset
SNLI = Stanford Natural Language Inference Corpus
SST = Stanford Sentiment Treebank
figure: Matthew Peters
Devlin et al., NAACL 2019
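BERT's masked-LM pretraining objective selects about 15% of input positions; of those, 80% are replaced with [MASK], 10% with a random token, and 10% left unchanged (the paper's 80/10/10 recipe). A minimal sketch, using the sentence itself as a stand-in vocabulary:

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style masking: pick ~15% of positions; of those, 80% -> [MASK],
    10% -> random token, 10% -> left unchanged. Returns the corrupted
    sequence and the original tokens the model must predict."""
    rng = random.Random(seed)
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok                 # prediction target at position i
            r = rng.random()
            if r < 0.8:
                masked[i] = "[MASK]"
            elif r < 0.9:
                masked[i] = rng.choice(vocab)
            # else: keep the original token unchanged
    return masked, targets

sent = "the man went to the store to buy milk".split()
masked, targets = mask_tokens(sent, vocab=sent)  # toy vocab = the sentence itself
```

The loss is computed only at the selected positions, which is what makes the objective bidirectional: the model may attend to both sides of each masked token.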
…corpus is minimized
https://www.blog.google/products/search/search-language-understanding-bert/
General Language Understanding Evaluation (GLUE) / SuperGLUE
Mikolov et al. 2013a (the OG word2vec paper)
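The skip-gram model from that paper trains on (center word, context word) pairs drawn from a sliding window over the corpus; generating those pairs is simple to sketch:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in skip-gram word2vec:
    each word is paired with every neighbor within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the quick brown fox".split(), window=1)
```

Each pair becomes one training example: predict the context word from the center word's embedding (with negative sampling or hierarchical softmax handling the output layer).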