SLIDE 1
Attention - tl;dr
Pay attention to a weighted combination of input states to generate the right output state
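The tl;dr above can be made concrete with a small sketch. It assumes dot-product scoring between a query vector and each input state, followed by a softmax. The slide names neither choice, so both are illustrative assumptions, as are the names `softmax`, `attend`, `query`, and `states`.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Return (weights, context) for one query over a list of input states.

    Assumption: the score of each state is its dot product with the query.
    """
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    # The context vector is the weighted combination of the input states.
    context = [
        sum(w * state[i] for w, state in zip(weights, states))
        for i in range(dim)
    ]
    return weights, context


# Toy input: three 2-dimensional input states and one query.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = attend(query, states)
print(weights)
print(context)
```

States that align with the query (the first and third here) receive larger weights, so they contribute more to the context vector used to generate the output state.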
IN5550: Neural Methods in Natural Language Processing
Transformers
Jeremy Barnes, University of Oslo
March 31, 2020
*from Strubell et al. (2019), Energy and Policy Considerations for Deep Learning in NLP.
[Figure: the example input "This is an example." passed to a Pretrained Teacher Model and a Student Model. Panel labels: Performance = 93.7, Performance = 93.7, Performance = 93.0.]