CSC421/2516 Lecture 16: Attention
Roger Grosse and Jimmy Ba
Roger Grosse and Jimmy Ba CSC421/2516 Lecture 16: Attention 1 / 39
Overview
We have seen a few RNN-based sequence prediction models. It is still challenging to generate long
Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems. 2017. https://arxiv.org/pdf/1706.03762.pdf
https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems. 2017.
Radford, Alec, et al. "Improving Language Understanding by Generative Pre-Training." 2018.
For the full text samples, see Radford, Alec, et al. "Language Models are Unsupervised Multitask Learners." 2019.