CSC413/2516 Lecture 8: Attention and Transformers
Jimmy Ba
Jimmy Ba CSC413/2516 Lecture 8: Attention and Transformers 1 / 50
Overview
We have seen a few RNN-based sequence prediction models. It is still challenging to generate long
For the full text samples see Radford, Alec, et al. “Language Models are Unsupervised Multitask Learners.” 2019. https://talktotransformer.com/
https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
Vaswani, Ashish, et al. “Attention is all you need.” Advances in Neural Information Processing Systems. 2017.
Radford, Alec, et al. “Improving Language Understanding by Generative Pre-Training.” 2018.
For the full text samples see Radford, Alec, et al. “Language Models are Unsupervised Multitask Learners.” 2019.