BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding


  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Source : NAACL-HLT 2019 Speaker : Ya-Fang, Hsiao Advisor : Jia-Ling, Koh Date : 2019/09/02

  2. CONTENTS: 1 Introduction, 2 Related Work, 3 Method, 4 Experiment, 5 Conclusion

  3. 1 Introduction

  4. Introduction Bidirectional Encoder Representations from Transformers (BERT). A language model assigns a probability to a sequence via the chain rule: P(x_1, x_2, ..., x_T) = ∏_{t=1}^{T} P(x_t | x_1, x_2, ..., x_{t-1}). BERT is a pre-trained language model: Pre-training of Deep Bidirectional Transformers for Language Understanding.
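As a toy illustration of this chain-rule factorization (the conditional probabilities below are made up; a real language model would produce them with a neural network):

```python
import math

# Toy next-token distributions, keyed by the full prefix for clarity.
# A trained LM would compute P(x_t | x_1, ..., x_{t-1}) instead of looking it up.
cond_prob = {
    ():             {"the": 0.5, "a": 0.5},
    ("the",):       {"man": 0.4, "store": 0.6},
    ("the", "man"): {"went": 1.0},
}

def sequence_log_prob(tokens):
    """log P(x_1, ..., x_T) = sum over t of log P(x_t | x_1, ..., x_{t-1})."""
    log_p = 0.0
    for t, token in enumerate(tokens):
        prefix = tuple(tokens[:t])
        log_p += math.log(cond_prob[prefix][token])
    return log_p

print(sequence_log_prob(["the", "man", "went"]))  # log(0.5 * 0.4 * 1.0)
```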

  5. 2 Related Work

  6. Related Work Pre-trained language models: ELMo (feature-based approach) and OpenAI GPT (fine-tuning approach).

  7. Related Work Pre-trained language models: ELMo (feature-based) and OpenAI GPT (fine-tuning). Their limitations: (1) they rely on unidirectional language models, and (2) both use the same standard left-to-right objective function. BERT (Bidirectional Encoder Representations from Transformers) instead pre-trains with two tasks: the Masked Language Model (MLM) and Next Sentence Prediction (NSP).

  8. γ€ŠAttention Is All You Need》 Transformers, Vaswani et al. (NIPS 2017). A sequence-to-sequence encoder-decoder architecture; RNN-based seq2seq models are hard to parallelize.

  9. γ€Š Attention is all you need 》 Transformers Vaswani et al. (NIPS2017) Encoder-Decoder

  10. γ€ŠAttention Is All You Need》 Transformers, Vaswani et al. (NIPS 2017). Encoder and decoder each stack six identical layers (×6); the self-attention layers can be computed in parallel.

  11. γ€ŠAttention Is All You Need》 Transformers, Vaswani et al. (NIPS 2017). Self-Attention: each token produces a query (to match against others), a key (to be matched), and a value (the information to be extracted).
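As a concrete picture of query, key, and value, a minimal scaled dot-product self-attention sketch in PyTorch (weights and sizes are arbitrary placeholders):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q = x @ w_q          # queries: used to match against the keys
    k = x @ w_k          # keys: matched by the queries
    v = x @ w_v          # values: the information that gets extracted
    d_k = k.size(-1)
    scores = q @ k.transpose(0, 1) / d_k ** 0.5   # (seq_len, seq_len) attention scores
    weights = F.softmax(scores, dim=-1)           # each row sums to 1
    return weights @ v                            # weighted sum of the values

torch.manual_seed(0)
x = torch.randn(5, 16)                            # 5 tokens, d_model = 16
w_q, w_k, w_v = (torch.randn(16, 16) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)            # (5, 16), computed in parallel over positions
```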

  12. γ€Š Attention is all you need 》 Transformers Vaswani et al. (NIPS2017) Multi-Head Attention
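PyTorch ships a ready-made multi-head attention module; a minimal usage sketch (the embedding size and head count follow the original Transformer base setting, and the input tensor is a random placeholder):

```python
import torch
import torch.nn as nn

# 8 heads over a 512-dimensional model, as in the original Transformer base model.
mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 512)            # (batch, seq_len, d_model)
out, attn_weights = mha(x, x, x)       # self-attention: query = key = value = x
print(out.shape, attn_weights.shape)   # (2, 10, 512) and (2, 10, 10)
```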

  13. γ€Š Attention is all you need 》 Transformers Vaswani et al. (NIPS2017)

  14. BERT: Bidirectional Encoder Representations from Transformers. BERT-BASE: L=12, H=768, A=12, 110M parameters. BERT-LARGE: L=24, H=1024, A=16, 340M parameters. (L = number of Transformer layers, H = hidden size, A = number of self-attention heads; the feed-forward size is 4H.)
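A sketch of these two configurations written with the Hugging Face transformers BertConfig class (assumes the transformers package is installed; the vocabulary size shown is the standard uncased English WordPiece vocabulary):

```python
from transformers import BertConfig

# BERT-BASE: L=12 layers, H=768 hidden size, A=12 heads, feed-forward size 4H = 3072
base = BertConfig(
    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
)

# BERT-LARGE: L=24 layers, H=1024 hidden size, A=16 heads, feed-forward size 4H = 4096
large = BertConfig(
    vocab_size=30522,
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    intermediate_size=4096,
)
```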

  15. 3 Method

  16. Framework Pre-training: the model is trained on unlabeled data over different pre-training tasks. Fine-tuning: all parameters are fine-tuned using labeled data from the downstream tasks.

  17. Input [CLS]: classification token; [SEP]: separator token. Pre-training corpus: BooksCorpus and English Wikipedia. Token embeddings: WordPiece embeddings with a 30,000-token vocabulary. Segment embeddings: learned embeddings indicating whether a token belongs to sentence A or sentence B. Position embeddings: learned positional embeddings.
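A sketch of what these inputs look like in practice with the Hugging Face WordPiece tokenizer (the 'bert-base-uncased' checkpoint name is used here only for illustration):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # ~30,000-token WordPiece vocab

enc = tokenizer("the man went to the store", "he bought a gallon of milk")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'the', 'man', 'went', ..., '[SEP]', 'he', 'bought', ..., '[SEP]']
print(enc["token_type_ids"])  # segment ids: 0 for sentence A tokens, 1 for sentence B tokens
# Position embeddings are added inside the model, indexed 0..seq_len-1.
```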

  18. Pre-training Two unsupervised tasks: 1. Masked Language Models (MLM) 2. Next Sentence Prediction (NSP)

  19. Task 1: MLM (Masked Language Model). 15% of all WordPiece tokens in each sequence are chosen at random for prediction. Each chosen token is replaced with (1) the [MASK] token 80% of the time, (2) a random token 10% of the time, or (3) left unchanged 10% of the time. (Hung-Yi Lee - BERT ppt)
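A minimal sketch of this 80/10/10 masking rule over an already-tokenized sequence (the vocabulary and example tokens are placeholders, not the real WordPiece vocabulary):

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15):
    """Return (masked_tokens, labels); labels mark the positions the model must predict."""
    masked, labels = list(tokens), [None] * len(tokens)
    for i, token in enumerate(tokens):
        if token in ("[CLS]", "[SEP]") or random.random() >= mask_prob:
            continue                          # only ~15% of positions are selected
        labels[i] = token                     # the model must recover the original token
        r = random.random()
        if r < 0.8:
            masked[i] = "[MASK]"              # 80%: replace with [MASK]
        elif r < 0.9:
            masked[i] = random.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the token unchanged
    return masked, labels

tokens = ["[CLS]", "the", "man", "went", "to", "the", "store", "[SEP]"]
print(mask_tokens(tokens, vocab=["bank", "apple", "ran", "blue"]))
```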

  20. Task 2: NSP (Next Sentence Prediction). Input = [CLS] the man went to [MASK] store [SEP] he bought a gallon [MASK] milk [SEP], Label = IsNext. Input = [CLS] the man [MASK] to the store [SEP] penguin [MASK] are flight ##less birds [SEP], Label = NotNext. (Hung-Yi Lee - BERT ppt)
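A sketch of how IsNext / NotNext pairs could be sampled from an ordered corpus (the corpus structure here is an assumption; in the paper the NotNext sentence is drawn from a different document):

```python
import random

def make_nsp_pair(corpus):
    """corpus: list of documents, each a list of sentences in order (at least two per document)."""
    doc = random.choice(corpus)
    i = random.randrange(len(doc) - 1)
    sent_a = doc[i]
    if random.random() < 0.5:
        return sent_a, doc[i + 1], "IsNext"              # 50%: the actual next sentence
    other_doc = random.choice(corpus)                    # ideally a different document
    return sent_a, random.choice(other_doc), "NotNext"   # 50%: a random sentence
```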

  21. Fine-Tuning: all parameters are fine-tuned using labeled data from the downstream tasks.

  22. Task 1 (b) Single-Sentence Classification Tasks. Input: a single sentence; output: a class. A linear classifier on top of the [CLS] output is trained from scratch while BERT is fine-tuned. Examples: sentiment analysis, document classification. (Hung-Yi Lee - BERT ppt)
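A hedged sketch of this setup with the transformers library: the pre-trained encoder is loaded, while the linear classifier over [CLS] starts from randomly initialized weights (the checkpoint name and label count are illustrative):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# The classification head (a linear layer over the [CLS] output) is newly initialized.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("this movie was great", return_tensors="pt")
logits = model(**inputs).logits          # shape (1, 2): one score per sentiment class
print(torch.softmax(logits, dim=-1))
```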

  23. Task 2 (d) Single-Sentence Tagging Tasks. Input: a single sentence; output: a class for each token. A linear classifier is applied to every token's output representation. Example: slot filling. (Hung-Yi Lee - BERT ppt)

  24. Task 3 (a) Sentence-Pair Classification Tasks. Input: two sentences separated by [SEP]; output: a class predicted from the [CLS] output by a linear classifier. Example: natural language inference. (Hung-Yi Lee - BERT ppt)

  25. Task 4 (c) Question Answering Tasks. Document: D = {d_1, d_2, ..., d_N}; Query: Q = {q_1, q_2, ..., q_M}. The QA model outputs two integers (s, e), and the answer is the span A = {d_s, ..., d_e}; e.g. s = 17, e = 17, or s = 77, e = 79. (Hung-Yi Lee - BERT ppt)

  26. Task 4 (c) Question Answering Tasks. A start vector, learned from scratch, is dot-producted with each document token's BERT output; a softmax over the resulting scores selects the start position, here s = 2. (Hung-Yi Lee - BERT ppt)

  27. Task 4 (c) Question Answering Tasks. An end vector, also learned from scratch, is dot-producted with the same outputs; its softmax selects the end position, here e = 3, so the predicted answer span is "d2 d3". (Hung-Yi Lee - BERT ppt)
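A minimal sketch of this span-prediction head: a start vector and an end vector, both learned from scratch, are dot-producted with every document token's output vector, and softmaxes pick the boundaries (all tensors below are random placeholders):

```python
import torch
import torch.nn.functional as F

hidden = 768
doc_outputs = torch.randn(6, hidden)     # BERT output vectors for 6 document tokens
start_vec = torch.randn(hidden)          # learned from scratch during fine-tuning
end_vec = torch.randn(hidden)            # learned from scratch during fine-tuning

start_probs = F.softmax(doc_outputs @ start_vec, dim=0)   # one probability per document token
end_probs = F.softmax(doc_outputs @ end_vec, dim=0)

s = int(start_probs.argmax())            # predicted start position
e = int(end_probs.argmax())              # predicted end position
print(f"answer span: tokens {s}..{e}")   # e.g. s = 2, e = 3 -> answer 'd2 d3'
```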

  28. 4 Experiment

  29. Experiments Fine-tuning results on 11 NLP tasks

  30. Implements: LeeMeng - ι€²ζ“Šηš„ BERT (PyTorch). Slides 30-34 show screenshots of this PyTorch implementation walkthrough.
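As a starting point comparable to that walkthrough, a hedged sketch of loading a pre-trained BERT in PyTorch via the transformers library (the checkpoint name is an assumption; the tutorial itself may use a different one):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")   # downloads pre-trained weights
model.eval()

inputs = tokenizer("BERT is a pre-trained language model", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (1, seq_len, 768): one vector per input token
```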

  35. 5 Conclusion

  36. References
      BERT paper: http://bit.ly/BERTpaper
      Language model development (from n-gram to NNLM): http://bit.ly/nGram2NNLM
      Language model pre-training methods (ELMo, OpenAI GPT, BERT): http://bit.ly/ELMo_OpenAIGPT_BERT
      Attention Is All You Need: http://bit.ly/AttIsAllUNeed
      Hung-Yi Lee - Transformer (YouTube): http://bit.ly/HungYiLee_Transformer
      The Illustrated Transformer: http://bit.ly/illustratedTransformer
      Transformer explained in detail: http://bit.ly/explainTransformer
      github/codertimo - BERT (PyTorch): http://bit.ly/BERT_pytorch
      PyTorch.org BERT: http://bit.ly/pytorchorgBERT
      Implementation: fake news pair classification: http://bit.ly/implementpaircls
