SLIDE 1

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Source: NAACL-HLT 2019
Speaker: Ya-Fang Hsiao
Advisor: Jia-Ling Koh
Date: 2019/09/02

SLIDE 2

CONTENTS

1. Introduction
2. Related Work
3. Method
4. Experiment
5. Conclusion

SLIDE 3

1

Introduction

SLIDE 4

Introduction

Bidirectional Encoder Representations from Transformers

Language Model

$$Q(x_1, x_2, \ldots, x_U) = \prod_{u=1}^{U} Q(x_u \mid x_1, x_2, \ldots, x_{u-1})$$
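Read as code, this factorization is just a running product over next-token probabilities. A minimal Python sketch, where cond_prob(token, history) is a hypothetical stand-in for Q(x_u | x_1, ..., x_{u-1}), not something from the paper:

    def sentence_prob(tokens, cond_prob):
        """Score a sentence with the left-to-right chain rule."""
        prob = 1.0
        history = []
        for token in tokens:
            prob *= cond_prob(token, history)  # Q(x_u | x_1, ..., x_{u-1})
            history.append(token)
        return prob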

Pre-trained Language Model

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

SLIDE 5

2

Related Work

SLIDE 6

Related Work

Pre-trained language model approaches:
Feature-based: ELMo
Fine-tuning: OpenAI GPT

SLIDE 7

Related Work

Pre-trained language model approaches:
Feature-based: ELMo
Fine-tuning: OpenAI GPT

  • 1. Both use unidirectional language models
  • 2. Both share the same objective function during pre-training

Bidirectional Encoder Representations from Transformers

Two pre-training tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP)

SLIDE 8

"Attention Is All You Need", Vaswani et al. (NIPS 2017)

Transformers

Encoder-decoder (sequence-to-sequence) architecture. RNNs are hard to parallelize.

SLIDE 9

Encoder-Decoder

"Attention Is All You Need", Vaswani et al. (NIPS 2017)

Transformers

SLIDE 10

"Attention Is All You Need", Vaswani et al. (NIPS 2017)

Transformers

Encoder-Decoder: the encoder and decoder each stack 6 identical layers (×6)

Self-attention layers can be computed in parallel.

SLIDE 11

"Attention Is All You Need", Vaswani et al. (NIPS 2017)

Transformers

Self-Attention

Each token is projected to three vectors:
query (to match against other tokens)
key (to be matched by queries)
value (the information to be extracted)
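A minimal PyTorch sketch of single-head scaled dot-product self-attention; the projection matrices w_q, w_k, w_v are illustrative parameters, not values from the slides. Because the output is built entirely from matrix products, all positions are computed at once, which is what makes self-attention parallelizable where an RNN is not:

    import torch
    import torch.nn.functional as F

    def self_attention(x, w_q, w_k, w_v):
        # x: (seq_len, d_model); w_q / w_k / w_v: (d_model, d_k)
        q = x @ w_q                                # queries: match against keys
        k = x @ w_k                                # keys: matched by queries
        v = x @ w_v                                # values: information to extract
        scores = q @ k.T / (k.shape[-1] ** 0.5)    # scaled pairwise scores
        weights = F.softmax(scores, dim=-1)        # normalize over key positions
        return weights @ v                         # weighted sum of values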

SLIDE 12

"Attention Is All You Need", Vaswani et al. (NIPS 2017)

Transformers

Multi-Head Attention
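Multi-head attention runs several such attention functions in parallel over different learned projections and concatenates the results. A quick sketch using PyTorch's built-in torch.nn.MultiheadAttention (a convenience assumption; the paper describes the computation directly):

    import torch
    import torch.nn as nn

    attn = nn.MultiheadAttention(embed_dim=768, num_heads=12)
    x = torch.randn(10, 1, 768)    # 10 tokens, batch of 1, (seq, batch, dim) layout
    out, weights = attn(x, x, x)   # self-attention: query = key = value = x
    print(out.shape)               # torch.Size([10, 1, 768])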

SLIDE 13

Transformers

"Attention Is All You Need", Vaswani et al. (NIPS 2017)

SLIDE 14

BERT_BASE: L=12, H=768, A=12, 110M parameters
BERT_LARGE: L=24, H=1024, A=16, 340M parameters
(L = number of Transformer layers, H = hidden size, A = self-attention heads; the feed-forward size is 4H)

BERT

Bidirectional Encoder Representations from Transformers
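These two configurations can be written down concretely. A sketch assuming the HuggingFace transformers library (not part of the original slides); note that intermediate_size is the 4H feed-forward size:

    from transformers import BertConfig, BertModel

    # BERT_BASE: L=12, H=768, A=12, feed-forward 4H=3072 -> ~110M parameters
    base = BertConfig(num_hidden_layers=12, hidden_size=768,
                      num_attention_heads=12, intermediate_size=3072)

    # BERT_LARGE: L=24, H=1024, A=16, feed-forward 4H=4096 -> ~340M parameters
    large = BertConfig(num_hidden_layers=24, hidden_size=1024,
                       num_attention_heads=16, intermediate_size=4096)

    model = BertModel(base)  # randomly initialized; pre-training not included
    print(sum(p.numel() for p in model.parameters()))  # roughly 110M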

SLIDE 15

3

Method

SLIDE 16

Framework

Pre-training: the model is trained on unlabeled data over different pre-training tasks.
Fine-tuning: all parameters are fine-tuned using labeled data from the downstream tasks.

SLIDE 17

Input

Token embeddings: WordPiece embeddings with a 30,000-token vocabulary.
[CLS]: classification token. [SEP]: separator token.
Segment embeddings: learned embeddings marking whether a token belongs to sentence A or sentence B.
Position embeddings: learned positional embeddings.
Pre-training corpus: BooksCorpus and English Wikipedia.
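How these pieces combine is easiest to see from a tokenizer's output. A sketch assuming the HuggingFace transformers BertTokenizer (an assumption; the slides only describe the embeddings):

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    enc = tokenizer("the man went to the store", "he bought a gallon of milk")

    print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
    # ['[CLS]', 'the', 'man', ..., '[SEP]', 'he', 'bought', ..., '[SEP]']
    print(enc["token_type_ids"])   # 0 for sentence A tokens, 1 for sentence B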

SLIDE 18

Pre-training

Two unsupervised tasks:

  • 1. Masked Language Model (MLM)
  • 2. Next Sentence Prediction (NSP)
SLIDE 19
  • Task 1. MLM

Masked Language Models

Hung-Yi Lee - BERT ppt

Mask 15% of all WordPiece tokens in each sequence at random for prediction. For each selected token:
(1) replace it with the [MASK] token 80% of the time;
(2) replace it with a random token 10% of the time;
(3) keep the unchanged i-th token 10% of the time.
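A simplified Python sketch of this 80/10/10 masking rule (helper names are illustrative, not from the paper):

    import random

    def mask_tokens(tokens, vocab, mask_rate=0.15):
        labels = [None] * len(tokens)      # original tokens to predict; None = skip
        for i, token in enumerate(tokens):
            if token in ("[CLS]", "[SEP]") or random.random() >= mask_rate:
                continue
            labels[i] = token              # the model must recover this token
            r = random.random()
            if r < 0.8:
                tokens[i] = "[MASK]"       # 80%: replace with [MASK]
            elif r < 0.9:
                tokens[i] = random.choice(vocab)  # 10%: random token
            # remaining 10%: keep the token unchanged
        return tokens, labels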

SLIDE 20
  • Task 2. NSP

Next Sentence Prediction

Hung-Yi Lee - BERT ppt

Input = [CLS] the man went to [MASK] store [SEP] he bought a gallon [MASK] milk [SEP]
Label = IsNext

Input = [CLS] the man [MASK] to the store [SEP] penguin [MASK] are flight ##less birds [SEP]
Label = NotNext
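Building such pairs is straightforward: half the time take the actual next sentence (IsNext), otherwise a random sentence from the corpus (NotNext). A minimal sketch (the function name is illustrative):

    import random

    def make_nsp_pair(sentences, i):
        if random.random() < 0.5 and i + 1 < len(sentences):
            return sentences[i], sentences[i + 1], "IsNext"
        # random sentence from the corpus (rare collisions ignored in this sketch)
        return sentences[i], random.choice(sentences), "NotNext"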

SLIDE 21

Fine-Tuning

Fine-tuning: all parameters are fine-tuned using labeled data from the downstream tasks.
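End to end, fine-tuning is an ordinary supervised training loop over all of BERT's parameters plus a small task head. A sketch assuming the HuggingFace transformers library and a dataloader of tokenized, labeled batches (both assumptions, not from the slides):

    import torch
    from transformers import BertForSequenceClassification

    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # small LR is typical

    for batch in dataloader:          # batch holds input_ids, attention_mask, labels
        optimizer.zero_grad()
        loss = model(**batch).loss    # cross-entropy against the labels
        loss.backward()
        optimizer.step()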

SLIDE 22

Task 1 (b): Single Sentence Classification Tasks

Input: a single sentence. Output: a class label.
The sentence is fed to BERT as [CLS] w1 w2 w3; the [CLS] output vector goes to a linear classifier that predicts the class.
The linear classifier is trained from scratch, while BERT is fine-tuned.
Examples: sentiment analysis, document classification.

(Source: Hung-Yi Lee - BERT ppt)
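A from-scratch version of this head, assuming HuggingFace transformers for the pretrained encoder (class and variable names are illustrative):

    import torch.nn as nn
    from transformers import BertModel

    class SentenceClassifier(nn.Module):
        def __init__(self, num_classes):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-uncased")  # fine-tuned
            self.classifier = nn.Linear(768, num_classes)               # from scratch

        def forward(self, input_ids, attention_mask):
            out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
            cls_vec = out.last_hidden_state[:, 0]   # the [CLS] output vector
            return self.classifier(cls_vec)         # class logits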

SLIDE 23

Task 2 (d): Single Sentence Tagging Tasks

Input: a single sentence. Output: a class for each word.
The sentence is fed to BERT as [CLS] w1 w2 w3; each token's output vector goes through a shared linear classifier that predicts that token's class.
Example: slot filling.

(Source: Hung-Yi Lee - BERT ppt)
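The tagging head differs from the classification head only in applying the linear layer to every token instead of just [CLS]. A sketch under the same assumptions as above:

    import torch.nn as nn
    from transformers import BertModel

    class TokenTagger(nn.Module):
        def __init__(self, num_tags):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            self.classifier = nn.Linear(768, num_tags)   # shared across positions

        def forward(self, input_ids, attention_mask):
            out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
            # one tag distribution per token: (batch, seq_len, num_tags)
            return self.classifier(out.last_hidden_state)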

SLIDE 24

Task 3 (a): Sentence Pair Classification Tasks

Input: two sentences. Output: a class label.
The pair is fed to BERT as [CLS] w1 w2 [SEP] w3 w4 w5 (sentence 1, then sentence 2); the [CLS] output vector goes to a linear classifier that predicts the class.
Example: natural language inference.

(Source: Hung-Yi Lee - BERT ppt)
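The model is the same [CLS]-plus-linear-classifier as in the single-sentence task; only the input encoding changes, with the tokenizer packing both sentences into one sequence. Reusing the SentenceClassifier sketch from above (NLI's three labels being entailment, neutral, and contradiction):

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    enc = tokenizer("A man inspects a uniform.", "The man is sleeping.",
                    return_tensors="pt")
    model = SentenceClassifier(num_classes=3)   # defined in the earlier sketch
    logits = model(enc["input_ids"], enc["attention_mask"])  # 3 NLI class scores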

SLIDE 25

Task 4 (c): Question Answering Tasks

Extractive QA: given a query $E = e_1, e_2, \cdots, e_O$ and a document $R = r_1, r_2, \cdots, r_O$, the QA model outputs two integers $(t, f)$ marking a span in the document; the answer is $B = r_t, \cdots, r_f$.
Example: $t = 17, f = 17$ yields a single-token answer; $t = 77, f = 79$ yields a three-token span.

(Source: Hung-Yi Lee - BERT ppt)

SLIDE 26

Task 4 (c): Question Answering Tasks

The question and document are fed to BERT as [CLS] q1 q2 [SEP] d1 d2 d3.
A start vector, learned from scratch, is dot-producted with each document token's output vector; a softmax over the resulting scores (0.5, 0.3, 0.2 in the figure) selects the start position, here s = 2 (token d2).

(Source: Hung-Yi Lee - BERT ppt)

SLIDE 27

Task 4 (c): Question Answering Tasks (continued)

A second vector, also learned from scratch, is dot-producted with each document token's output; its softmax scores (0.2, 0.1, 0.7 in the figure) select the end position, here e = 3 (token d3).
With s = 2 and e = 3, the answer is "d2 d3".

(Source: Hung-Yi Lee - BERT ppt)
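Putting the QA slides together, the whole head is just two vectors learned from scratch. A sketch assuming HuggingFace transformers; a real system would restrict the softmax to document positions, which is omitted here for brevity:

    import torch
    import torch.nn as nn
    from transformers import BertModel

    class SpanQA(nn.Module):
        def __init__(self):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            self.start_vec = nn.Parameter(torch.randn(768))  # learned from scratch
            self.end_vec = nn.Parameter(torch.randn(768))    # learned from scratch

        def forward(self, input_ids, attention_mask):
            h = self.bert(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state
            start = (h @ self.start_vec).softmax(-1).argmax(-1)  # position s
            end = (h @ self.end_vec).softmax(-1).argmax(-1)      # position e
            return start, end   # the answer is the token span s..e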

SLIDE 28

4

Experiment

SLIDE 29

Experiments

Fine-tuning results on 11 NLP tasks

SLIDES 30-34

Implementation

(Code screenshots from LeeMeng's PyTorch BERT tutorial "進擊的BERT".)
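For reference, the tutorial's setup can be approximated as follows; a minimal sketch assuming the HuggingFace transformers library, with the model name and three-way label scheme taken as assumptions rather than read from the screenshots:

    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-chinese", num_labels=3)   # e.g. agreed / disagreed / unrelated

    # sentence-pair classification: does news title B agree with news title A?
    enc = tokenizer("news title A", "news title B", return_tensors="pt")
    out = model(**enc, labels=torch.tensor([0]))
    out.loss.backward()   # fine-tune all parameters end-to-end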

SLIDE 35

5

Conclusion

SLIDE 36

References

Development of language models (n-gram to NNLM): http://bit.ly/nGram2NNLM
Language model pre-training methods (ELMo, OpenAI GPT, BERT): http://bit.ly/ELMo_OpenAIGPT_BERT
Attention Is All You Need: http://bit.ly/AttIsAllUNeed
BERT paper: http://bit.ly/BERTpaper
Hung-Yi Lee - Transformer (YouTube): http://bit.ly/HungYiLee_Transformer
The Illustrated Transformer: http://bit.ly/illustratedTransformer
Transformer explained in detail: http://bit.ly/explainTransformer
github/codertimo - BERT (PyTorch): http://bit.ly/BERT_pytorch
Implementing fake-news pair classification: http://bit.ly/implementpaircls
Pytorch.org BERT: http://bit.ly/pytorchorgBERT