SLIDE 1

Data Augmentation in NLP

Xiachong Feng

2020-03-21

SLIDE 2

Outline

  • Why do we need Data Augmentation?
  • Data Augmentation in CV
  • Widely Used Methods
      • EDA
      • Back-Translation
      • Contextual Augmentation
  • Methods based on Pre-trained Language Models
      • BERT
      • GPT
      • Seq2Seq (BART)
  • Conclusion
SLIDE 3

Why do we need Data Augmentation?

  • Few-shot Learning
  • Imbalanced labeled data
  • Semi-supervised Learning
  • ......

https://mp.weixin.qq.com/s/CHSDi2LpDOLMjWOLXlvSAg

SLIDE 4

Data Augmentation in CV

  • Flip: flip images horizontally and vertically.
  • Rotation
  • Scale
  • Crop: randomly sample a section from the original image.
  • Gaussian Noise

https://medium.com/nanonets/how-to-use-deep-learning-when-you-have-limited-data-part-2-data-augmentation-c26971dc8ced
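
A minimal sketch of these image augmentations with torchvision; the parameter values and the Gaussian-noise lambda are illustrative assumptions, not taken from the slides:

    import torch
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),   # flip horizontally
        transforms.RandomVerticalFlip(p=0.5),     # flip vertically
        transforms.RandomRotation(degrees=15),    # rotation
        transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # scale + crop
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x + 0.05 * torch.randn_like(x)),  # Gaussian noise
    ])
    # augmented = augment(pil_image) yields a randomly transformed tensor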

SLIDE 5

If we apply them to NLP

  • Flip: flip horizontally and vertically.
  • Crop: randomly sample a section.

Flipping "I hate you!" yields "! you hate I".

Language is Discrete.

SLIDE 6

Widely Used Methods

  • EDA
  • Back-Translation
  • Contextual Augmentation
SLIDE 7

EDA

  • EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

  1. Synonym Replacement (SR): Randomly choose n words from the sentence that are not stop words. Replace each of these words with one of its synonyms chosen at random.
  2. Random Insertion (RI): Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times.
  3. Random Swap (RS): Randomly choose two words in the sentence and swap their positions. Do this n times.
  4. Random Deletion (RD): Randomly remove each word in the sentence with probability p.
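
Random swap and random deletion need nothing beyond the standard library; a minimal sketch follows (synonym replacement and random insertion would additionally require a synonym source such as WordNet):

    import random

    def random_swap(words, n=1):
        # RS: swap the positions of two randomly chosen words, n times
        words = words[:]
        for _ in range(n):
            if len(words) < 2:
                break
            i, j = random.sample(range(len(words)), 2)
            words[i], words[j] = words[j], words[i]
        return words

    def random_deletion(words, p=0.1):
        # RD: drop each word with probability p, but never return an empty sentence
        kept = [w for w in words if random.random() > p]
        return kept if kept else [random.choice(words)]

    print(" ".join(random_swap("the actors are fantastic".split(), n=2)))
    print(" ".join(random_deletion("the actors are fantastic".split(), p=0.2)))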

SLIDE 8

EDA Examples

SLIDE 9

Conserving True Labels?

SLIDE 10

Back-Translation

(Diagram: a Chinese sentence is translated to English and then back to Chinese.)

SLIDE 11

Back-Translation

Chinese -> Model(C->E) -> English -> Model(E->C) -> Chinese

Translating a sentence into a pivot language and back produces a paraphrase that can be added to the training data with the original label.
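
A hedged sketch of this round trip with Hugging Face MarianMT checkpoints; the slides name no specific translation models, so the Helsinki-NLP checkpoints below are an assumption:

    from transformers import MarianMTModel, MarianTokenizer

    def load(name):
        return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

    zh_en = load("Helsinki-NLP/opus-mt-zh-en")  # Model(C->E)
    en_zh = load("Helsinki-NLP/opus-mt-en-zh")  # Model(E->C)

    def translate(text, pair):
        tok, model = pair
        batch = tok([text], return_tensors="pt", padding=True)
        out = model.generate(**batch)
        return tok.batch_decode(out, skip_special_tokens=True)[0]

    # Chinese -> English -> Chinese: the result is a paraphrase of the input
    print(translate(translate("我讨厌你!", zh_en), en_zh))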

SLIDE 12

Contextual Augmentation

  • Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations (NAACL 2018)
  • Disadvantages of Synonym Replacement:
      • Synonyms are very limited.
      • Synonym-based augmentation cannot produce numerous different patterns from the original texts.

SLIDE 13

Contextual Augmentation

Original: "the actors are fantastic"

Contextual Augmentation: the performances / films / movies / stories are fantastic
Synonym Replacement: the performer / actress are fantastic

SLIDE 14

Contextual Augmentation

  • A bi-directional LSTM-RNN language model, pretrained on the WikiText-103 corpus
  • Replacement words are sampled from the LM's predictive distribution at each position
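
The paper samples replacements from this bi-directional LSTM LM; as a rough stand-in, the same idea can be tried with a masked language model via the transformers fill-mask pipeline (the BERT checkpoint here is an assumption, not the paper's model):

    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    # Sample candidate replacements for "actors" in context
    for cand in fill("the [MASK] are fantastic", top_k=5):
        print(cand["token_str"], round(cand["score"], 3))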

SLIDE 15

Contextual Augmentation

An unconditional LM can propose words that flip the label: "the actors are fantastic" may become "the actors are good / entertaining", but also "the actors are bad / terrible", while the example is still marked positive.

SLIDE 16

Contextual Augmentation

The LM is further fine-tuned on each labeled dataset, conditioning its predictions on the label so that sampled replacements preserve it.

SLIDE 17

Others

  • Variational Auto-Encoders (VAE)
  • Paraphrasing
  • Round-trip Translation
  • Generative Adversarial Networks (GAN)
SLIDE 18

Methods based on Pre-trained Language Models

  • Conditional BERT Contextual Augmentation (ICCS 2019)
  • Do Not Have Enough Data? Deep Learning to the Rescue! (AAAI 2020)
  • Data Augmentation using Pre-trained Transformer Models (arXiv 2020)

SLIDE 19

Methods based on Pre-trained Language Models

From Pre-trained Models for Natural Language Processing: A Survey

SLIDE 20

Conditional BERT Contextual Augmentation

ICCS 2019

Xing Wu, Shangwen Lv, Liangjun Zang, Jizhong Han, Songlin Hu
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

SLIDE 21

BERT

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

SLIDE 22

C-BERT

C-BERT (Conditional BERT) replaces BERT's segment embeddings with label embeddings, so masked words are predicted conditioned on both the context and the sentence label; after fine-tuning on the labeled data, the sampled replacements are compatible with the label.
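
A minimal sketch of that conditioning trick, assuming a binary task so the label ids fit BERT's default type_vocab_size of 2; without the paper's fine-tuning step the conditioning has no real effect, and the checkpoint name is illustrative:

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    inputs = tokenizer("the actors are [MASK]", return_tensors="pt")
    # C-BERT idea: feed the label (here 1 = positive) through token_type_ids,
    # reusing the segment embeddings as label embeddings
    inputs["token_type_ids"] = torch.full_like(inputs["input_ids"], 1)

    with torch.no_grad():
        logits = model(**inputs).logits

    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    top = logits[0, mask_pos].topk(5).indices.tolist()
    print(tokenizer.convert_ids_to_tokens(top))  # candidate replacements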

SLIDE 23

Do Not Have Enough Data? Deep Learning to the Rescue!

AAAI 2020

Ateret Anaby-Tavor, Boaz Carmeli, Esther Goldbraich, Amir Kantor, George Kour, Segev Shlomov, Naama Tepper, Naama Zwerdling
IBM Research AI; University of Haifa, Israel; Technion - Israel Institute of Technology

SLIDE 24

LAMBADA

  • LAMBADA: language-model-based data augmentation
  • Disadvantage of Contextual Augmentation: methods that make only local changes produce sentences with a structure similar to the original ones, thus yielding low corpus-level variability.

SLIDE 25

GPT

https://gpt2.apps.allenai.org/?text=Joel%20is%20a

SLIDE 26

LAMBADA

  • The generative pre-training (GPT) model
SLIDE 27

LAMBADA

Fine-tune GPT-2 on the training set serialized as repeating (label, sentence) pairs: label SEP sentence EOS, label SEP sentence EOS, ...

SLIDE 28

LAMBADA

  • Filter the synthesized data: a classifier trained on the original data scores each generated sentence, and only sentences with a high confidence score for their intended label are kept.
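
A hedged sketch of the generate-then-filter loop; the LM is assumed to have already been fine-tuned on "label SEP sentence EOS" sequences, and clf_confidence is a placeholder for the task classifier trained on the original data:

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")   # stand-in for the fine-tuned LM
    lm = GPT2LMHeadModel.from_pretrained("gpt2")

    def clf_confidence(text, label):
        # Placeholder: should return P(label | text) from the task classifier
        return 1.0

    def synthesize(label, n=10, threshold=0.9):
        ids = tok(f"{label} SEP ", return_tensors="pt").input_ids
        outs = lm.generate(ids, do_sample=True, top_p=0.9, max_length=40,
                           num_return_sequences=n, pad_token_id=tok.eos_token_id)
        texts = [tok.decode(o[ids.shape[1]:], skip_special_tokens=True) for o in outs]
        # Keep only sentences the classifier confidently assigns to `label`
        return [t for t in texts if clf_confidence(t, label) >= threshold]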

SLIDE 29

Data Augmentation using Pre-trained Transformer Models

arXiv 2020

Varun Kumar, Ashutosh Choudhary, Eunah Cho (Alexa AI)

SLIDE 30

Pre-trained Language Models

From Pre-trained Models for Natural Language Processing: A Survey

SLIDE 31

Pre-trained Language Models

  • BERT
  • GPT-2

SLIDE 32

Pre-trained Language Models

  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

SLIDE 33

Unified Approach

  • auto-regressive (AR) LM: GPT-2
  • autoencoder (AE) LM: BERT
  • seq2seq model: BART

SLIDE 34

Add Labels: Expand

Expand adds each label to the vocabulary so that the model treats it as a single token (e.g., "interesting" remains one token).

SLIDE 35

Add Labels: Prepend

With prepend, the label stays in the original vocabulary, so the model may split it into multiple subword units:

  • interesting → interest + ing
  • fascinating → fascinat + ing
  • disgusting → disgust + ing
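
The difference is easy to see with an actual tokenizer; whether a given label word gets split depends on the vocabulary, so the outputs noted in the comments are only indicative:

    from transformers import GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    print(tok.tokenize("fascinating"))  # prepend: may come out as several BPE pieces

    tok.add_tokens(["fascinating"])     # expand: register the label as one new token
    print(tok.tokenize("fascinating"))  # now a single piece
    # (the model's embedding matrix must then be resized with
    # model.resize_token_embeddings(len(tok)))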

SLIDE 36

Fine-tuning

Type | PLM  | Task | Labels  | Model        | Description
AE   | BERT | MLM  | prepend | BERT_prepend | label prepended to the sentence
     |      |      | expand  | BERT_expand  | label added to the vocabulary as a new token

SLIDE 37

Fine-tuning

Type | PLM  | Task                   | Labels  | Model        | Description
AE   | BERT | MLM                    | prepend | BERT_prepend | label prepended to the sentence
     |      |                        | expand  | BERT_expand  | label added to the vocabulary as a new token
AR   | GPT2 | LM (y1 SEP x1 EOS ...) | prepend | GPT2         | generation prompt: yi SEP
     |      |                        |         | GPT2_context | generation prompt: yi SEP x1 x2 x3

SLIDE 38

Fine-tuning

Type    | PLM  | Task                   | Labels  | Model        | Description
AE      | BERT | MLM                    | prepend | BERT_prepend | label prepended to the sentence
        |      |                        | expand  | BERT_expand  | label added to the vocabulary as a new token
AR      | GPT2 | LM (y1 SEP x1 EOS ...) | prepend | GPT2         | generation prompt: yi SEP
        |      |                        |         | GPT2_context | generation prompt: yi SEP x1 x2 x3
Seq2Seq | BART | Denoising              | prepend | BART_word    | replace a token with a mask
        |      |                        |         | BART_span    | replace a continuous chunk of words with a single mask
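
A hedged sketch of how the conditioning formats in this table might be built; "SEP", "EOS", and "<mask>" stand in for the models' actual special tokens:

    import random

    def gpt2_prepend(label, sentence):
        # AR fine-tuning example: "y_i SEP x_i EOS"
        return f"{label} SEP {sentence} EOS"

    def bart_word(words, mask="<mask>"):
        # BART_word: replace one random token with a mask
        i = random.randrange(len(words))
        return words[:i] + [mask] + words[i + 1:]

    def bart_span(words, span=3, mask="<mask>"):
        # BART_span: replace a continuous chunk of words with a single mask
        i = random.randrange(max(1, len(words) - span + 1))
        return words[:i] + [mask] + words[i + span:]

    print(gpt2_prepend("positive", "the actors are fantastic"))
    print(" ".join(bart_word("the actors are fantastic".split())))
    print(" ".join(bart_span("the actors are fantastic".split())))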

SLIDE 39

Algorithm

SLIDE 40

Experiments

  • Baselines
      • EDA
      • C-BERT
  • Tasks
      • Sentiment Classification (SST-2)
      • Intent Classification (SNIPS)
      • Question Classification (TREC)

Five validation examples per class.

SLIDE 41

Experiments

Extrinsic Evaluation

  • Sentiment Classification
  • Intent Classification
  • Question Classification

Intrinsic Evaluation

  • Semantic Fidelity
  • Text Diversity
SLIDE 42

Extrinsic Evaluation

  • Pre-trained BERT classifier
SLIDE 43

Semantic Fidelity

  • Training + test dataset → BERT classifier: a classifier trained on the original data checks whether generated sentences still match their intended labels.
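
In sketch form, fidelity is the agreement between the intended labels and that classifier's predictions on the generated data; classify is a placeholder for the trained BERT classifier:

    def semantic_fidelity(generated, intended_labels, classify):
        # Fraction of generated sentences whose predicted label matches
        # the label they were generated for
        preds = [classify(s) for s in generated]
        return sum(p == y for p, y in zip(preds, intended_labels)) / len(generated)
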
SLIDE 44

Text Diversity

SLIDE 45

Conclusion

  • Data augmentation is useful.
  • EDA, back-translation, ......
  • Pre-trained language models can be used for data augmentation.
  • Generating new data is more powerful than replacement-based methods.
  • Data Augmentation for Text Generation?
SLIDE 46

Thanks!