Deep Adversarial Learning for NLP
William Wang Sameer Singh
With contributions from Jiwei Li.
Slides: http://tiny.cc/adversarial
Agenda:
Introduction, Background, and GANs (William, 90 mins)
Adversarial Examples and Rules (Sameer, 75 mins)
Compared to 2016, there were 20+ times more papers mentioning “adversarial”, and more than 100 papers on adversarial learning appeared during this period (approximately 1/3 of all adversarial learning papers in NLP).
Deep adversarial learning is closely related to, but not limited to, the following fields of study:
CycleGAN (Zhu et al., 2017)
GauGAN (Park et al., 2019)
The main classifier predicts a label y from a text x, while the attacker tries to recover some private information z contained in x from the latent representation used by the main classifier.
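One common way to write this defender–attacker setup as a minimax objective (a sketch; the encoder h_θ, the distributions p_φ and q_ψ, and the trade-off weight λ are illustrative notation, not taken from the slides): the attacker minimizes its loss on z, while the main classifier and its encoder minimize the label loss and maximize the attacker's loss.

```latex
\mathcal{L}_z(\psi;\theta) \;=\; \mathbb{E}_{(x,z)}\big[-\log q_\psi\!\big(z \mid h_\theta(x)\big)\big]
\qquad \text{(attacker: } \min_\psi \mathcal{L}_z \text{)}
```

```latex
\min_{\theta,\phi}\;\; \mathbb{E}_{(x,y)}\big[-\log p_\phi\!\big(y \mid h_\theta(x)\big)\big] \;-\; \lambda\,\mathcal{L}_z(\psi;\theta)
```

The minus sign on the attacker's loss is what makes the representation h_θ(x) informative about y while hiding z.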
Wu, Bamman, Russell (EMNLP 2017).
The discriminator is like a classifier trying to detect the fake sample; the generator is like a forger trying to produce some counterfeit material. (Image: https://ishmaelbelghazi.github.io/ALI/)
D(x): the probability that x came from the data rather than the generator
Goodfellow, et al., “Generative adversarial networks,” in NIPS, 2014.
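For reference, the minimax objective from the cited paper:

```latex
\min_G \max_D \; V(D, G) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to assign high probability to real data and low probability to generated samples G(z); G is trained to fool D.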
Compared to MLE training, jointly optimizing generator and discriminator networks in NLP could be even harder.
2018; Chen and Cardie, 2018; Tran and Nguyen, 2018; Cao et al., 2018; Li et al., 2018b)
al., 2018; Zellers et al., 2018)
al., 2018b; Shi et al., 2018a; Bekoulis et al., 2018)
Idea: use a mixture of generators and a multi-class discriminator.
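A tiny sketch of that idea (the names and batch sizes are assumptions for illustration, not from the slides): with K generators in the mixture, the discriminator becomes a (K + 1)-way classifier that predicts which generator produced a sample, or whether it is real.

```python
import numpy as np

K = 3  # number of generators in the mixture (assumed for illustration)

def discriminator_targets(real_batch, fake_batches):
    """Build (K + 1)-class labels for one discriminator update:
    class 0 = real data, class i = samples from generator i."""
    labels = [0] * len(real_batch)                    # real samples -> class 0
    for gen_id, batch in enumerate(fake_batches, 1):  # generator i -> class i
        labels += [gen_id] * len(batch)
    return np.array(labels)

real = np.random.randn(4, 8)                       # 4 "real" samples
fakes = [np.random.randn(2, 8) for _ in range(K)]  # 2 samples per generator
y = discriminator_targets(real, fakes)             # [0,0,0,0,1,1,2,2,3,3]
```

The multi-class labels give each generator its own signal, rather than lumping all fakes into one class.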
Turing Test results (Win / Unsure rates, 0–50%): comparing XE, BLEU-RL, CIDEr-RL, GAN, and AREL.
Idea: use adversarial learning to iteratively learn better negative examples.
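A minimal numpy sketch of that loop's generator step (the scores and function name are made up for illustration): the generator samples negative examples in proportion to how plausible the current model scores them, so the main model trains against hard rather than random negatives.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hard_negatives(scores, n):
    """Sample n candidate negatives with probability softmax(scores),
    so high-scoring (hard) negatives are picked most often."""
    p = np.exp(scores - scores.max())  # subtract max for numerical stability
    p /= p.sum()
    return rng.choice(len(scores), size=n, p=p)

# Candidate negatives with (made-up) model scores; index 2 looks most
# plausible to the model, so it is sampled most often.
scores = np.array([0.1, 0.4, 5.0, 0.2])
negatives = sample_hard_negatives(scores, 20)
```

As the main model improves, the scores change, and the generator's negative distribution shifts with it, which is the iterative part of the idea.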
Input: “How old are you ?” Two candidate responses: “I don’t know what you are talking about” and “I’m 25 .”
A human evaluator / judge rates the responses.
The judge assigns each response a probability of being human-generated: P = 90% vs. P = 10%.
Adversarial Learning in Image Generation (Goodfellow et al., 2014)
Generative Model (G): a seq2seq model that encodes the input “how are you ?” and decodes the response “I’m fine . EOS”.
Discriminative Model (D): given the dialogue (“how are you ?” → “I’m fine .”), outputs P = 90% human-generated.
The discriminator’s output (P = 90% human-generated) is used as the reward for the generator.
REINFORCE Algorithm (Williams, 1992)
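A minimal numpy sketch of a REINFORCE-style update with the discriminator’s score as the reward (all numbers here are illustrative, not from the tutorial): the “policy” picks one of three candidate responses, and the reward stands in for the discriminator’s probability that the response is human-generated.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

logits = np.zeros(3)                  # policy over 3 candidate responses
rewards = np.array([0.0, 1.0, 0.0])   # assumed D(response) values
lr = 0.1

for _ in range(500):
    p = softmax(logits)
    a = rng.choice(3, p=p)            # sample a response from the policy
    grad_logp = -p                    # grad of log pi(a) w.r.t. logits
    grad_logp[a] += 1.0               # ... is one-hot(a) - p for a softmax
    logits += lr * rewards[a] * grad_logp  # REINFORCE: reward-weighted step

# The policy shifts probability toward the response the discriminator rewards.
```

Because the reward multiplies the log-probability gradient, no gradient needs to flow through the discrete sampling step, which is why REINFORCE fits text generation.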
Alternate between updating the discriminator and updating the generator.
The discriminator forces the generator to produce correct responses
The previous RL model only performs better on multi-turn conversations.
vs. a vanilla generation model — Adversarial Win: 62%, Adversarial Lose: 18%, Tie: 20%.
Adversarial learning is an interdisciplinary research area, and it is highly related to many subareas in NLP.
There are both challenges and opportunities in GANs for NLP.
Adversarial learning in NLP has obtained promising results.
Adversarial learning will benefit from the advances of representation learning, reinforcement learning, and self-supervised learning techniques in NLP.