Context to Sequence
Typical Frameworks and Applications Piji Li
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
FDU-CUHK, 2017
Piji Li (CUHK) Context to Sequence FDU-CUHK, 2017 1 / 59
1. Introduction
2. Frameworks: Overview, Teacher Forcing, Adversarial Reinforce, Tricks
3. Applications
4. Conclusions
Typical ctx2seq frameworks have obtained significant improvements in:
- Neural machine translation
- Abstractive text summarization
- Dialogue/conversation systems (chatbots)
- Caption generation for images and videos
Various strategies to train a better ctx2seq model:
- Improving teacher forcing
- Adversarial training
- Reinforcement learning
- Tricks (copy, coverage, dual training, etc.)
Interesting applications.
Figure 1: Seq2seq framework with attention mechanism and teacher forcing.
Source: https://github.com/OpenNMT
Feed the ground-truth token y_t back into the model as the conditioning input for the next decoding step.
Advantages:
- Forces the decoder to stay close to the ground-truth sequence.
- Faster convergence.
Disadvantages:
- At prediction time there is no ground truth: decoding uses sampling, greedy decoding, or beam search, so training and testing are mismatched.
- Errors accumulate during the decoding phase.
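To make the train/test mismatch concrete, here is a minimal numpy sketch (a hypothetical toy decoder, not any particular framework's API): the same decoder is run once conditioned on ground-truth tokens (training-time behaviour) and once on its own predictions (test-time behaviour).

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 5, 8                      # toy vocab size and hidden size
E = rng.normal(size=(V, H))      # token embeddings
W = rng.normal(size=(H, H)) * 0.1
U = rng.normal(size=(H, V)) * 0.1

def step(h, tok):
    """One decoder step: consume the previous token, emit next-token logits."""
    h = np.tanh(h @ W + E[tok])
    return h, h @ U

def decode(gold, teacher_forcing=True):
    h, tok, out = np.zeros(H), 0, []
    for t in range(len(gold)):
        h, logits = step(h, tok)
        pred = int(np.argmax(logits))
        out.append(pred)
        # teacher forcing: condition on the ground-truth token y_t;
        # free running: condition on the model's own prediction.
        tok = gold[t] if teacher_forcing else pred
    return out

gold = [1, 2, 3, 4]
print(decode(gold, True))    # training-time behaviour
print(decode(gold, False))   # test-time behaviour: errors can accumulate
```

Once the decoder makes one wrong prediction in free-running mode, every later step is conditioned on that error, which is exactly the error-accumulation problem listed above.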
Improve the Performance
Bengio, Samy, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. "Scheduled sampling for sequence prediction with recurrent neural networks." NIPS, 2015. [Google Research]
Lamb, Alex M., Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron C. Courville, and Yoshua Bengio. "Professor forcing: A new algorithm for training recurrent networks." NIPS, 2016.
Jang, Eric, Shixiang Gu, and Ben Poole. "Categorical reparameterization with gumbel-softmax." ICLR, 2017.
Gu, Jiatao, Daniel Jiwoong Im, and Victor O.K. Li. "Neural Machine Translation with Gumbel-Greedy Decoding." arXiv, 2017.
Bengio, Samy, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. "Scheduled sampling for sequence prediction with recurrent neural networks." NIPS, 2015. [Google Research]
Scheduled Sampling [1] - Framework
Overview of the scheduled sampling method:
Figure 2: Illustration of the Scheduled Sampling approach: at every time step, flip a coin to decide whether to feed the true previous token or one sampled from the model itself. [1]
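The coin-flip probability is annealed during training; a small sketch of the inverse-sigmoid decay schedule from the paper (the constant k is a hyperparameter, chosen here arbitrarily):

```python
import numpy as np

def inverse_sigmoid_decay(i, k=100.0):
    """Probability of feeding the ground-truth token at training step i.
    Decays from ~1 toward 0, as in Bengio et al. (2015)."""
    return k / (k + np.exp(i / k))

def choose_input(gold_tok, model_tok, step, rng):
    """Flip a coin: with probability eps use the true previous token,
    otherwise use the token sampled from the model itself."""
    eps = inverse_sigmoid_decay(step)
    return gold_tok if rng.random() < eps else model_tok

print(inverse_sigmoid_decay(0))      # ~0.99: early training, mostly ground truth
print(inverse_sigmoid_decay(1000))   # near 0: late training, mostly model samples
```

Early in training the model is fed almost only ground truth (like plain teacher forcing); as training proceeds it is gradually exposed to its own samples, shrinking the train/test mismatch.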
Scheduled Sampling [1] - Experiments
Results on image captioning (MSCOCO) and constituency parsing (WSJ 22).
Lamb, Alex M., Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron C. Courville, and Yoshua Bengio. "Professor forcing: A new algorithm for training recurrent networks." NIPS, 2016. [University of Montreal]
Professor Forcing [3] - Framework
Architecture of Professor Forcing:
Figure 3: Match the dynamics of free running with teacher forcing. [3]
Professor Forcing [3] - Adversarial Training
Adversarial training paradigm: the discriminator (a Bi-RNN + MLP) learns to tell teacher-forced hidden-state sequences from free-running ones, while the generator is trained to make the two dynamics indistinguishable.
Professor Forcing [3] - Experiments
Character-Level Language Modeling, Penn-Treebank:
Figure 4: Training Negative Log-Likelihood.
Training cost decreases faster, but training time is about 3× longer.
Jang, Eric, Shixiang Gu, and Ben Poole. ”Categorical reparameter- ization with gumbel-softmax.” ICLR, 2017. Gu, Jiatao, Daniel Jiwoong Im, and Victor OK Li. ”Neural Machine Translation with Gumbel-Greedy Decoding.” arXiv (2017).
Gumbel Softmax [2]
The Gumbel-Max trick (Gumbel, 1954) provides a simple and efficient way to draw samples z from a categorical distribution with class probabilities π:
z = one_hot( argmax_i [ g_i + log π_i ] ), where g_i ∼ Gumbel(0, 1), i.e. u ∼ Uniform(0, 1) and g = −log(−log u).
Gumbel-Softmax replaces the argmax with a softmax, so it is differentiable; its samples interpolate between a softmax distribution and a one-hot vector as the temperature is annealed. Example: char-RNN.
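A minimal numpy sketch of the trick (the temperature values are chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(log_pi, tau, rng):
    """Draw one Gumbel-Softmax sample: softmax((log_pi + g) / tau),
    with g ~ Gumbel(0, 1) via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = rng.uniform(size=log_pi.shape)
    g = -np.log(-np.log(u))
    y = (log_pi + g) / tau
    y = np.exp(y - y.max())          # stable softmax
    return y / y.sum()

pi = np.array([0.1, 0.6, 0.3])               # class probabilities
soft = gumbel_softmax(np.log(pi), 1.0, rng)  # smooth, differentiable sample
hard = gumbel_softmax(np.log(pi), 0.01, rng) # low temperature: near one-hot
print(soft.round(3), hard.round(3))
```

Because the softmax is monotone, the argmax of a Gumbel-Softmax sample follows the exact categorical distribution π regardless of temperature; lowering the temperature only sharpens the sample toward one-hot.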
Discussions
Teacher forcing is good enough. Teacher forcing is indispensable.
Generative Adversarial Network (GAN) 2:
Source of figure: https://goo.gl/uPxWTs
Bahdanau, Dzmitry, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, and Yoshua Bengio. "An actor-critic algorithm for sequence prediction." arXiv 2016. (Basic work; connects actor-critic with GANs.)
Yu, Lantao, Weinan Zhang, Jun Wang, and Yong Yu. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." AAAI 2017.
Li, Jiwei, Will Monroe, Tianlin Shi, Alan Ritter, and Dan Jurafsky. "Adversarial learning for neural dialogue generation." EMNLP 2017.
Wu, Lijun, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. "Adversarial Neural Machine Translation." arXiv 2017.
SeqGAN [9]
Yu, Lantao, Weinan Zhang, Jun Wang, and Yong Yu. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." AAAI 2017.
SeqGAN [9] - Framework
Overview of the framework:
Figure 5: Left: D is trained on real data and on data generated by G. Right: G is trained by policy gradient, where the final reward signal comes from D and is passed back to the intermediate action values via Monte Carlo search.
SeqGAN [9] - Training
Discriminator: CNN (with highway layers).
Training with policy gradient: (1) pre-train the generator and the discriminator; (2) adversarial training.
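A heavily simplified sketch of the policy-gradient step: a toy per-step categorical "generator" is updated with REINFORCE against a stand-in discriminator that simply rewards one target token. Unlike the real SeqGAN, the final reward is applied uniformly to every step instead of being estimated per intermediate step by Monte Carlo search.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T = 4, 3                     # toy vocab size and sequence length
theta = np.zeros((T, V))        # per-step logits of a toy "generator"

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def discriminator(seq):
    """Stand-in for a trained CNN discriminator: score in [0, 1],
    here simply the fraction of positions holding the 'real' token 2."""
    return float(np.mean(np.array(seq) == 2))

def policy_gradient_step(theta, lr=0.5, n_rollouts=200):
    """REINFORCE: accumulate grad log pi(a_t) * R over sampled sequences."""
    grad = np.zeros_like(theta)
    for _ in range(n_rollouts):
        seq = [int(rng.choice(V, p=softmax(theta[t]))) for t in range(T)]
        R = discriminator(seq)           # final reward from D
        for t, a in enumerate(seq):
            g = -softmax(theta[t])       # d log softmax / d logits
            g[a] += 1.0
            grad[t] += g * R
    return theta + lr * grad / n_rollouts

for _ in range(80):                      # adversarial phase with D held fixed
    theta = policy_gradient_step(theta)

print([int(np.argmax(theta[t])) for t in range(T)])
```

After the updates the generator's logits concentrate on the token the discriminator rewards, which is the mechanism by which D's signal shapes G without any differentiable path through the discrete samples.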
SeqGAN [9] - Experiments
Results on three tasks.
Related policy-gradient work: Wang, Jun, et al. "IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models." SIGIR 2017.
Adversarial Dialog [4]
Li, Jiwei, Will Monroe, Tianlin Shi, Alan Ritter, and Dan Jurafsky. ”Adversarial learning for neural dialogue generation.” EMNLP 2017.
Adversarial Dialog [4] - Framework
G: seq2seq. D: a hierarchical recurrent encoder. Training: policy gradient, with teacher forcing added back.
Adversarial NMT [8]
Wu, Lijun, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. ”Adversarial Neural Machine Translation.” arXiv 2017.
Adversarial NMT [8] - Framework
G: seq2seq. D: CNN. Training: policy gradient.
Adversarial NMT [8] - Experiments
Figure 6: Different NMT systems’ performances on En→Fr translation.
Discussions
Useful for fine-tuning. Produces more robust models. But difficult to train.
Copy mechanism.
Coverage or diversity.
Dual or reconstruction.
CNN-based seq2seq.
Copy Mechanism
Gulcehre, Caglar, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. "Pointing the unknown words." arXiv 2016.
Gu, Jiatao, Zhengdong Lu, Hang Li, and Victor O.K. Li. "Incorporating copying mechanism in sequence-to-sequence learning." ACL 2016.
Copy Mechanism
See, Abigail, et al. "Get To The Point: Summarization with Pointer-Generator Networks." ACL 2017. [7]
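A minimal sketch of the pointer-generator mixture P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ_{i: w_i = w} a_i, with made-up numbers; note how an out-of-vocabulary source word receives probability mass via copying:

```python
import numpy as np

# toy vocabulary; "zebra" exists only in the source sentence
vocab = ["the", "cat", "sat", "<unk>"]
src = ["the", "zebra", "sat"]

p_vocab = np.array([0.5, 0.3, 0.1, 0.1])  # decoder's generation distribution
attn = np.array([0.2, 0.7, 0.1])          # attention over source positions
p_gen = 0.6                                # soft switch: generate vs copy

# extended vocabulary = fixed vocab + source-only words
ext = vocab + ["zebra"]
p_final = np.zeros(len(ext))
p_final[: len(vocab)] = p_gen * p_vocab
for i, w in enumerate(src):                # scatter-add copy probabilities
    p_final[ext.index(w)] += (1 - p_gen) * attn[i]

print(dict(zip(ext, p_final.round(3))))
```

The OOV word "zebra" ends up with probability 0.4 × 0.7 = 0.28, so the model can emit it even though it is outside the fixed vocabulary; this is what lets pointer-generator models handle rare entities in summarization.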
Copy Mechanism - Experiments
Summarization results on CNN/DailyMail: significant improvement.
Coverage or Diversity
Tu, Zhaopeng, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. "Modeling coverage for neural machine translation." ACL 2016.
Applications:
See, Abigail, et al. "Get To The Point: Summarization with Pointer-Generator Networks." ACL 2017.
"Distraction-Based Neural Networks for Document Summarization." IJCAI 2016.
Coverage or Diversity
Accumulation of the history attention distributions: c_t = Σ_{t′=0}^{t−1} a_{t′} (the coverage vector). The coverage loss penalises repeated attention: covloss_t = Σ_i min(a_i^t, c_i^t).
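In numbers (toy attention vectors, following See et al.'s coverage loss Σ_i min(a_i^t, c_i^t)):

```python
import numpy as np

# attention distributions from three consecutive decoder steps (toy numbers)
attns = [np.array([0.7, 0.2, 0.1]),
         np.array([0.6, 0.3, 0.1]),   # re-attends position 0 -> penalised
         np.array([0.1, 0.2, 0.7])]

coverage = np.zeros(3)   # c_t: sum of attention over all previous steps
losses = []
for a in attns:
    # coverage loss: overlap between current attention and past coverage
    losses.append(float(np.minimum(a, coverage).sum()))
    coverage += a

print([round(l, 2) for l in losses])   # [0.0, 0.9, 0.5]
```

The second step re-attends heavily to position 0 and incurs a large loss (0.9), while the third step moves to a fresh position and is penalised less; this is the mechanism that discourages repetition in generated summaries.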
Coverage or Diversity - Experiments
Summarization results on CNN/DailyMail: significant improvement.
Dual or Reconstruction
A → B → A. Works:
Tu, Zhaopeng, Yang Liu, Lifeng Shang, Xiaohua Liu, and Hang Li. "Neural Machine Translation with Reconstruction." AAAI 2017.
He, Di, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tieyan Liu, and Wei-Ying Ma. "Dual learning for machine translation." NIPS 2016.
Xia, Yingce, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, and Tie-Yan Liu.
Other uses: paraphrase generation; image → caption → image, etc.
CNN based Seq2Seq
Gehring, Jonas, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. "Convolutional Sequence to Sequence Learning." arXiv 2017.
CNN over n-grams.
Attention mechanism.
Language model in the decoder.
Teacher forcing.
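For the decoder-side language model, the convolutions must be causal so that position t never sees tokens after t. A minimal numpy sketch of causal left-padding (GLU gating and attention omitted; not the actual ConvS2S code):

```python
import numpy as np

def causal_conv1d(x, w):
    """1-D convolution where output[t] depends only on x[t-k+1 .. t]."""
    k = len(w)
    x_pad = np.concatenate([np.zeros(k - 1), x])   # left-pad with zeros
    return np.array([x_pad[t:t + k] @ w for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.5])     # kernel width 2
print(causal_conv1d(x, w))   # [0.5, 1.5, 2.5, 3.5]
```

Each output position averages the current and previous input only, so stacking such layers yields a decoder that, like an RNN language model, conditions only on the past while still training fully in parallel.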
Discussions
Tricks → Performance.
Pure seq2seq or ctx2seq Framework
See, Abigail, Peter J. Liu, and Christopher D. Manning. "Get To The Point: Summarization with Pointer-Generator Networks." ACL 2017.
Du, Xinya, Junru Shao, and Claire Cardie. "Learning to Ask: Neural Question Generation for Reading Comprehension." ACL 2017.
Meng, Rui, Sanqiang Zhao, Shuguang Han, Daqing He, Peter Brusilovsky, and Yu Chi. "Deep Keyphrase Generation." ACL 2017.
Ours - Chinese Word Segment
Sequence to sequence with attention modeling. Input:
X: 扬帆远东做与中国合作的先行。<eos> Y: 扬帆<eow>远东<eow>做<eow>与<eow>中国<eow>合作<eow>的<eow>先行<eow>。<eow><eos>
Dataset: icwb2 (SIGHAN Bakeoff 2005).
MSR: Recall = 0.956, Precision = 0.956, F1 = 0.956
PKU: Recall = 0.911, Precision = 0.920, F1 = 0.915
Code: https://github.com/lipiji/cws-seq2seq
Ours - Abstractive Summarization
Piji Li, Wai Lam, Lidong Bing, and Zihao Wang. Deep Recurrent Generative Decoder for Abstractive Text Summarization. EMNLP 2017. [5]
[Figure: architecture of the Deep Recurrent Generative Decoder (DRGD): a seq2seq attention framework (encoder inputs x_1 … x_4, decoder outputs y_1, y_2) whose decoder contains a variational auto-encoder with latent variables z_1, z_2, z_3 and KL term KL[N(μ, σ²) ∥ N(0, I)].]
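The variational decoder relies on the reparameterization trick and the analytic KL term KL[N(μ, σ²) ∥ N(0, I)] shown in the figure; a minimal numpy sketch of those two pieces (not the full DRGD model):

```python
import numpy as np

def sample_z(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps with eps ~ N(0, I),
    so gradients can flow through mu and log_var."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Analytic KL[N(mu, sigma^2) || N(0, I)], summed over dimensions:
    0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
log_var = np.array([0.0, 0.0])               # sigma = 1
print(kl_to_standard_normal(mu, log_var))    # 0.0: posterior equals the prior
print(sample_z(mu, log_var, rng))
```

During training the KL term is added to the reconstruction loss, pulling the per-step latent posterior toward the standard normal prior.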
Ours - Abstractive Summarization
Evaluation results on Gigaword:
Table 1: ROUGE F1 on Gigaword
System        R-1     R-2     R-L
ABS           29.55   11.32   26.42
ABS+          29.78   11.89   26.97
RAS-LSTM      32.55   14.70   30.03
RAS-Elman     33.78   15.97   31.15
ASC + FSC1    34.17   15.94   31.92
lvt2k-1sent   32.67   15.59   30.64
lvt5k-1sent   35.30   16.64   32.62
DRGD          36.27   17.57   33.62
Ours - Rating Prediction and Tips Generation
Piji Li, Zihao Wang, Zhaochun Ren, Lidong Bing, and Wai Lam. Neural Rating Regression with Abstractive Tips Generation for Recommendation. SIGIR 2017. [6]
[Figure: the NRT framework: user and item latent factors (U, V) plus a context embedding feed both a rating-regression component (predicted rating r̂ trained against the true rating r) and an abstractive tips-generation decoder trained with the log-likelihood Σ_{w∈S} log p(w); example tip: "Really good pizza!" → generated "good pizza!".]
Rating Prediction and Tips Generation - Results
Table 2: MAE and RMSE values for rating prediction.
Method   Books           Electronics     Movies          Yelp-2016
         MAE    RMSE     MAE    RMSE     MAE    RMSE     MAE    RMSE
LRMF     1.939  2.153    2.005  2.203    1.977  2.189    1.809  2.038
PMF      0.882  1.219    1.220  1.612    0.927  1.290    1.320  1.752
NMF      0.731  1.035    0.904  1.297    0.794  1.135    1.062  1.454
SVD++    0.686  0.967    0.847  1.194    0.745  1.049    1.020  1.349
URP      0.704  0.945    0.860  1.126    0.764  1.006    1.030  1.286
CTR      0.736  0.961    0.903  1.154    0.854  1.069    1.174  1.392
RMR      0.681  0.933    0.822  1.123    0.741  1.005    0.994  1.286
NRT      0.667* 0.927*   0.806* 1.107*   0.702* 0.985*   0.985* 1.277*
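For reference, MAE and RMSE reported in the table are the standard rating-prediction metrics (toy numbers below, not values from the paper):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error (penalises large errors more than MAE)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = np.array([5.0, 3.0, 4.0, 1.0])
y_pred = np.array([4.5, 3.5, 4.0, 2.0])
print(mae(y_true, y_pred))    # 0.5
print(rmse(y_true, y_pred))   # ~0.612
```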
Rating Prediction and Tips Generation - Results
Table 3: ROUGE evaluation on dataset Books.
Methods   ROUGE-1 (R / P / F1)     ROUGE-2 (R / P / F1)   ROUGE-L (R / P / F1)     ROUGE-SU4 (R / P / F1)
LexRank   12.94 / 12.02 / 12.18    2.26 / 2.29 / 2.23     11.72 / 10.89 / 11.02    4.13 / 4.15 / 4.02
RMRt      13.80 / 11.69 / 12.43    1.79 / 1.57 / 1.64     12.54 / 10.55 / 11.25    4.49 / 3.54 / 3.80
CTRt      14.06 / 11.85 / 12.62    2.03 / 1.80 / 1.87     12.68 / 10.64 / 11.35    4.71 / 3.71 / 3.99
NRT       10.30 / 19.28 / 12.67    1.91 / 3.76 / 2.36     9.71 / 17.92 / 11.88     3.24 / 8.03 / 4.13
Table 4: ROUGE evaluation on dataset Electronics.
Methods   ROUGE-1 (R / P / F1)     ROUGE-2 (R / P / F1)   ROUGE-L (R / P / F1)     ROUGE-SU4 (R / P / F1)
LexRank   13.42 / 13.48 / 12.08    1.90 / 2.04 / 1.83     11.72 / 11.48 / 10.44    4.57 / 4.51 / 3.88
RMRt      15.68 / 11.32 / 12.30    2.52 / 2.04 / 2.15     13.37 / 9.61 / 10.45     5.41 / 3.72 / 3.97
CTRt      15.81 / 11.37 / 12.38    2.49 / 1.92 / 2.05     13.45 / 9.62 / 10.50     5.39 / 3.63 / 3.89
NRT       13.08 / 17.72 / 13.95    2.59 / 3.36 / 2.72     11.93 / 16.01 / 12.67    4.51 / 6.69 / 4.68
Rating Prediction and Tips Generation - Results
Table 5: ROUGE evaluation on dataset Movies&TV.
Methods   ROUGE-1 (R / P / F1)     ROUGE-2 (R / P / F1)   ROUGE-L (R / P / F1)     ROUGE-SU4 (R / P / F1)
LexRank   13.62 / 14.11 / 12.37    1.92 / 2.09 / 1.81     11.69 / 11.74 / 10.47    4.47 / 4.53 / 3.75
RMRt      14.64 / 10.26 / 11.33    1.78 / 1.36 / 1.46     12.62 / 8.72 / 9.67      4.63 / 3.00 / 3.28
CTRt      15.13 / 10.37 / 11.57    1.90 / 1.42 / 1.54     13.02 / 8.77 / 9.85      4.88 / 3.03 / 3.36
NRT       15.17 / 20.22 / 16.20    4.25 / 5.72 / 4.56     13.82 / 18.36 / 14.73    6.04 / 8.76 / 6.33
Table 6: ROUGE evaluation on dataset Yelp-2016.
Methods   ROUGE-1 (R / P / F1)     ROUGE-2 (R / P / F1)   ROUGE-L (R / P / F1)     ROUGE-SU4 (R / P / F1)
LexRank   11.32 / 11.16 / 11.04    1.32 / 1.34 / 1.31     10.33 / 10.16 / 10.06    3.41 / 3.38 / 3.26
RMRt      11.17 / 10.25 / 10.54    2.25 / 2.16 / 2.19     10.22 / 9.39 / 9.65      3.88 / 3.66 / 3.72
CTRt      10.74 / 9.95 / 10.19     2.21 / 2.14 / 2.15     9.91 / 9.19 / 9.41       3.96 / 3.64 / 3.70
NRT       9.39 / 17.75 / 11.64     1.83 / 3.39 / 2.22     8.70 / 16.27 / 10.74     3.01 / 7.06 / 3.78
Rating Prediction and Tips Generation - Case Analysis
Table 7: Examples of the predicted ratings and the generated tips.
Rating   Tips
4.64     This is a great product for a great price.
5        Great product at a great price.
4.87     I purchased this as a replacement and it is a perfect fit and the sound is excellent.
5        Amazing sound.
4.87     One of my favorite movies.
5        This is a movie that is not to be missed.
4.07     Why do people hate this film.
4        Universal why didnt your company release this edition in 1999.
2.25     Not as good as i expected.
5        Jack of all trades master of none.
1.46     What a waste of time and money.
1        The coen brothers are two sick bastards.
4.34     Not bad for the price.
3        Ended up altering it to get rid of ripples.
Teacher forcing.
Adversarial reinforce.
Tricks (copy, coverage, dual training, etc.).
Applications.
[1] S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1171–1179, 2015.
[2] J. Gu, D. J. Im, and V. O. Li. Neural machine translation with gumbel-greedy decoding. arXiv preprint arXiv:1706.07518, 2017.
[3] A. M. Lamb, A. Goyal, Y. Zhang, S. Zhang, A. C. Courville, and Y. Bengio. Professor forcing: A new algorithm for training recurrent networks. In Advances in Neural Information Processing Systems, pages 4601–4609, 2016.
[4] J. Li, W. Monroe, T. Shi, A. Ritter, and D. Jurafsky. Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547, 2017.
[5] P. Li, W. Lam, L. Bing, and Z. Wang. Deep recurrent generative decoder for abstractive text summarization. EMNLP, 2017.
[6] P. Li, Z. Wang, Z. Ren, L. Bing, and W. Lam. Neural rating regression with abstractive tips generation for recommendation. SIGIR, 2017.
[7] A. See, P. J. Liu, and C. D. Manning. Get to the point: Summarization with pointer-generator networks. ACL, 2017.
[8] L. Wu, Y. Xia, L. Zhao, F. Tian, T. Qin, J. Lai, and T.-Y. Liu. Adversarial neural machine translation. arXiv preprint arXiv:1704.06933, 2017.
[9] L. Yu, W. Zhang, J. Wang, and Y. Yu. SeqGAN: Sequence generative adversarial nets with policy gradient. In AAAI, pages 2852–2858, 2017.