SLIDE 1

Semi-Supervised QA with Generative Domain-Adaptive Nets

Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, William W. Cohen (Carnegie Mellon University)

Presenter: Xiachong Feng

SLIDE 2

Outline

  • Author
  • Overview
  • Semi-Supervised QA
  • Discriminative Model
  • Domain Adaptation with Tags
  • Generative Model
  • Objective function
  • Training Algorithm
  • Experiment
  • Conclusion

SLIDE 3

Author

Zhilin Yang (杨植麟)

  • Third-year PhD student
  • Language Technologies Institute
  • School of Computer Science
  • Carnegie Mellon University
  • Prior to coming to CMU, worked with Jie Tang at Tsinghua University

SLIDE 4

Overview

  • Task: Semi-supervised question answering
  • Model: Generative Domain-Adaptive Nets (GDANs)
  • Problem: discrepancy between the model-generated data distribution and the human-generated data distribution

  • Method: domain adaptation algorithms based on reinforcement learning (two domain adaptation techniques)
  • Domain tag (for D): marks each instance as model-generated or human-generated
  • Reinforcement learning (for G): minimize the loss of the discriminative model in an adversarial way

[Diagram: Generative Domain-Adaptive Nets = a discriminative model (for QA) plus a generative model (for QG), making use of unlabeled data.]

  • 1. Use linguistic tags to extract possible answers
  • 2. Train a generative model to generate questions
  • 3. Train a discriminative model based on both kinds of data (a rough sketch of this pipeline follows)
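
A minimal sketch of this three-step pipeline in Python (every name and method here is a hypothetical placeholder for illustration, not the authors' code):

    # Hypothetical sketch of the GDAN pipeline; D, G and the helper are assumed interfaces.
    def gdan_pipeline(labeled_data, unlabeled_paragraphs, D, G, extract_candidate_answers):
        # 1. Use linguistic tags (POS/NER/parses) to extract possible answers
        #    from the unlabeled paragraphs.
        unlabeled_pairs = [(p, a) for p in unlabeled_paragraphs
                           for a in extract_candidate_answers(p)]

        # 2. Train the generative model G and let it generate questions
        #    for the (paragraph, answer) pairs.
        G.train_mle(labeled_data)
        generated = [(p, G.generate_question(p, a), a) for p, a in unlabeled_pairs]

        # 3. Train the discriminative model D on both the human-generated and the
        #    model-generated data, marking each source with a domain tag.
        D.train(labeled_data, tag="d_true")
        D.train(generated, tag="d_gen")
        return D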

SLIDE 5

Semi-Supervised QA

  • 1. Dataset: labeled data L of (paragraph, question, answer) triples
  • 2. Extractive question answering: the answer a is always a consecutive chunk of text in the paragraph p
  • 3. Unlabeled dataset: U
  • 4. Question answering model D
  • Discriminative model
  • Data: the labeled data L and the unlabeled data U
  • Goal: leverage U to improve D (see the notation sketch below)
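
The formulas on this slide did not survive extraction; the following is a plausible reconstruction of the notation, consistent with the rest of the deck (in particular with the later assumption that answers, but not questions, are available for unlabeled paragraphs):

    L = \{ (p^{(i)}, q^{(i)}, a^{(i)}) \}_{i=1}^{N} \quad \text{(labeled paragraph, question, answer triples)}
    U = \{ (p^{(j)}, a^{(j)}) \}_{j=1}^{M} \quad \text{(unlabeled paragraphs with extracted answers, no questions)}
    \text{Goal:} \quad \max_{D} \; \mathbb{E}_{(p,q,a)} \big[ \log p_D(a \mid p, q) \big] \quad \text{using both } L \text{ and } U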

SLIDE 6

Discriminative Model

  • Goal: learn the conditional probability of an answer chunk (a) given the paragraph (p) and the question (q)

  • Base Model: Gated-attention (GA) reader
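
As a rough illustration of the gated-attention idea (a simplified single hop; not the authors' implementation), each paragraph token representation is gated by an attention-weighted summary of the question:

    import numpy as np

    def gated_attention(paragraph_states, question_states):
        """One simplified gated-attention hop.
        paragraph_states: (T_p, d) array, question_states: (T_q, d) array."""
        scores = paragraph_states @ question_states.T      # token-vs-token similarities (T_p, T_q)
        scores -= scores.max(axis=1, keepdims=True)        # numerical stability
        alphas = np.exp(scores)
        alphas /= alphas.sum(axis=1, keepdims=True)        # softmax over question tokens
        question_summary = alphas @ question_states        # one question summary per paragraph token
        return paragraph_states * question_summary         # multiplicative (gated) interaction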

SLIDE 7

Domain Adaptation with Tags

  • Problem: learning from both human-generated data and model-generated data can lead to a biased model

  • Method: condition D on a domain tag, d_true for human-generated (labeled) data and d_gen for model-generated data. By introducing the domain tags, we expect the discriminative model to factor out domain-specific and domain-invariant representations.

[Diagram: domain adaptation pulls the model-generated data distribution (d_gen) toward the human-generated data distribution (d_true); D maps (Question, Paragraph, tag) to an Answer, for both labeled data (d_true) and unlabeled data (d_gen).]
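
One simple way to realize the domain tag (a sketch; whether the tag is appended to the question, the paragraph, or injected elsewhere is an assumption here):

    D_TRUE, D_GEN = "<d_true>", "<d_gen>"  # special tokens marking the two domains

    def tag_instance(question_tokens, paragraph_tokens, human_generated):
        """Attach a domain tag so D can separate domain-specific from
        domain-invariant behaviour; here the tag is appended to the question."""
        tag = D_TRUE if human_generated else D_GEN
        return question_tokens + [tag], paragraph_tokens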

SLIDE 8

Generative Model

  • Goal: learn the conditional probability of generating a question (q) given the paragraph (p) and the answer (a)

  • Base Model:
  • sequence-to-sequence model with copy and attention mechanisms
  • Encoder:
  • Encodes the input paragraph into a sequence of hidden states H
  • Injects the answer information by appending an additional zero/one feature to the word embeddings of the paragraph tokens

  • Decoder:
  • At each step, combines the probability of generating the token from the vocabulary with the probability of copying a token from the paragraph (see the mixture below)
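
Written out, the decoder's output distribution is presumably the usual copy-mechanism mixture (the exact form used in the paper is not shown on the slide); with a learned gate g_t at decoding step t and attention weights \alpha_{t,i} over paragraph positions:

    p(w_t \mid w_{<t}, p, a) \;=\; g_t \, p_{\text{vocab}}(w_t) \;+\; (1 - g_t) \, p_{\text{copy}}(w_t),
    \qquad p_{\text{copy}}(w_t) \;=\; \sum_{i \,:\, p_i = w_t} \alpha_{t,i}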

SLIDE 9

Objective Function

  • D: relies on the data generated by the generative model
  • G: aims to match the model-generated data distribution with the human-generated data distribution, using signals from the discriminative model

  • D objective function (conditioning on domain tags)
  • Final D objective function (sketched below)
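
The objective itself was an image and is missing here; a reconstruction consistent with the slide's description, writing J(A, d, D) for the average answer log-likelihood of D on dataset A under domain tag d, and U_G for the model-generated data:

    J(A, d, D) \;=\; \frac{1}{|A|} \sum_{(p, q, a) \in A} \log p_D(a \mid p, q, d)

    \text{Final D objective:} \quad \max_{D} \; J(L, d_{\text{true}}, D) \;+\; J(U_G, d_{\text{gen}}, D)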

SLIDE 10

Objective Function

  • For G: what happens if we simply maximize the reconstruction objective above with respect to G?
  • G would only aim to generate questions from which D can reconstruct the answer
  • The generated question may be (almost) the same as the answer!
  • This is similar to an auto-encoder
  • Method: adversarial training objective

[Diagram: on unlabeled data, G generates a question from (Paragraph, Answer); D, given the d_gen tag, tries to reconstruct the Answer from (Question, Paragraph), which gives the reconstruction loss.]
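
In the same notation as above, the adversarial training objective trains G so that D performs well on generated data even when that data is tagged as human-generated (again a reconstruction, not the slide's lost formula):

    \text{G objective:} \quad \max_{G} \; J(U_G, d_{\text{true}}, D)

So instead of merely making the answer easy to reconstruct, G has to make the model-generated questions look, from D's point of view, like human-generated ones.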

SLIDE 11

Training Algorithm

[Figure: training algorithm; some components are pre-trained on L while others start from random initialization. A sketch of the alternating schedule follows.]
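
A hedged sketch of the alternating schedule suggested by this slide and the next (the interfaces, the number of inner steps, and the exact reward shape are assumptions):

    def train_gdan(L, U_pairs, D, G, n_rounds=10, d_steps=1, g_steps=1):
        """Alternating GDAN training (sketch). L: labeled (p, q, a) triples;
        U_pairs: unlabeled (paragraph, extracted answer) pairs."""
        D.train(L, tag="d_true")   # pre-train D on the labeled data
        G.train_mle(L)             # pre-train G with maximum likelihood
        for _ in range(n_rounds):
            # regenerate questions for the unlabeled pairs with the current G
            U_G = [(p, G.generate_question(p, a), a) for p, a in U_pairs]
            for _ in range(d_steps):               # D update: both domains, with tags
                D.train(L, tag="d_true")
                D.train(U_G, tag="d_gen")
            for _ in range(g_steps):               # G update: adversarial REINFORCE step
                G.reinforce_step(
                    U_pairs,
                    reward_fn=lambda p, q, a: D.log_prob(a, p, q, tag="d_true"),
                )
        return D, G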

SLIDE 12

Training Algorithm

Sampling discrete questions from G makes the objective non-differentiable, so G is trained with reinforcement learning.

  • Action space: all possible questions with length T (possibly padded)

  • Reward:
  • Gradient:
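
The Reward and Gradient formulas above were lost in extraction; the standard REINFORCE form that fits this setup (possibly with a baseline subtracted from the reward in the actual paper) is:

    r(q) \;=\; \log p_D(a \mid p, q, d_{\text{true}}), \qquad
    \nabla_{\theta_G} J \;\approx\; \mathbb{E}_{q \sim p_G(\cdot \mid p, a)} \big[ \, r(q) \, \nabla_{\theta_G} \log p_G(q \mid p, a) \, \big]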

SLIDE 13

Experiment - Answer Extraction

  • Assumption: answers are available for the unlabeled data
  • Answers in the SQuAD dataset can be categorized into ten types, i.e., “Date”, “Other Numeric”, “Person”, “Location”, “Other Entity”, “Common Noun Phrase”, “Adjective Phrase”, “Verb Phrase”, “Clause” and “Other”
  • Part-Of-Speech (POS) tagger: labels each word
  • Constituency parser: identifies noun phrases, verb phrases, adjective phrases and clauses
  • Named Entity Recognizer (NER): assigns each word one of seven labels, “Date”, “Money”, “Percent”, “Person”, “Location”, “Organization” and “Time”

  • Subsample five answers from all the extracted answers for each paragraph, according to the percentage of answer types in the SQuAD dataset
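
An illustrative sketch of the answer-extraction step, using spaCy as a stand-in for the taggers and parsers listed above (it only covers named entities and noun phrases; the paper additionally uses POS tags and a constituency parser, and matches the SQuAD answer-type distribution when subsampling):

    import random
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

    def extract_candidate_answers(paragraph, k=5):
        """Extract candidate answer spans from an unlabeled paragraph:
        named entities plus noun phrases, then subsample up to k of them."""
        doc = nlp(paragraph)
        candidates = [ent.text for ent in doc.ents]               # dates, people, locations, ...
        candidates += [chunk.text for chunk in doc.noun_chunks]   # common noun phrases
        candidates = list(dict.fromkeys(candidates))              # de-duplicate, keep order
        return random.sample(candidates, min(k, len(candidates)))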

SLIDE 14

Experiment - Baseline Model

  • Given a paragraph and an extracted answer from the unlabeled data
  • Q: the context surrounding the answer is used directly as the question (a sketch follows)
  • W: window size
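
A sketch of the context-based baseline as described here (that the pseudo-question is the W tokens on each side of the answer span is an assumption about the exact window definition):

    def context_question(paragraph_tokens, answer_start, answer_end, W):
        """Context baseline: instead of a generated question, use the window of
        W tokens on each side of the answer span as a pseudo-question.
        answer_start/answer_end are inclusive token indices of the answer."""
        left = paragraph_tokens[max(0, answer_start - W):answer_start]
        right = paragraph_tokens[answer_end + 1:answer_end + 1 + W]
        return left + right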

SLIDE 15

Experiment - Comparison

Methods:

Method             Model   Description
SL                 D       Supervised learning setting: train the model D on the labeled data L only
Context            D       Simple context-based method (baseline model)
Context + domain   D       Context method with domain tags

[Diagrams: in each setting D maps (Question, Paragraph, optional domain tag) to an Answer; SL uses labeled data only, Context and Context + Domain use labeled + unlabeled data.]

SLIDE 16

Experiment - Comparison

Methods:

Method               Model   Description
Gen                  D + G   Train a generative model (copy + attention) and use the generated questions as additional training data
Gen + GAN            D + G   GAN-based training (Reinforce)
Gen + dual           D + G   Dual learning method
Gen + domain         D + G   Gen with domain tags, while the generative model is trained with MLE and fixed
Gen + domain + adv   D + G   Adversarial (adv) training based on Reinforce

SLIDE 17

Results and Analysis

  • Labeling rates: the percentage of training instances that are used to train D
  • Unlabeled dataset sizes: a subset of around 50,000 instances is sampled
  • Metric
  • F1 score
  • Exact matching (EM) scores
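
Both metrics are the standard SQuAD ones; a simplified implementation (without SQuAD's official answer normalization, which lowercases and strips articles and punctuation):

    from collections import Counter

    def exact_match(prediction, gold):
        """EM: 1.0 if the predicted answer string equals the gold answer exactly."""
        return float(prediction == gold)

    def f1_score(prediction, gold):
        """Token-level F1 between the predicted and the gold answer strings."""
        pred_tokens, gold_tokens = prediction.split(), gold.split()
        common = Counter(pred_tokens) & Counter(gold_tokens)
        overlap = sum(common.values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        return 2 * precision * recall / (precision + recall)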

SLIDE 18

Results and Analysis

  • SL vs. SSL
  • with a labeling rate of only 0.1, the semi-supervised approach obtains better performance than a supervised learning approach with a labeling rate of 0.2

  • Ablation Study
  • both the domain tags and the adversarial training contribute to the performance of the GDANs

SLIDE 19

Results and Analysis

  • Unlabeled Data Size
  • the performance can be further improved when a larger unlabeled dataset is used

SLIDE 20

Results and Analysis

  • Context-Based Method
  • the simple context-based method, though performing worse than GDANs, still leads to substantial gains

  • MLE vs. RL
  • adversarial RL training of G (Gen + domain + adv) outperforms plain MLE training of G (Gen + domain)

SLIDE 21

Results and Analysis

  • Samples of Generated Questions
  • RL-generated questions are more informative
  • RL-generated questions are more accurate

SLIDE 22

Conclusion

  • Task: Semi-supervised question answering
  • Model: Generative Domain-Adaptive Nets
  • Simple Baseline method: Context
  • Experiment

SLIDE 23

Thank you!