Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots


  1. Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots
  Wei Wu (Microsoft Corporation, wuwei@microsoft.com), Yu Wu (SKLSDE, Beihang University, wuyu@buaa.edu.cn), Zhoujun Li (SKLSDE, Beihang University, lizj@buaa.edu.cn), Ming Zhou (Microsoft Research, mingzhou@microsoft.com)

  2. Outline
  • Task, challenges, and ideas
  • Our approach
    • A new learning method for matching models
  • Experiments
    • Datasets
    • Evaluation and analysis

  3. Task: retrieval-based chatbots
  • Given a message, find the most suitable responses.
  • Use a large repository of message-response pairs.
  • Treat it as a search problem: retrieve candidates from an index, generate features via context-response matching, then rank with learning to rank.
  [Figure: pipeline from context to responses: Index → Retrieval → Feature generation (context-response matching) → Ranking (learning to rank)]
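  To make the search framing concrete, here is a minimal sketch of the retrieve-then-rank loop; the `index` object, `match_score` function, and all other names are hypothetical stand-ins, not code from the talk.

```python
# Retrieve-then-rank pipeline (sketch with hypothetical objects):
# candidates come from an index over message-response pairs, and a
# learned matching model reranks them against the context.
def select_response(context, index, match_score, top_k=30):
    # Step 1 (retrieval): pull candidate responses from the repository.
    candidates = index.search(context, top_k=top_k)
    # Steps 2-3 (matching and ranking): score each candidate against
    # the context and return the best one.
    return max(candidates, key=lambda r: match_score(context, r))
```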

  4. Related Work
  • Previous work focuses on network architectures.
  • Single turn: CNN, RNN, syntax-based neural networks, ...
  • Multi-turn: CNN, RNN, attention mechanisms, ...
  • These models are data hungry, so they are trained on large-scale negatively sampled datasets.
  [Figure: state-of-the-art multi-turn architecture (Wu et al., ACL 2017)]

  5. Background: loss functions
  • Cross entropy loss (pointwise): L = −Σ_j p_j log(q_j), where p is the true label distribution and q the predicted one.
  • Hinge loss (pairwise): require s⁺ − s⁻ > ε, giving L = max(0, s⁻ − s⁺ + ε), where s⁺ and s⁻ are the matching scores of a positive and a negative response and ε is the margin.
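  As an illustration of the two losses (not the authors' code), in plain Python with NumPy:

```python
import numpy as np

# Pointwise: cross entropy between the true label distribution p and
# the predicted distribution q, L = -sum_j p_j * log(q_j).
def cross_entropy_loss(p, q):
    return -np.sum(p * np.log(q))

# Pairwise: hinge loss over a (positive, negative) score pair,
# L = max(0, s_neg - s_pos + eps); zero once s_pos exceeds s_neg by eps.
def hinge_loss(s_pos, s_neg, eps=1.0):
    return max(0.0, s_neg - s_pos + eps)
```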

  6. Background: traditional training method
  • Given a (Q, R) pair, first randomly sample N instances {R⁻_j}, j = 1, ..., N, as negatives.
  • Update the designed model with the pointwise cross entropy loss.
  • Test the model on human-annotated data.
  Two problems:
  1. Most randomly sampled responses are semantically far from the messages or the contexts.
  2. Some randomly sampled responses are false negatives, which pollute the training data as noise.
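  A sketch of this traditional recipe (hypothetical helper; binary labels feed the pointwise loss):

```python
import random

# Traditional negative sampling (sketch): pair each (Q, R) with N
# randomly drawn responses labeled 0. Both problems above are visible
# here: random picks are usually irrelevant, and occasionally they are
# perfectly good responses (false negatives).
def make_training_examples(pairs, all_responses, n_neg=1):
    examples = []
    for query, response in pairs:
        examples.append((query, response, 1))   # true response
        for _ in range(n_neg):
            neg = random.choice(all_responses)  # random negative
            examples.append((query, neg, 0))
    return examples
```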

  7. Challenges of response selection in chatbots
  • Negative sampling oversimplifies the response selection task in the training phase.
  • Train: given an utterance, positive responses are collected from human conversations, but negative ones are obtained by random sampling.
  • Test: given an utterance, a set of responses returned by a search engine is labeled by human annotators.
  • Human labeling is expensive and exhausting, so one cannot obtain large-scale labeled data for model training.

  8. Outline
  • Task, challenges, and ideas
  • Our approach
    • A new learning method for matching models
  • Experiments
    • Datasets
    • Evaluation and analysis

  9. Our Idea
  Our training process makes the margin in the loss dynamic. Given a query Q with ground-truth response R, retrieve candidates R'_1, ..., R'_N from an index, then optimize a hinge loss with a per-instance margin:
  L = Σ_{i=1}^{N} max(0, Δ_i − s(Q, R) + s(Q, R'_i))
  where s(·, ·) is the matching score, R is the ground-truth response, R'_i is a retrieved instance, and Δ_i is a confidence score for each instance. Our method encourages the model to be more confident in classifying a response with a high Δ_i as a negative one.
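  A sketch of this objective in Python (the `score` function is a hypothetical matching model s(Q, R); the loss direction follows the reconstruction above):

```python
# Dynamic-margin hinge loss (sketch):
# L = sum_i max(0, Delta_i - s(Q, R) + s(Q, R'_i)).
# A large Delta_i forces a wide gap between the ground truth R and the
# retrieved candidate R'_i; Delta_i = 0 only asks that R score higher.
def weak_supervision_loss(score, Q, R, retrieved, margins):
    s_pos = score(Q, R)
    return sum(
        max(0.0, delta - s_pos + score(Q, r_neg))
        for r_neg, delta in zip(retrieved, margins)
    )
```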

  10. How to calculate the dynamic margin?
  • We employ a Seq2Seq model to compute Δ_j.
  • The Seq2Seq model is an unsupervised model: it can compute a conditional likelihood P(R | Q) without human annotation.
  • Δ_j = max(0, s2s(Q, R) / s2s(Q, R'_j) − 1), where s2s(Q, R) = P(R | Q): a retrieved response judged much less likely than the ground truth gets a large margin, while a likely false negative gets a margin near zero.
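  The margins could then be computed as below (sketch; `s2s` is a hypothetical callable returning the Seq2Seq likelihood P(R | Q), and the ratio direction is reconstructed from the slides' false-negative argument):

```python
# Dynamic margins from Seq2Seq likelihoods (sketch):
# Delta_j = max(0, s2s(Q, R) / s2s(Q, R'_j) - 1).
# Candidates rated far less likely than the ground truth get large
# margins; likely false negatives (high likelihood) get Delta_j ~ 0.
def dynamic_margins(s2s, Q, R, retrieved):
    p_true = s2s(Q, R)
    return [max(0.0, p_true / s2s(Q, r) - 1.0) for r in retrieved]
```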

  11. A new training method
  • Pre-train the matching model with random sampling and the cross entropy loss.
  • Given a (Q, R) pair, retrieve N negative instances {R'_j}, j = 1, ..., N, from a pre-defined index.
  • Update the designed model with the dynamic hinge loss.
  • Test the model on human-annotated data.
  Why it helps:
  1. The oversimplification problem of the negative sampling approach is partially mitigated: the pre-training step enables the matching model to distinguish semantically far-away responses.
  2. False negative and true negative examples are no longer treated equally during training, since the dynamic margin assigns likely false negatives a margin near zero.
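  Putting the pieces together, the whole recipe might look like this (sketch reusing the hypothetical helpers above; `model`, `index`, and `s2s` are assumed interfaces, not the authors' API):

```python
# End-to-end training (sketch): pre-train with random negatives and
# cross entropy, then fine-tune with retrieved negatives and the
# dynamic hinge loss, and finally evaluate on human-labeled data.
def train(model, pairs, all_responses, index, s2s, n_neg=10):
    # Step 1: pre-training, so the model already rejects semantically
    # far-away responses before fine-tuning.
    model.fit_pointwise(make_training_examples(pairs, all_responses))

    # Steps 2-3: fine-tune with retrieved negatives and dynamic margins.
    for Q, R in pairs:
        retrieved = index.search(Q, top_k=n_neg)
        margins = dynamic_margins(s2s, Q, R, retrieved)
        loss = weak_supervision_loss(model.score, Q, R, retrieved, margins)
        model.step(loss)  # parameter update; framework-specific
```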

  12. Outline
  • Task, challenges, and ideas
  • Our approach
    • A new learning method for matching models
  • Experiments
    • Datasets
    • Evaluation and analysis

  13. Datasets
  • STC dataset (Wang et al., 2013)
    • Single-turn response selection.
    • Over 4 million post-response pairs (true responses) from Weibo for training.
    • The test set consists of 422 posts, each associated with around 30 responses labeled "good" or "bad" by human annotators.
  • Douban Conversation Corpus (Wu et al., 2017)
    • Multi-turn response selection.
    • 0.5 million context-response pairs (true responses) for training.
    • In the test set, every context has 10 response candidates, and each response has a "good" or "bad" label judged by human annotators.

  14. Evaluation Results

  15. Ablation Test
  • +WSrand: negative samples are randomly generated.
  • +const: the margin in the loss function is a fixed constant.
  • +WS: our full model.

  16. More Findings
  • Updating the Seq2Seq model during training is not beneficial to the discriminator.
  • The number of negative instances is an important hyperparameter for our model.

  17. Conclusion
  • We study a less-explored problem in retrieval-based chatbots.
  • We propose a new method that can leverage unlabeled data to learn matching models for retrieval-based chatbots.
  • We empirically verify the effectiveness of the method on public datasets.
