

SLIDE 1

Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots

Yu Wu SKLSDE, Beihang University wuyu@buaa.edu.cn Wei Wu Microsoft Corporation wuwei@microsoft.com Zhoujun Li SKLSDE, Beihang University lizj@buaa.edu.cn Ming Zhou Microsoft Research mingzhou@microsoft.com

SLIDE 2

Outline

  • Task, challenges, and ideas
  • Our approach
    • A new learning method for matching models
  • Experiment
    • Datasets
    • Evaluation and analysis
SLIDE 3

Task: retrieval-based chatbots

  • Given a message, find the most suitable responses
  • Use a large repository of message-response pairs
  • Treat it as a search problem

[Pipeline diagram: Index → Retrieval → Feature generation (context-response matching) → Ranking (learning to rank) → Responses]
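The pipeline in the diagram can be sketched in a few lines. Everything here is an illustrative stand-in, not the actual system: the repository is a toy list, retrieval is crude word overlap instead of an index lookup, and `matcher` stands in for a learned context-response matching model.

```python
def respond(message, repository, matcher, top_k=20):
    """Sketch of the retrieve-then-rank pipeline: recall candidates from
    the message-response repository, then rank them with a matching score."""
    words = set(message.split())
    # Retrieval: crude word-overlap recall standing in for an index lookup.
    candidates = sorted(repository,
                        key=lambda r: len(words & set(r.split())),
                        reverse=True)[:top_k]
    # Ranking: pick the candidate the matching model scores highest.
    return max(candidates, key=lambda r: matcher(message, r))

repo = ["the weather is great today", "i had pizza for lunch",
        "it may rain this afternoon"]
overlap = lambda q, r: len(set(q.split()) & set(r.split()))
print(respond("how is the weather today", repo, overlap))  # the weather is great today
```

In a real chatbot the ranking step would use a neural matching model trained with learning to rank, as the following slides describe.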

SLIDE 4

Related Work

  • Previous works focus on network architectures.
    • Single turn: CNN, RNN, syntax-based neural networks, …
    • Multiple turn: CNN, RNN, attention mechanisms, …
  • These models are data hungry, so they are trained on large-scale negatively sampled datasets.

State-of-the-art multi-turn architecture (Wu et al. ACL 2017)

SLIDE 5

Background: Loss Function

Cross-entropy loss (pointwise):
  • L = −Σ_j p_j log(q_j)

Hinge loss (pairwise):
  • requires s⁺ − s⁻ > ε
  • L = max(0, s⁻ − s⁺ + ε)
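A minimal NumPy sketch of the two losses; the function and variable names are mine, not from the slides:

```python
import numpy as np

def cross_entropy(p, q):
    """Pointwise cross-entropy loss: L = -sum_j p_j * log(q_j)."""
    q = np.clip(q, 1e-12, 1.0)   # guard against log(0)
    return -np.sum(p * np.log(q))

def hinge(s_pos, s_neg, margin=1.0):
    """Pairwise hinge loss: L = max(0, s_neg - s_pos + margin)."""
    return max(0.0, s_neg - s_pos + margin)

# A pair ranked correctly by more than the margin incurs zero loss:
print(hinge(s_pos=2.0, s_neg=0.5))   # 0.0
# A pair inside the margin is still penalized:
print(hinge(s_pos=1.0, s_neg=0.75))  # 0.75
# One-hot target, uniform prediction over two classes: loss is ln 2.
print(round(float(cross_entropy(np.array([0.0, 1.0]),
                                np.array([0.5, 0.5]))), 4))  # 0.6931
```

The pointwise loss treats each (context, response) pair as an independent binary classification; the pairwise loss only cares that the positive response outscores the negative one by a margin.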
SLIDE 6

Background: traditional training method

Given a (Q, R) pair, first randomly sample N negative instances R'_1, …, R'_N. Update the designed model with the pointwise cross-entropy loss. Test the model on human-annotated data.

Two problems:

  • 1. Most of the randomly sampled responses are semantically far from the messages or the contexts.
  • 2. Some of the randomly sampled responses are false negatives, which pollute the training data as noise.

SLIDE 7

Challenges of Response Selection in Chatbots

  • Negative sampling oversimplifies the response selection task in the training phase.
    • Train: given an utterance, positive responses are collected from human conversations, but negative ones are sampled at random.
    • Test: given an utterance, a set of responses is returned by a search engine, and human annotators are asked to label them.
  • Human labeling is expensive and exhausting, so one cannot obtain large-scale labeled data for model training.

SLIDE 8

Outline

  • Task, challenges, and ideas
  • Our approach
    • A new learning method for matching models
  • Experiment
    • Datasets
    • Evaluation and analysis
SLIDE 9

Our Idea

Our training process: for a query Q with ground-truth response R, retrieve N instances R'_1, R'_2, …, R'_N from an index.

R is the ground-truth response, and R'_j is a retrieved instance.

Hinge loss: the model should score R above every retrieved instance by a per-instance margin:

s(Q, R) − s(Q, R'_1) > Δ_1
s(Q, R) − s(Q, R'_2) > Δ_2
s(Q, R) − s(Q, R'_3) > Δ_3
…
s(Q, R) − s(Q, R'_N) > Δ_N

Optimization: minimize Σ_j max(0, Δ_j + s(Q, R'_j) − s(Q, R)).

Δ_j is a confidence score for each instance. Our method encourages the model to be more confident in classifying a response with a high Δ_j as a negative one.

The margin in our loss is dynamic.

SLIDE 10

How to calculate the dynamic margin?

  • We employ a Seq2Seq model to compute Δ_j.
  • The Seq2Seq model is unsupervised: it can compute the conditional likelihood of a response given a query, denoted s2s(Q, R), without human annotation.
  • Δ_j = max(0, s2s(Q, R'_j) / s2s(Q, R) − 1)
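The margin computation is a one-liner. In this sketch, `s2s_neg` and `s2s_pos` stand for the Seq2Seq likelihoods of a retrieved candidate and of the ground-truth response; the numeric values below are made up for illustration:

```python
def dynamic_margin(s2s_neg, s2s_pos):
    """Delta_j = max(0, s2s(Q, R'_j) / s2s(Q, R) - 1).
    s2s is the Seq2Seq likelihood; no human labels are needed."""
    return max(0.0, s2s_neg / s2s_pos - 1.0)

# Candidate judged twice as likely as the ground truth -> large margin:
print(dynamic_margin(0.5, 0.25))   # 1.0
# Candidate less likely than the ground truth -> margin stays at zero:
print(dynamic_margin(0.1, 0.25))   # 0.0
```

Intuitively, only candidates that the Seq2Seq model finds more plausible than the ground truth get a positive margin, so the matching model is pushed hardest on exactly those hard negatives.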

SLIDE 11

A new training method

  • 1. Pre-train the matching model with negative sampling and the cross-entropy loss.
  • 2. Given a (Q, R) pair, retrieve N instances R'_1, …, R'_N from a pre-defined index.
  • 3. Update the model with the dynamic hinge loss.
  • 4. Test the model on human-annotated data.

The pre-training process enables the matching model to distinguish semantically far-away responses.

  • 1. The oversimplification problem of the negative sampling approach can be partially mitigated.
  • 2. False negatives and true negatives are no longer treated equally during training.
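One fine-tuning step of this recipe can be sketched end to end. The matcher and Seq2Seq classes below are toy stand-ins (word overlap and a length-based likelihood, not real models), assuming the loss Σ_j max(0, Δ_j + s(Q, R'_j) − s(Q, R)) from the earlier slide:

```python
class WordOverlapMatcher:
    """Toy stand-in for a pre-trained matching model."""
    def score(self, q, r):
        return len(set(q.split()) & set(r.split()))
    def update(self, loss):
        pass  # a real model would take a gradient step on `loss` here

class ToySeq2Seq:
    """Toy stand-in for the Seq2Seq likelihood model."""
    def likelihood(self, q, r):
        return 1.0 / (1 + abs(len(q) - len(r)))  # length proxy, not a real LM

def weak_supervision_step(matcher, seq2seq, q, r, candidates):
    """One update with the dynamic hinge loss:
    sum_j max(0, Delta_j + s(Q, R'_j) - s(Q, R))."""
    s_pos = matcher.score(q, r)
    loss = 0.0
    for r_neg in candidates:  # R'_1 ... R'_N, retrieved from an index
        delta = max(0.0, seq2seq.likelihood(q, r_neg)
                    / seq2seq.likelihood(q, r) - 1.0)
        loss += max(0.0, delta + matcher.score(q, r_neg) - s_pos)
    matcher.update(loss)
    return loss

loss = weak_supervision_step(
    WordOverlapMatcher(), ToySeq2Seq(),
    q="how is the weather today",
    r="the weather is great today",
    candidates=["i like pizza", "is the weather nice today"])
print(loss)  # 0.5 -- only the plausible candidate contributes
```

Note how the irrelevant candidate contributes nothing (its margin is zero and its score is far below the ground truth), while the plausible candidate drives the update.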

SLIDE 12

Outline

  • Task, challenges, and ideas
  • Our approach
    • A new learning method for matching models
  • Experiment
    • Datasets
    • Evaluation and analysis
SLIDE 13

Dataset

  • STC data set (Wang et al., 2013)
    • Single-turn response selection
    • Over 4 million post-response pairs (true responses) from Weibo for training.
    • The test set consists of 422 posts, each associated with around 30 responses labeled "good" or "bad" by human annotators.
  • Douban Conversation Corpus (Wu et al., 2017)
    • Multi-turn response selection
    • 0.5 million context-response pairs (true responses) for training
    • In the test set, every context has 10 response candidates, each labeled "good" or "bad" by human annotators.

SLIDE 14

Evaluation Results

SLIDE 15

Ablation Test

  • +WSrand: negative samples are randomly generated.
  • +const: the margin in the loss function is a static number.
  • +WS: our full model
SLIDE 16

More Findings

  • Updating the Seq2Seq model is not beneficial to the discriminator.
  • The number of negative instances is an important hyperparameter for our model.

SLIDE 17

Conclusion

  • We study a less explored problem in retrieval-based chatbots.
  • We propose a new method that can leverage unlabeled data to learn matching models for retrieval-based chatbots.

  • We empirically verify the effectiveness of the method on public data sets.