Dialogue Quality and Nugget Detection for Short Text Conversation


SLIDE 1

WIDM @ NTCIR-14 STC-3 Task: Dialogue Quality and Nugget Detection for Short Text Conversation (STC-3) based on Hierarchical Multi-Stack Model with Memory Enhance Structure

NATIONAL CENTRAL UNIVERSITY, TAOYUAN, TAIWAN
AUTHORS: HSIANG-EN CHERNG AND CHIA-HUI CHANG
PRESENTER: HSIANG-EN CHERNG (SEAN)

SLIDE 2

Outline

  • 1. Introduction
  • 2. Dialogue Quality (DQ) Subtask
  • 3. Nugget Detection (ND) Subtask
  • 4. Conclusion

SLIDE 3

Introduction

Task Overview – DQ Subtask
Task Overview – ND Subtask
Contribution

SLIDE 4

Task Overview – DQ Subtask

Goal of DQ

  • DQ aims to evaluate the quality of a dialogue by three measures (scale: -2, -1, 0, 1, 2)

1) A-score: Task Accomplishment
2) E-score: Dialogue Effectiveness
3) S-score: Customer Satisfaction of the dialogue

Why DQ

  • To build good task-oriented dialogue systems, we need good ways to evaluate them
  • You cannot improve a dialogue system if you cannot measure it; DQ provides three such measures

SLIDE 5

Task Overview – ND Subtask

Goal of ND

  • The ND subtask aims to classify the nugget type of each utterance in a dialogue
  • ND is similar to the dialogue act (DA) labeling problem

Why ND

  • Nuggets may serve as useful features for automatically estimating dialogue quality
  • ND may help us diagnose a dialogue more closely (why it failed, where it failed)
  • Experience from ND may help us design effective and efficient helpdesk systems


Nugget: purpose or motivation

SLIDE 6

Contribution

  • 1. We proposed and compared several DNN models based on
  • Hierarchical multi-stack CNN for sentence and dialogue representation
  • BERT for sentence representation
  • 2. We compared the models with or without memory enhancement
  • 3. We compared a simple BERT model with a BERT + complex-structure model
  • 4. On both DQ and ND, our models achieve the best performance compared with the organizer baseline models


BERT: a pre-trained model based on multiple bidirectional Transformer blocks (Devlin, J., Chang, M. W., Lee, K., Toutanova, K. 2018)

SLIDE 7

Dialogue Quality (DQ) Subtask

Model Experiments

SLIDE 8

Memory enhanced multi-stack gated CNN (MeHGCNN)

Embedding layer

  • 100-dimensional Word2Vec embeddings

Utterance layer

  • 2-stack gated CNN learning sentence representation

Context layer

  • 1-stack gated CNN learning context information

Memory layer (Memory Network)

  • Further capture long-range context features

Output layer

  • Output DQ distribution by softmax

SLIDE 9

Three techniques used in our models

  • 1. Multi-stack structure
  • 2. Gating mechanism
  • 3. Memory enhancement (memory network)

SLIDE 10

Multi-stack

Multi-stack structure

  • Hierarchically captures rich n-gram information
  • With window size k and m stacks, the receptive field covers m(k-1)+1 words (e.g., 2 stacks of window size 2 cover 3 words)

SLIDE 11

Gating mechanism & Memory Enhance Structure

Gating mechanism

  • Widely used in LSTM and GRU to control the gates of memory states
  • The idea of a gated CNN is to learn whether to keep or drop a feature generated by the CNN (a minimal sketch follows below)
  • Language modeling with gated convolutional networks (Dauphin, Y. N., Fan, A., Auli, M. 2016)

Memory enhance structure

  • LSTMs are not good at capturing very long-range context features
  • A memory network is applied in our models to capture detailed context features via self-attention
  • Memory networks (Weston, J., Chopra, S., Bordes, A. 2015)
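Below is a minimal sketch of one gated convolution block as described above, assuming a Keras/TensorFlow implementation; the function name gated_conv1d and the padding choice are illustrative, not taken from the paper.

```python
# Illustrative sketch, not the authors' released code.
from tensorflow.keras import layers

def gated_conv1d(x, filters, kernel_size):
    """One gated CNN block: ConvA(x) * sigmoid(ConvB(x)), as in Dauphin et al. (2016)."""
    a = layers.Conv1D(filters, kernel_size, padding="same")(x)                        # feature branch
    g = layers.Conv1D(filters, kernel_size, padding="same", activation="sigmoid")(x)  # gate branch
    return layers.Multiply()([a, g])   # the gate learns whether to keep or drop each feature
```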

SLIDE 12

Utterance Layer: 2-stack Gated CNN

  • Utterance layer (UL), stack index $l = 1$
  • $X_i^l = [x_{i,1}, x_{i,2}, \dots, x_{i,n}]$
  • $ulA_i^l = \mathrm{ConvA}(X_i^l)$
  • $ulB_i^l = \mathrm{ConvB}(X_i^l)$
  • $ulC_i^l = ulA_i^l \odot \sigma(ulB_i^l)$
  • $X_i^{l+1} = ulC_i^l$, if $l \le 2$ (the gated output feeds the next stack)
  • $ul_i = [\mathrm{maxpool}(ulC_i^l),\ speaker_i,\ nugget_i]$

Max-pooling is applied to the output of the last stack; $speaker_i$ (1x1) and $nugget_i$ (1x7) are additional features.
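A minimal Keras sketch of the utterance layer above: two stacked gated convolution blocks (the gated block is repeated from the earlier sketch), max-pooling over the last stack, then concatenation with the speaker (1x1) and nugget (1x7) features. The sequence length and input handling are assumptions for illustration.

```python
# Illustrative sketch, not the authors' released code.
from tensorflow.keras import layers, Model, Input

def gated_conv1d(x, filters, kernel_size):
    a = layers.Conv1D(filters, kernel_size, padding="same")(x)
    g = layers.Conv1D(filters, kernel_size, padding="same", activation="sigmoid")(x)
    return layers.Multiply()([a, g])

def utterance_layer(max_words=50, emb_dim=100):
    """2-stack gated CNN over word embeddings, then max-pooling and feature concatenation."""
    words   = Input(shape=(max_words, emb_dim))  # word embeddings X_i
    speaker = Input(shape=(1,))                  # speaker feature (1x1)
    nugget  = Input(shape=(7,))                  # nugget distribution feature (1x7)

    h = words
    for filters in (512, 1024):                  # 2 stacks, kernel size 2 (the DQ setting)
        h = gated_conv1d(h, filters, kernel_size=2)

    pooled = layers.GlobalMaxPooling1D()(h)      # max-pool the output of the last stack
    ul = layers.Concatenate()([pooled, speaker, nugget])   # ul_i
    return Model([words, speaker, nugget], ul)
```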

SLIDE 13

Context Layer: 1-stack Gated CNN

  • Context layer (CL)
  • Conduct the same operations as the UL, but with no additional features
  • $clA_i = \mathrm{ConvA}([ul_{i-1}, ul_i, ul_{i+1}])$
  • $clB_i = \mathrm{ConvB}([ul_{i-1}, ul_i, ul_{i+1}])$
  • $clC_i = clA_i \odot \sigma(clB_i)$
  • $cl_i = \mathrm{maxpool}(clC_i)$

The output of the context layer for utterance $i$ is $cl_i$.

SLIDE 14

Memory Layer

  • Memory layer (ML)

1) Both the input memory ($I_i$) and the output memory ($O_i$) are generated by a Bi-GRU from $cl_i$

  • Input memory
  • $\overrightarrow{I_i} = \mathrm{GRU}(cl_i, h_{i-1})$
  • $\overleftarrow{I_i} = \mathrm{GRU}(cl_i, h_{i+1})$
  • $I_i = \tanh(\overrightarrow{I_i} + \overleftarrow{I_i})$
  • Output memory
  • $\overrightarrow{O_i} = \mathrm{GRU}(cl_i, h_{i-1})$
  • $\overleftarrow{O_i} = \mathrm{GRU}(cl_i, h_{i+1})$
  • $O_i = \tanh(\overrightarrow{O_i} + \overleftarrow{O_i})$

SLIDE 15

Memory Layer (cont.)

  • Memory layer (ML)

2) The attention weight is the inner product between $cl_i$ and $I_i$, followed by a softmax

  • $w_i = \dfrac{\exp(cl_i \cdot I_i)}{\sum_{i'=1}^{k} \exp(cl_{i'} \cdot I_{i'})}$

3) The output of the memory layer for $cl_i$ is the weighted sum of $O_{i'}$ plus $cl_i$

  • $ml_i = \sum_{i'=1}^{k} w_{i'} \cdot O_{i'} + cl_i$
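A minimal numpy sketch of the memory-layer computation written above; the Bi-GRU encoding is omitted, and cl, I, and O are assumed to be already-computed [k, d] matrices.

```python
# Illustrative sketch, not the authors' released code.
import numpy as np

def memory_layer(cl, I, O):
    """cl, I, O: [k, d] arrays (context vectors, input memory, output memory)."""
    scores = np.sum(cl * I, axis=1)         # inner product cl_i . I_i for each utterance
    w = np.exp(scores - scores.max())
    w = w / w.sum()                         # softmax over utterances -> attention weights w_i
    context = (w[:, None] * O).sum(axis=0)  # weighted sum of the output memory
    return context[None, :] + cl            # ml_i = weighted sum + cl_i, shape [k, d]

# Toy usage: k = 6 utterances, d = 8 dimensions
k, d = 6, 8
cl, I, O = (np.random.randn(k, d) for _ in range(3))
print(memory_layer(cl, I, O).shape)         # (6, 8)
```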

SLIDE 16

Output Layer

  • Output layer
  • Flatten all utterance vectors: $ml = [ml_1, ml_2, \dots, ml_k]$
  • Apply a fully-connected layer with softmax to output the score distribution
  • $fc = ml\, W_{fc} + b_{fc}$
  • $P(\mathrm{score} \mid \mathrm{dialogue}) = \dfrac{\exp(fc_i)}{\sum_{i'=1}^{5} \exp(fc_{i'})}$
  • The dimension of $P(\mathrm{score} \mid \mathrm{dialogue})$ is 1x5 since the score scale is -2, -1, 0, 1, 2

SLIDE 17

Dialogue Quality (DQ) Subtask

Model Experiments

SLIDE 18

Data

Customer helpdesk dialogues

  • Annotators: 19 students from Waseda University
  • Validation data: 20% randomly selected from the training data

Preprocessing

  • Remove all full-width characters
  • Remove all half-width characters except A-Za-z!"#$%&()*+,-./:;<=>?@[\]^_`{|}~ ‘
  • Tokenize with the NLTK toolkit (Edward Loper and Steven Bird, 2002); a minimal sketch follows
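A minimal sketch of the preprocessing steps above; the character whitelist is copied from the slide and should be treated as approximate, and NLTK's punkt data must be available.

```python
# Illustrative sketch, not the authors' released code.
import re
import nltk
from nltk.tokenize import word_tokenize   # NLTK (Loper & Bird, 2002)

nltk.download("punkt", quiet=True)         # tokenizer data

# Drop every character that is not a half-width letter or in the listed punctuation set
# (full-width characters are therefore removed as well).
KEEP = re.compile(r"[^A-Za-z!\"#$%&()*+,\-./:;<=>?@\[\\\]^_`{|}~' ]")

def preprocess(utterance: str):
    return word_tokenize(KEEP.sub(" ", utterance))

print(preprocess("Ｍｙ phone won't charge!!"))   # the full-width "Ｍｙ" is removed
```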


Data           Training   Testing
# Dialogues    1,672      390
# Utterances   8,672      1,755

SLIDE 19

Word Embedding

Embedding parameter

  • Dimension: 100
  • Tool: gensim
  • Method: skip-gram
  • Window size: 5

STC-3 DQ&ND data

  • Customer helpdesk dialogues
  • Including train data and test data


Data source    # words
text8 (wiki)   17,005,208
STC-3 DQ&ND    339,410
Total          17,344,618
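A minimal gensim sketch matching the embedding parameters above (gensim 4.x argument names; the toy `corpus` stands in for the tokenized text8 plus STC-3 sentences).

```python
# Illustrative sketch, not the authors' released code.
from gensim.models import Word2Vec

# corpus: list of token lists built from text8 (wiki) plus the STC-3 DQ&ND dialogues
corpus = [["my", "phone", "wont", "charge"], ["try", "another", "cable"]]  # toy stand-in

w2v = Word2Vec(
    sentences=corpus,
    vector_size=100,   # embedding dimension (called `size` in gensim 3.x)
    window=5,          # context window size
    sg=1,              # skip-gram
    min_count=1,
)
print(w2v.wv["phone"].shape)   # (100,)
```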

SLIDE 20

Hyperparameters of DQ

Hyperparameter   Value
Batch size       40
Epochs           50
Early stopping   3
Optimizer        Adam
Learning rate    0.0005

Multi-stack CNN of UL

  • # convolutional layers: 2
  • # Filter: [512, 1024]
  • Kernel size: 2 & 2

Multi-stack CNN of CL

  • # convolutional layers: 1
  • # Filter: [1024]

SLIDE 21

Result of DQ Subtask

  • MeHGCNN: Our proposed model
  • MeGCBERT: Replace embedding and utterance layer of MeHGCNN with BERT
  • BL-BERT: Simple BERT model with only BERT and output layer

                      A-score           E-score           S-score
Model                 NMD     RSNOD     NMD     RSNOD     NMD     RSNOD
Organizer baselines:
BL-uniform            0.1677  0.2478    0.1580  0.2162    0.1987  0.2681
BL-popularity         0.1855  0.2532    0.1950  0.2774    0.1499  0.2326
BL-lstm               0.0896  0.1320    0.0824  0.1220    0.0838  0.1310
BL-BERT               0.0934  0.1379    0.0881  0.1344    0.0842  0.1337
Ours:
MeHGCNN               0.0862  0.1307    0.0814  0.1225    0.0787  0.1241
MeGCBERT              0.0823  0.1255    0.0791  0.1202    0.0758  0.1245

SLIDE 22

Ablation of MeGCBERT for DQ

Gating mechanism & memory enhancement

  • Clearly improve A-score and S-score
  • Slight improvement in E-score

                         A-score           E-score           S-score
Model                    NMD     RSNOD     NMD     RSNOD     NMD     RSNOD
MeGCBERT                 0.0823  0.1255    0.0791  0.1202    0.0758  0.1245
W/o gating mechanism     0.0885  0.1322    0.0813  0.1214    0.0815  0.1289
W/o memory enhance       0.0913  0.1364    0.0808  0.1235    0.0799  0.1273
W/o nugget features      0.0963  0.1388    0.0802  0.1204    0.0774  0.1247

Adding Nugget features

  • Clearly improves A-score
  • Slight improvement in E-score

SLIDE 23

Nugget Detection (ND) Subtask

Model Experiments

SLIDE 24

Hierarchical multi-stack CNN with LSTM (HCNN-LSTM)


Embedding layer

  • 100-dimensional Word2Vec embeddings

Utterance layer

  • Apply 3-stack CNN to learn sentence representation

Context layer

  • Apply 2-stack Bi-LSTM to learn context information between utterances

Output layer

  • Output the nugget distribution by softmax

SLIDE 25

Utterance Layer: 3-stack CNN

  • Utterance layer (UL), stack index $l = 1$
  • $X_i^l = [x_{i,1}, x_{i,2}, \dots, x_{i,n}]$
  • $ulA_i^l = \mathrm{ConvA}(X_i^l)$
  • $ulB_i^l = \mathrm{ConvB}(X_i^l)$
  • $ulC_i^l = [ulA_i^l, ulB_i^l]$ (concatenation)
  • $X_i^{l+1} = ulC_i^l$, if $l \le 3$ (the output feeds the next stack)
  • $ul_i = [\mathrm{maxpool}(ulC_i^l),\ speaker_i]$

Max-pooling is applied to the output of the last stack; $speaker_i$ (1x1) is an additional feature. Kernel sizes: 2 and 3 for ConvA and ConvB, respectively.

SLIDE 26

Context Layer: 2-stack BI-LSTM & Output Layer

  • Context layer (CL)
  • $\overrightarrow{cl_i^l} = \mathrm{LSTM}(ul_i, h_{i-1})$
  • $\overleftarrow{cl_i^l} = \mathrm{LSTM}(ul_i, h_{i+1})$
  • $cl_i^l = \tanh(\overrightarrow{cl_i^l} + \overleftarrow{cl_i^l})$
  • $ul_i = cl_i^l$, if $l \le 2$ (the output feeds the next stack)
  • $cl_i = cl_i^l$ (output of the last stack)

  • Output layer
  • $P(\mathrm{nugget} \mid u_i) = \mathrm{softmax}(W\, cl_i)$
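A minimal Keras sketch of the HCNN-LSTM pipeline from the last few slides: a 3-stack CNN utterance layer (ConvA kernel 2, ConvB kernel 3, outputs concatenated), a 2-stack Bi-LSTM context layer with tanh-summed directions, and a per-utterance softmax over nugget types. The dialogue length, utterance length, and the 7 nugget classes are illustrative assumptions.

```python
# Illustrative sketch, not the authors' released code.
from tensorflow.keras import layers, Model, Input

MAX_UTT, MAX_WORDS, EMB, N_NUGGETS = 7, 50, 100, 7

def utterance_encoder():
    """3-stack CNN: each stack concatenates ConvA (kernel 2) and ConvB (kernel 3)."""
    x_in = Input(shape=(MAX_WORDS, EMB))
    h = x_in
    for filters in (256, 512, 1024):
        a = layers.Conv1D(filters, 2, padding="same")(h)
        b = layers.Conv1D(filters, 3, padding="same")(h)
        h = layers.Concatenate()([a, b])                  # ulC = [ulA, ulB]
    return Model(x_in, layers.GlobalMaxPooling1D()(h))    # max-pool the last stack

dialogue = Input(shape=(MAX_UTT, MAX_WORDS, EMB))         # embedded utterances of one dialogue
speaker  = Input(shape=(MAX_UTT, 1))                      # 1x1 speaker feature per utterance
ul = layers.TimeDistributed(utterance_encoder())(dialogue)
ul = layers.Concatenate()([ul, speaker])

cl = ul
for units in (1024, 1024):                                # 2-stack Bi-LSTM context layer
    cl = layers.Bidirectional(layers.LSTM(units, return_sequences=True),
                              merge_mode="sum")(cl)
    cl = layers.Activation("tanh")(cl)                    # tanh(forward + backward)

out = layers.TimeDistributed(layers.Dense(N_NUGGETS, activation="softmax"))(cl)
model = Model([dialogue, speaker], out)                   # P(nugget | u_i) per utterance
```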

SLIDE 27

Nugget Detection (ND) Subtask

Model Experiments

SLIDE 28

Hyperparameters of ND

Hyperparameter   Value
Batch size       30
Epochs           50
Early stopping   3
Optimizer        Adam
Learning rate    0.0005

Multi-stack CNN

  • # convolutional layers: 3
  • # Filter: [256, 512, 1024]
  • Kernel size: 2 & 3

Multi-stack BI-LSTM

  • # BI-LSTM layers: 2
  • # hidden units: [1024, 1024]
  • Activation function of concatenation: tanh

SLIDE 29

Result of ND Subtask

  • HCNN-LSTM: Our proposed model
  • BERT-LSTM: Replace the embedding layer and utterance layer of HCNN-LSTM with BERT
  • BL-BERT: Simple BERT + Output layer model
  • BERT-LSTM outperforms all other models
  • HCNN-LSTM outperforms NTCIR baselines in JSD
  • The context layer is important for BERT
  • JSD worsens by 0.012 and RNSS by 0.024 without the context layer

Model           JSD     RNSS
Organizer baselines:
BL-uniform      0.2304  0.3708
BL-popularity   0.1665  0.2653
BL-lstm         0.0248  0.0952
BL-BERT         0.0341  0.1171
Ours:
HCNN-LSTM       0.0246  0.0962
BERT-LSTM       0.0228  0.0933

SLIDE 30

Ablation & Complex Structure Experiments

The first table shows that both the UL and CL are important for the ND subtask. The second table shows that neither the gating mechanism nor the memory-enhanced structure improves performance.

  • With less training data, a complex structure might cause overfitting

Model                 JSD     RNSS
BERT-LSTM             0.0228  0.0933
W/o CL multi-stack    0.0246  0.0951

Model                 JSD     RNSS
BERT-LSTM             0.0228  0.0933
W/ gating mechanism   0.0244  0.0960
W/ memory enhance     0.0234  0.0941

SLIDE 31

Learning Curve of Different Training Data Size for ND

For the ND subtask

  • Both JSD and RNSS decrease as the percentage of training data grows toward 100%
  • The tendency shows our model could perform better if there were more training data
  • We do not apply a complex model for ND because of the lack of training data

[Figure: Learning Curve of ND. Validation JSD and validation RNSS vs. % of training data (20% to 100%).]

SLIDE 32

Conclusion

SLIDE 33

Conclusion

  • 1. We propose two hierarchical models for the DQ and ND subtasks
  • 2. We compare the models with and without the gating mechanism and memory enhancement
  • Both improve performance on the DQ subtask
  • But degrade performance on the ND subtask
  • 3. Data for ND might be insufficient, which causes overfitting in complex models
  • 4. We compare sentence representations from BERT and word2vec
  • 5. Our models outperform the organizer baseline models on both the ND and DQ subtasks

SLIDE 34

Q&A

SLIDE 35

Nugget Types for ND

CNUG0: Customer trigger

  • Problem stated

CNUG*: Customer goal

  • Solution confirmed

CNUG: Customer regular

  • Contains info that leads to solution

CNaN: Customer Not-a-Nugget

  • Does not contain info that leads to solution

HNUG*: Helpdesk goal

  • Solution stated

HNUG: Helpdesk regular

  • contains info that leads to solution

HNaN: Helpdesk Not-a-Nugget

  • Does not contain info that leads to solution

SLIDE 36

Example of ND

SLIDE 37

Measures of DQ

A-score: Task Accomplishment

  • Has the problem been solved? To what extent?

E-score: Dialogue Effectiveness

  • Do the utterers interact effectively to solve the problem efficiently?

S-score: Customer Satisfaction of the dialogue

  • Not of the product/service or the company

Scale: -2, -1, 0, 1, 2

SLIDE 38

Related Work

Short Text Conversation (STC)
Word Embedding to BERT

SLIDE 39

Short Text Conversation (STC)

Traditional machine learning methods

  • Hidden Markov Model (Stolcke et al. 2006)
  • Naïve Bayes (Lendvai and Geertzen 2007)

Deep learning methods

  • CNN based & RNN based models (Lee, J, Y., Dernoncourt, F. 2016)
  • Recurrent convolutional neural networks (Blunsom, P., Kalchbrenner, N. 2013)
  • LSTM + CRF model (Huang, Z., Xu, W., Yu, K. 2015; Ma, X., Hovy, E. 2016)
  • Hierarchical CNN + CNN / Bi-LSTM (Liu, Y., Han, K., Tan, Z., Lei, Y. 2017)
  • Hierarchical encoder with CRF (Kumar, H., Agarwal, A., Dasgupta, R., Joshi, S., Kumar, A. 2018)

SLIDE 40

Word Embedding to BERT

Word embedding

  • Word2Vec (Mikolov, T., Chen, K., Corrado, G., Dean, J. 2013)
  • Our proposed models apply word2vec with skip-gram algorithm

BERT (Bidirectional Encoder Representations from Transformers)

  • A pre-trained model based on multiple bidirectional Transformer blocks
  • Redefines the state of the art for 11 natural language processing tasks
  • BERT (Devlin, J., Chang, M, W., Lee, K., Toutanova, K. 2018)

Transformer

  • Constructed from self-attention and feed-forward layers (without any CNN or RNN)
  • Attention is all you need (Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., Polosukhin, I. 2017)

SLIDE 41

Two Evaluation Measures of DQ

Normalized Match Distance (NMD): a cross-bin measure

Definition

  • Given two normalized distributions $p, q$
  • $cp(i) = \sum_{k=0}^{i} p(k)$
  • $cq(i) = \sum_{k=0}^{i} q(k)$
  • $MD(p, q) = \sum_i |cp(i) - cq(i)|$
  • $NMD(p, q) = \dfrac{MD(p, q)}{\mathrm{length} - 1}$

Example

  • $p = [0, 0, 1]$
  • $q_1 = [0.2, 0.8, 0]$, $q_2 = [0.8, 0.2, 0]$
  • $cp = [0, 0, 1]$
  • $cq_1 = [0.2, 1, 1]$, $cq_2 = [0.8, 1, 1]$
  • $NMD(p, q_1) = \dfrac{0.2 + 1 + 0}{3 - 1} = 0.6$
  • $NMD(p, q_2) = \dfrac{0.8 + 1 + 0}{3 - 1} = 0.9$
  • $q_1$ is better than $q_2$ (lower NMD)
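A minimal numpy sketch of NMD that reproduces the worked example above (a hypothetical helper, not the official evaluation script):

```python
# Illustrative sketch, not the official NTCIR evaluation script.
import numpy as np

def nmd(p, q):
    """Normalized Match Distance between two normalized distributions."""
    cp, cq = np.cumsum(p), np.cumsum(q)            # cumulative distributions
    return np.abs(cp - cq).sum() / (len(p) - 1)    # match distance / (length - 1)

p, q1, q2 = [0, 0, 1], [0.2, 0.8, 0], [0.8, 0.2, 0]
print(nmd(p, q1))   # 0.6 -> q1 is closer to p
print(nmd(p, q2))   # 0.9
```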

SLIDE 42

Two Evaluation Measures of DQ (cont.)

Root Symmetric Normalized Order-Aware Divergence (RSNOD): a cross-bin measure that considers the distance between pairs of bins

Definition

  • Given two normalized distributions $p, q$
  • $DW(i) = \sum_{j} |i - j|\,(p(j) - q(j))^2$
  • $OD(p, q) = \dfrac{1}{|B^*|} \sum_{i \in B^*} DW(i)$, where $B^* = \{\, i \mid p(i) > 0 \,\}$
  • $\mathrm{SymmetricOD}(p, q) = \dfrac{OD(p, q) + OD(q, p)}{2}$
  • $RSNOD(p, q) = \sqrt{\dfrac{\mathrm{SymmetricOD}(p, q)}{\mathrm{length} - 1}}$

DW: Distance-Weighted sum of squares; OD: Order-Aware Divergence
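A minimal numpy sketch of RSNOD under the definition above; the square root is assumed from the measure's name, and this is not the official evaluation script.

```python
# Illustrative sketch, not the official NTCIR evaluation script.
import numpy as np

def rsnod(p, q):
    """Root Symmetric Normalized Order-aware Divergence."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    idx = np.arange(len(p))

    def od(a, b):
        # DW(i): distance-weighted sum of squared bin differences
        dw = lambda i: np.sum(np.abs(i - idx) * (a - b) ** 2)
        bins = np.nonzero(a)[0]                    # B* = {i | a(i) > 0}
        return np.mean([dw(i) for i in bins])

    sym = (od(p, q) + od(q, p)) / 2                # SymmetricOD
    return np.sqrt(sym / (len(p) - 1))

print(rsnod([0, 0, 1], [0.2, 0.8, 0]))             # lower is better
```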

SLIDE 43

Two Evaluation Measures of ND

Jensen-Shannon divergence (JSD): evaluates the similarity between two normalized distributions

Definition

  • Given two normalized distributions $p, q$
  • Define $m = \dfrac{1}{2}(p + q)$ (element-wise addition)
  • $JSD(p, q) = \dfrac{1}{2} KL(p \parallel m) + \dfrac{1}{2} KL(q \parallel m)$, with log base 2
  • $0 \le JSD \le 1$

A lower JSD means the two distributions are more similar.
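A minimal numpy sketch of JSD with log base 2 (a hypothetical helper, not the official evaluation script):

```python
# Illustrative sketch, not the official NTCIR evaluation script.
import numpy as np

def jsd(p, q):
    """Jensen-Shannon divergence with log base 2 (0 = identical, 1 = maximally different)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = (p + q) / 2

    def kl(a, b):
        mask = a > 0                               # 0 * log(0 / x) is taken as 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return (kl(p, m) + kl(q, m)) / 2

print(jsd([0, 0, 1], [0.2, 0.8, 0]))               # lower means more similar
```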

SLIDE 44

Two Evaluation Measures of ND (cont.)

Root Normalized Sum of Squared Errors (RNSS): evaluates the similarity between two normalized distributions

Definition

  • Given two normalized distributions $p, q$
  • $RNSS(p, q) = \sqrt{\dfrac{\sum_i (p_i - q_i)^2}{2}}$
  • $0 \le RNSS \le 1$

A lower RNSS means the two distributions are more similar.
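A minimal numpy sketch of RNSS as defined above (a hypothetical helper, not the official evaluation script):

```python
# Illustrative sketch, not the official NTCIR evaluation script.
import numpy as np

def rnss(p, q):
    """Root Normalized Sum of Squared errors between two normalized distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sqrt(np.sum((p - q) ** 2) / 2)

print(rnss([0, 0, 1], [0.2, 0.8, 0]))   # lower means more similar
```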

SLIDE 45

ND as a traditional sequence labeling problem

The ND subtask takes a label probability distribution as the gold label

  • So we can only apply a softmax layer instead of a CRF layer

We can also treat the ND subtask as a traditional sequence labeling problem

  • Convert the label distribution to a one-hot labeling
  • Solve the ND subtask with a CRF instead of softmax
  • Evaluate the performance by precision / recall / F1-score

SLIDE 46

Preprocessing

Distribution labels -> one-hot labels

  • Choose the nugget type with the highest probability as the label

For labels where two nugget types tie for the highest probability

  • Create two one-hot labels, one for each of the tied nugget types, as gold answers (see the example and sketch below)


Nugget            CNUG*   CNUG    CNaN
Original label    0.158   0.421   0.421
One-hot label 1           1
One-hot label 2                   1
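A minimal sketch of the conversion above; the nugget-type order, the tie tolerance, and the column assignment in the example are assumptions.

```python
# Illustrative sketch, not the authors' released code.
NUGGETS = ["CNUG*", "CNUG", "CNaN", "CNUG0", "HNUG*", "HNUG", "HNaN"]

def to_one_hot_labels(dist, tol=1e-9):
    """Convert a nugget probability distribution into one-hot gold labels.
    If two types tie for the highest probability, one label is created for each."""
    top = max(dist.values())
    winners = [n for n in NUGGETS if abs(dist.get(n, 0.0) - top) < tol]
    return [{n: int(n == w) for n in NUGGETS} for w in winners]

# The example from the table: CNUG and CNaN tie at 0.421, so two gold labels are produced
labels = to_one_hot_labels({"CNUG*": 0.158, "CNUG": 0.421, "CNaN": 0.421})
print(len(labels))   # 2
```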

SLIDE 47

ND as sequence labeling Performance

HCNN-BERT outperforms HCNN-skipGram in accuracy, macro P, and macro F. Accuracy is much higher than macro P/R/F.

  • Some nugget types are difficult to recognize correctly

Model           Accuracy   Macro P   Macro R   Macro F
HCNN-skipGram   88.8%      75.6%     74.8%     75.2%
HCNN-BERT       89.9%      83.4%     74.6%     78.7%

SLIDE 48

Confusion Matrix


Rows: prediction / Columns: label

Nugget pairs that are easily confused

  • [CNUG, CNaN]
  • [CNUG*, CNUG]
  • [CNUG*, CNaN]
  • [HNUG*, HNUG]
  • [HNUG, HNaN]
  • We wonder whether these pairs are also confusing for human annotators

Confusion matrix rows (column order: CNUG*, CNUG, CNaN, CNUG0, HNUG*, HNUG, HNaN; zero cells omitted):
CNUG*: 19, 16, 10
CNUG:  9, 431, 43, 1
CNaN:  3, 23, 57
CNUG0: 12, 374
HNUG*: 27, 14, 2
HNUG:  17, 619, 31
HNaN:  21, 70

SLIDE 49

Confusion of Human Annotation

The table shows the average probability difference between the two highest-probability nugget types of an utterance

  • A smaller difference means the two types received similar annotation probabilities, i.e., the pair is more easily confused by human annotators

The nugget pairs easily confused by our models

  • [CNUG, CNaN]
  • [CNUG*, CNUG]
  • [CNUG*, CNaN]
  • [HNUG*, HNUG]
  • [HNUG, HNaN]

are also easily confused by human annotators

Nugget pair    Avg prob diff   # Pairs   Pct
CNUG0, CNUG*   0.842           13        0%
CNUG0, CNUG    0.731           242       3%
CNUG0, CNaN    0.696           1,508     22%
CNUG*, CNUG    0.348           232       3%
CNUG*, CNaN    0.339           36        1%
CNUG, CNaN     0.455           1,793     26%
HNUG*, HNUG    0.307           865       13%
HNUG*, HNaN    0.118           8         0%
HNUG, HNaN     0.401           2,220     32%