SLIDE 1 Nders at NTCIR-13 Short Text Conversation
Han Ni, Liansheng Lin, Ge Xu (NetDragon Websoft Inc.)
SLIDE 2 System Architecture
Figure 1: System Architecture
SLIDE 3 Preprocessing
- Traditional-Simplified Chinese conversion
- Convert Full-width characters into half-width ones
- Word segmentation (PKU standard)
- Replace numbers, times, and URLs with the tokens <_NUM>, <_TIME>, <_URL> respectively
- Filter meaningless words and special symbols
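A minimal Python sketch of these cleaning steps (the exact token patterns and filter rules here are assumptions for illustration, not the team's actual rules):

```python
import re

def to_halfwidth(text):
    # Map full-width ASCII variants (U+FF01..U+FF5E) to their half-width
    # counterparts, and the ideographic space (U+3000) to a normal space.
    out = []
    for ch in text:
        code = ord(ch)
        if 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - 0xFEE0))
        elif code == 0x3000:
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)

def replace_tokens(text):
    # Replace URLs first so their digits are not caught by the number rule.
    text = re.sub(r"https?://\S+", "<_URL>", text)
    text = re.sub(r"\d{1,2}:\d{2}", "<_TIME>", text)
    text = re.sub(r"\d+(\.\d+)?", "<_NUM>", text)
    return text
```

For example, `replace_tokens(to_halfwidth("９周年"))` yields `"<_NUM>周年"`.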
SLIDE 4
Short Text ID: test-post-10440
Raw Text: 去到美國,还是吃中餐!宮保雞丁家的感覺~ (Go to the USA, still eat Chinese food! Kung Pao Chicken feels like home.)
Without T-S Conversion: 去 到 美 國 , 还 是 吃 中餐 ! 宮 保 雞 丁 家 的 感 覺 ~
With T-S Conversion: 去 到 美国 , 还 是 吃 中餐 ! 宫保鸡丁 家 的 感觉 ~
Clean Result: 去 到 美国 还 是 吃 中餐 宫保鸡丁 家 的 感觉

Short Text ID: test-post-10640
Raw Text: 汶川大地震9周年:29个让人泪流满面的瞬间。 (9th anniversary of the Wenchuan earthquake: 29 moments that bring people to tears.)
Without token replacement: 汶川 大 地震 9 周年 : 29 个 让 人 泪流满面 的 瞬间 。
With token replacement: 汶川 大 地震 <_NUM> 周年 : <_NUM> 个 让 人 泪流满面 的 瞬间 。
Clean Result: 汶川 大 地震 <_NUM> 周年 <_NUM> 个 让 人 泪流满面 的 瞬间
SLIDE 5 Similarity Features
- TF-IDF
- LSA (Latent Semantic Analysis)
- LDA (Latent Dirichlet Allocation)
- Word2Vec (skip-gram)
- LSTM-Sen2Vec
We combine each post with its corresponding comments into one document, then train the LSA and LDA models on these documents.
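As an illustration of the similarity side, here is a pure-Python TF-IDF plus cosine sketch over such post-plus-comments documents (the LSA, LDA, and Word2Vec models themselves would come from a library such as gensim; this covers only the TF-IDF feature):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # docs: list of token lists; each "document" is a post joined
    # with its comments. No smoothing is applied in this sketch.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vecs

def cosine(u, v):
    # Cosine similarity between two sparse (dict) vectors.
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```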
SLIDE 6 LSTM
ft = σ(Wf · [ht−1, xt] + bf)  (1)
it = σ(Wi · [ht−1, xt] + bi)  (2)
C̃t = tanh(WC · [ht−1, xt] + bC)  (3)
Ct = ft ∗ Ct−1 + it ∗ C̃t  (4)
ot = σ(Wo · [ht−1, xt] + bo)  (5)
ht = ot ∗ tanh(Ct)  (6)
Figure 2: The LSTM Cell
Mikolov, Tomáš. Statistical Language Models Based on Neural Networks. Ph.D. thesis, Brno University of Technology (2012).
Zaremba, Wojciech, I. Sutskever, and O. Vinyals. Recurrent Neural Network Regularization. arXiv preprint (2014).
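Equations (1)-(6) can be executed directly; the following pure-Python cell step mirrors them term by term (list-based vectors, no batching or training — a sketch, not the trained model):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in W]

def lstm_step(x_t, h_prev, C_prev, W, b):
    # W and b hold the four gate parameters W["f"], W["i"], W["C"], W["o"],
    # each of shape (hidden, hidden + input); the input is [h_{t-1}, x_t].
    z = h_prev + x_t
    f = [sigmoid(a + bi) for a, bi in zip(matvec(W["f"], z), b["f"])]          # eq. (1)
    i = [sigmoid(a + bi) for a, bi in zip(matvec(W["i"], z), b["i"])]          # eq. (2)
    C_tilde = [math.tanh(a + bi) for a, bi in zip(matvec(W["C"], z), b["C"])]  # eq. (3)
    C = [ft * cp + it * ct for ft, cp, it, ct in zip(f, C_prev, i, C_tilde)]   # eq. (4)
    o = [sigmoid(a + bi) for a, bi in zip(matvec(W["o"], z), b["o"])]          # eq. (5)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, C)]                           # eq. (6)
    return h, C
```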
SLIDE 7 Attention weight
Figure 3: Unidirectional weight distribution
Figure 4: Bidirectional weight distribution
SLIDE 8 LSTM-Sen2Vec
Figure 5: The Unidirectional LSTM Figure 6: The Traditional Bidirectional LSTM
SLIDE 9 LSTM-Sen2Vec
Figure 7: The Modified Bidirectional LSTM
SLIDE 10 Candidates Generation
Score1q,p(q, p) = SimLDA(q, p) ∗ SimW2V(q, p) ∗ SimLSTM(q, p)  (7)
Score2q,p(q, p) = SimLSA(q, p) ∗ SimW2V(q, p) ∗ SimLSTM(q, p)  (8)
Score1q,c(q, c) = SimLSA(q, c) ∗ SimW2V(q, c)  (9)
Score2q,c(q, c) = SimLDA(q, c) ∗ SimW2V(q, c)  (10)
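A small sketch of how these product scores might be combined in code (the per-candidate similarity values are assumed to be precomputed):

```python
def combine(sims, keys):
    # Multiply the chosen similarity scores; a single low factor
    # vetoes a candidate, which is the point of using a product.
    score = 1.0
    for k in keys:
        score *= sims[k]
    return score

# Two post-side scores (eqs. 7-8) and two comment-side scores (eqs. 9-10):
def scores_for_post(sims):
    return (combine(sims, ["LDA", "W2V", "LSTM"]),
            combine(sims, ["LSA", "W2V", "LSTM"]))

def scores_for_comment(sims):
    return (combine(sims, ["LSA", "W2V"]),
            combine(sims, ["LDA", "W2V"]))
```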
SLIDE 11 Ranking
- TextRank (Words as vertices)
- Pattern-IDF
- Pattern-IDF + TextRank (Sentences as vertices)
SLIDE 12 TextRank - A graph-based ranking model
Formally, let G = (V, E) be an undirected graph with vertex set V and edge set E, where E ⊆ V × V. For a given vertex Vi, let link(Vi) be the set of vertices linked with it. The score of a vertex Vi is defined as follows:

WS(Vi) = (1 − d) + d ∗ Σ_{j∈link(Vi)} wij ∗ WS(Vj)  (11)

where d is a damping factor¹ that is usually set to 0.85.
¹ Brin, Sergey, and L. Page. The anatomy of a large-scale hypertextual Web search engine. International Conference on World Wide Web, Elsevier Science Publishers B.V., 1998: 107-117.
SLIDE 13 TextRank - Vertices and Edges
- Vertices: each unique word in the candidates
- Edges: a co-occurrence relation
- Weighted by: the Word2Vec similarity between two words and the number of their co-occurrences
SLIDE 14
TextRank - Calculate Iteratively
For N candidates containing k distinct words in total, we construct a k × k matrix M with Mij = cnt ∗ sim(Di, Dj), where cnt is the number of co-occurrences of Di and Dj within a sentence. We then compute iteratively

R(t + 1) = (1 − d)/k · 1 + d · M · R(t)

where 1 is the k-dimensional all-ones vector, stopping when |R(t + 1) − R(t)| < ϵ, with ϵ = 10⁻⁷.
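The iteration takes only a few lines of Python (a sketch, assuming M has been built and scaled so the damped iteration converges, e.g. by normalizing its rows):

```python
def textrank(M, d=0.85, eps=1e-7):
    # M: k x k weight matrix, M[i][j] = cnt * sim(D_i, D_j).
    # Returns the converged score vector R.
    k = len(M)
    R = [1.0 / k] * k
    while True:
        R_new = [(1 - d) / k + d * sum(M[i][j] * R[j] for j in range(k))
                 for i in range(k)]
        if sum(abs(a - b) for a, b in zip(R_new, R)) < eps:
            return R_new
        R = R_new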
SLIDE 15
TextRank - Ranking
Since we have the score R(Di) for each word Di in the candidates, the score of each comment candidate c is calculated as:

RankTextRank(c) = ( Σ_{Di∈c} R(Di) ) / len(c)  (12)

Here, len(c) refers to the number of words in comment c.
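Equation (12) in code, as a hypothetical helper that averages the converged word scores over a comment:

```python
def rank_comment(comment_tokens, R):
    # Eq. (12): average TextRank score of the words in the comment.
    # R maps each word to its converged score; unseen words contribute 0.
    if not comment_tokens:
        return 0.0
    return sum(R.get(w, 0.0) for w in comment_tokens) / len(comment_tokens)
```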
SLIDE 16
Pattern-IDF
For a word Di (minor word) in the corresponding comment, given a word Dj (major word) in the post, we define (Dj, Di) as a pattern. Inspired by IDF, we calculate the Pattern-IDF as:

PI(Di|Dj) = 1 / log2( countc(Di) ∗ countp(Dj) / countpair(Di, Dj) )  (13)

Here countc refers to the number of occurrences in comments, countp in posts, and countpair in post-comment pairs. Patterns with countpair(Di, Dj) < 3 are eliminated.
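Equation (13) with the count cutoff, as a sketch (the count dictionaries are assumed inputs; the degenerate case X = 1 is skipped since 1/log2(1) is undefined):

```python
import math

def pattern_idf(count_c, count_p, count_pair, min_pair=3):
    # PI(Di|Dj) per eq. (13); patterns seen fewer than min_pair times are dropped.
    # count_pair is keyed by (major_word, minor_word) tuples.
    pi = {}
    for (dj, di), npair in count_pair.items():
        if npair < min_pair:
            continue
        x = count_c[di] * count_p[dj] / npair
        if x <= 1.0:
            continue  # PI is undefined at X = 1
        pi[(dj, di)] = 1.0 / math.log2(x)
    return pi
```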
SLIDE 17 Pattern-IDF
Let X = countc(Di) ∗ countp(Dj) / countpair(Di, Dj); then X ∈ [1, ∞).

Figure 8: log2(X)
Figure 9: 1/log2(X)
SLIDE 18 PI - Example
Table 1: Examples of Pattern-IDF

MajorWord                 MinorWord                  PI
中国移动 (China Mobile)    接通 (connect)              0.071725
中国移动                   cmcc                       0.067261
中国移动                   资费 (charges)              0.062408
中国移动                   营业厅 (business hall)      0.059949
中国移动                   漫游 (roaming)              0.059234
...                       ...                        ...
中国移动                   我 (me)                    0.028889
中国移动                   是 (be)                    0.027642
中国移动                   的 (of)                    0.026346

Table 2: The entropy of Pattern-IDF for each major word

MajorWord                    H
眼病 (eye disease)           0.889971
丰收年 (harvest year)         0.988191
血浆 (plasma)                1.033668
脊椎动物 (vertebrate)         1.083438
水粉画 (gouache painting)     1.180993
...                          ...
现在 (now)                   9.767768
什么 (what)                  10.219045
是 (be)                      10.934950

PInorm(Di|Dj) = PI(Di|Dj) / Σ_{i=1}^{n} PI(Di|Dj)  (14)

H(Dj) = − Σ_{i=1}^{n} PInorm(Di|Dj) log2 PInorm(Di|Dj)  (15)
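Equations (14)-(15) as a sketch: normalize one major word's PI row, then take its Shannon entropy. A low entropy means the major word has a focused response vocabulary, as Table 2 suggests:

```python
import math

def pattern_entropy(pi_row):
    # pi_row: PI(Di|Dj) for all minor words Di of one major word Dj.
    # Eq. (14) normalizes the row; eq. (15) is the entropy of the result.
    total = sum(pi_row.values())
    probs = [v / total for v in pi_row.values()]
    return -sum(p * math.log2(p) for p in probs if p > 0)
```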
SLIDE 19
PI - Ranking
For each comment c in the candidates, given a query (new post) q, we calculate its PI score as follows:

ScorePI(q, c) = ( Σ_{Dj∈q} Σ_{Di∈c} PI(Di|Dj) ) / ( len(c) ∗ len(q) )  (16)

Then we define the rank score as follows:

RankPI = (1 + ScorePI(q, c) / max ScorePI(q, c)) ∗ SimW2V(q, c) ∗ SimLSA(q, c)  (17)
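Equations (16)-(17) in sketch form (`max_score` is the maximum ScorePI over all candidates and is assumed precomputed; the similarity values are likewise assumed inputs):

```python
def score_pi(query, comment, pi):
    # Eq. (16): average PI over all (major, minor) pairs from query x comment.
    total = sum(pi.get((dj, di), 0.0) for dj in query for di in comment)
    return total / (len(comment) * len(query))

def rank_pi(query, comment, pi, max_score, sim_w2v, sim_lsa):
    # Eq. (17): boost the embedding-based similarities by the normalized PI score.
    return (1.0 + score_pi(query, comment, pi) / max_score) * sim_w2v * sim_lsa
```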
SLIDE 20 TextRank + Pattern-IDF
In this method, we add each comment sentence in the candidates as a vertex in the graph and use sentence-level Word2Vec similarity as the edge weights. For N candidates, we construct an N × N matrix M with Mij = SimW2V(candidatei, candidatej). At time t = 0, we initialize an N-dimensional vector P, where N is the number of comment candidates; each entry of P is defined as the Pattern-IDF score between the query (new post) q and the corresponding comment ci in the candidates:

Pi = ScorePI(q, ci)  (18)
SLIDE 21
TextRank + Pattern-IDF
Then we compute iteratively

R(t + 1) = (1 − d)/N · 1 + d · M · R(t)

where 1 is the N-dimensional all-ones vector, stopping when |R(t + 1) − R(t)| < ϵ, with ϵ = 10⁻⁷. Finally, we obtain the score Pi for each comment in the candidates.
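A sketch of this sentence-level iteration, with the Pattern-IDF vector P as the starting point R(0) (convergence is assumed for a suitably scaled similarity matrix M, as before):

```python
def textrank_with_prior(M, P, d=0.85, eps=1e-7):
    # M[i][j]: Word2Vec similarity between candidate sentences i and j.
    # P: initial vector of Pattern-IDF scores (eq. 18), used as R(0).
    N = len(M)
    R = list(P)
    while True:
        R_new = [(1 - d) / N + d * sum(M[i][j] * R[j] for j in range(N))
                 for i in range(N)]
        if sum(abs(a - b) for a, b in zip(R_new, R)) < eps:
            return R_new
        R = R_new
```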
SLIDE 22 Experiment
- LDA + Word2Vec + LSTM-Sen2Vec
- LSA + Word2Vec + LSTM-Sen2Vec
- R4 + TextRank (Words as vertices)
- R4 + Pattern-IDF
- R4 + Pattern-IDF + TextRank (Sentences as vertices)
SLIDE 23 Official Result
Table 3: The official results of five runs for the Nders team

Run         Mean nG@1  Mean P+  Mean nERR@10
Nders-C-R1  0.4593     0.5394   0.5805
Nders-C-R2  0.4743     0.5497   0.5882
Nders-C-R3  0.4647     0.5317   0.5768
Nders-C-R4  0.4780     0.5338   0.5809
Nders-C-R5  0.4550     0.5495   0.5868
R2 vs. R4   ↓0.77%     ↑2.98%   ↑1.26%
SLIDE 24
Questions?