

SLIDE 1

Outline

Morning program
  Preliminaries
  Text matching I
  Text matching II
Afternoon program
  Learning to rank
  Modeling user behavior
  Generating responses
Wrap up

SLIDE 2

Text matching I

Supervised text matching

Traditional IR training data consists of search queries and a document collection. Ground truth can be based on explicit human judgments or on implicit user behaviour data (e.g., clickthrough rate)

SLIDE 3

Text matching I

Lexical vs. Semantic matching

Query: united states president

Traditional IR models estimate relevance based on lexical matches of query terms in the document. Representation-learning-based models gather evidence of relevance from all document terms based on semantic matches with the query. Both lexical and semantic matching are important, and both can be modelled with neural networks

SLIDE 4

Outline

Morning program
  Preliminaries
  Text matching I
    Semantic matching
    Lexical matching
    Lexical and Semantic Duet
  Text matching II
Afternoon program
  Learning to rank
  Modeling user behavior
  Generating responses
Wrap up

SLIDE 5

Text matching I

Semantic matching

Pros

◮ Ability to match synonyms and related words
◮ Robustness to spelling variations (≈ 10% of search queries contain spelling errors)
◮ Helps in cases where lexical matching fails

Cons

◮ More computationally expensive than lexical matching

SLIDE 6

Text matching I

Deep Structured Semantic Model (DSSM) [Huang et al., 2013]

Figure 1: Illustration of the DSSM. It uses a DNN to map high-dimensional sparse text features into low-dimensional dense features in a semantic space. The final layer's neural activities in this DNN form the feature in the semantic space.


SLIDE 7

Text matching I

DSSM - Siamese Network

  • 1. Represent the query and the document as vectors q and d in a latent vector space
  • 2. Estimate the matching degree between q and d using cosine similarity


Deep Structured Semantic Model (DSSM) [Huang et al., 2013]. We learn to represent queries and documents in the latent vector space by forcing the vector representations (i) of relevant query-document pairs (q, d+) to be close in the latent vector space (i.e., cos(q, d+) → max); and (ii) of irrelevant query-document pairs (q, d−) to be far apart in the latent vector space (i.e., cos(q, d−) → min)
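The matching step can be sketched with plain NumPy. The toy vectors below are made up for illustration; they are not learned DSSM embeddings:

```python
import numpy as np

def cosine(q, d):
    """Cosine similarity between a query vector and a document vector."""
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

# Toy latent vectors: the relevant document points in roughly the same
# direction as the query, the irrelevant one does not.
q     = np.array([1.0, 2.0, 0.5])
d_pos = np.array([0.9, 2.1, 0.4])   # relevant:   cos(q, d+) should be high
d_neg = np.array([-1.0, 0.1, 2.0])  # irrelevant: cos(q, d-) should be low

assert cosine(q, d_pos) > cosine(q, d_neg)
```

Training pushes cos(q, d+) toward its maximum and cos(q, d−) toward its minimum, which is exactly the geometry this comparison relies on.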

SLIDE 8

Text matching I

DSSM - Word hashing

How to represent text (e.g., Shinjuku Gyoen)?

  • 1. Bag of Words (BoW) [large vocabulary (500000 words)]

{ 0, . . . , 0 (apple), 0, . . . , 0, 1 (gyoen), 0, . . . , 0, 1 (shinjuku), 0, . . . , 0 }

  • 2. Bag of Letter Trigrams (BoLT) [small vocabulary (30621 letter 3-grams)]

{ 0, . . . , 0 (abc), 0, . . . , 1 ( gy), 0, . . . , 0, 1 ( sh), 0, . . . , 0, 1 (en ), 0, . . . , 0, 1 (gyo), 0, . . . , 0, 1 (hin), 0, . . . , 0, 1 (inj), 0, . . . , 0, 1 (juk), 0, . . . , 0, 1 (ku ), 0, . . . , 0, 1 (oen), 0, . . . , 0, 1 (shi), 0, . . . , 0, 1 (uku), 0, . . . , 0, 1 (yoe), 0 }
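The letter-trigram hashing above can be sketched in a few lines. The `#` boundary marker below is a stand-in for the word-start/word-end symbol (shown as a space in the slide's example):

```python
def letter_trigrams(word):
    """Letter 3-grams of a word, padded with '#' boundary markers."""
    padded = "#" + word + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

letter_trigrams("gyoen")
# ['#gy', 'gyo', 'yoe', 'oen', 'en#']
```

Any word, including ones never seen in training, maps onto this small fixed vocabulary of roughly 30K trigrams, which is what keeps the input layer compact.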

SLIDE 9

Text matching I

DSSM - Architecture

x  = BoW(text)
l1 = WordHashing(x)
l2 = tanh(W2 l1 + b2)
l3 = tanh(W3 l2 + b3)
l4 = tanh(W4 l3 + b4)
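A minimal NumPy sketch of this forward pass, using the 30,621-trigram input from the previous slide. The hidden-layer widths (300, 300, 128) and the random initialization are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dssm_tower(x, weights, biases):
    """Map a word-hashed input vector through stacked tanh layers,
    as in the l2..l4 equations above."""
    h = x
    for W, b in zip(weights, biases):
        h = np.tanh(W @ h + b)
    return h

n_trigrams = 30621                   # letter-trigram vocabulary size from the slide
sizes = [n_trigrams, 300, 300, 128]  # hidden/output widths are assumptions
weights = [rng.normal(0, 0.01, (m, n)) for n, m in zip(sizes, sizes[1:])]
biases  = [np.zeros(m) for m in sizes[1:]]

x = np.zeros(n_trigrams)
x[[5, 17, 123]] = 1.0                # a sparse bag-of-letter-trigrams vector
y = dssm_tower(x, weights, biases)   # 128-dim semantic feature vector
```

The same tower (with shared or separate weights per side) maps both query and document into the space where cosine similarity is computed.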


SLIDE 10

Text matching I

DSSM - Training objective

Likelihood

∏_{(q, d+) ∈ DATA} P(d+ | q) → max

where the posterior is a softmax over γ-scaled cosine similarities, with the full collection D approximated by the relevant documents plus a few sampled irrelevant ones:

P(d+ | q) = e^(γ cos(q, d+)) / Σ_{d ∈ D} e^(γ cos(q, d)) ≈ e^(γ cos(q, d+)) / Σ_{d ∈ D+ ∪ D−} e^(γ cos(q, d))
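The approximated softmax can be computed directly from cosine scores. γ is a smoothing hyperparameter; its value below is an assumption, not the paper's setting:

```python
import numpy as np

def p_relevant(cos_pos, cos_negs, gamma=10.0):
    """Softmax over gamma-scaled cosines: P(d+ | q) approximated with
    one positive and a few sampled negatives, as on the slide."""
    scores = np.array([cos_pos] + list(cos_negs)) * gamma
    scores -= scores.max()          # subtract max for numerical stability
    e = np.exp(scores)
    return float(e[0] / e.sum())

p = p_relevant(0.9, [0.2, 0.1, -0.3, 0.05])
# Maximizing log p over training pairs pulls cos(q, d+) up
# and pushes cos(q, d-) down.
```

With a clearly better positive, p is close to 1; with indistinguishable cosines it degrades to uniform, so the gradient signal comes exactly from the separation between d+ and the negatives.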
SLIDE 11

Text matching I

DSSM - Results

Model    NDCG@1   NDCG@3   NDCG@10
TF-IDF   0.319    0.382    0.462
BM25     0.308    0.373    0.455
WTM      0.332    0.400    0.478
LSA      0.298    0.372    0.455
PLSA     0.295    0.371    0.456
DAE      0.310    0.377    0.459
BLTM     0.337    0.403    0.480
DPM      0.329    0.401    0.479
DSSM     0.362    0.425    0.498
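For reference, NDCG@k (the metric in the table) can be computed as follows; this sketch uses the common gain 2^rel − 1 with log2 discounting:

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k from a ranked list of graded relevance labels:
    DCG of the ranking divided by DCG of the ideal ranking."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

ndcg_at_k([3, 2, 0, 1], k=3)  # < 1.0: the grade-1 doc is ranked below a grade-0 doc
```

A perfect ranking scores exactly 1.0, so the deltas in the table (e.g., DSSM's 0.362 vs. BM25's 0.308 at rank 1) are on an absolute 0-to-1 scale.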


SLIDE 12

Text matching I

CLSM

  • 1. Embeds N-grams similar to DSSM
  • 2. Aggregates phrase embeddings by max-pooling

Model   NDCG@1   NDCG@3   NDCG@10
BM25    0.305    0.328    0.388
DSSM    0.320    0.355    0.431
CLSM    0.342    0.374    0.447


A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval [Shen et al., 2014].
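The max-pooling aggregation in step 2 can be sketched as below. Concatenating the word vectors in each window stands in for CLSM's learned convolution, so this illustrates only the pooling, not the full model:

```python
import numpy as np

rng = np.random.default_rng(0)

def clsm_pool(word_vecs, window=3):
    """Build one vector per sliding n-gram window (here by concatenation,
    a stand-in for the learned convolution + tanh), then take the
    element-wise max over all windows."""
    grams = [np.concatenate(word_vecs[i:i + window])
             for i in range(len(word_vecs) - window + 1)]
    return np.max(np.stack(grams), axis=0)   # element-wise max-pooling

# Five 4-dimensional word vectors -> three windows of size 12.
sentence = [rng.normal(size=4) for _ in range(5)]
pooled = clsm_pool(sentence)
```

Max-pooling keeps, per dimension, the strongest activation across all phrase windows, so the sentence vector length is independent of sentence length.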

SLIDE 13

Text matching I

In industry

Baidu’s DNN model

◮ Around 30% of total 2013 and 2014 relevance improvement
◮ Uses 10B clicks for training (more than 100M parameters)

[Figure: two Siamese towers score (query, title) pairs. A lookup table (s × ||V||) embeds query and title terms; two hidden layers (h, h') map them to a scalar output. Trained with a pairwise ranking loss: score(query, clicked title) > score(query, not-clicked title)]

SLIDE 14

Text matching I

Semantic matching for long text

Semantic matching can also be applied to long-text retrieval, but it requires large-scale training data to learn meaningful representations of text. Mitra et al. [2017] train on large manually labelled data from Bing. Dehghani et al. [2017] train on pseudo-labels (e.g., BM25 scores)

SLIDE 15

Text matching I

Interaction matrix based approaches

An alternative to Siamese networks. Interaction matrix X, where x_{i,j} is obtained by comparing the i-th word in the source sentence with the j-th word in the target sentence. Comparisons can be either lexical or semantic. E.g., Hu et al. [2014], Mitra et al. [2017], Pang et al. [2016]

[Figure: a neural network consumes the query-document interaction matrix]
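A semantic interaction matrix can be built directly from word vectors; the vectors below are hypothetical stand-ins for learned embeddings. A lexical variant would instead put 1 where the words match exactly and 0 elsewhere:

```python
import numpy as np

def interaction_matrix(src_vecs, tgt_vecs):
    """X[i, j] = cosine similarity of the i-th source word
    and the j-th target word."""
    S = np.stack([v / np.linalg.norm(v) for v in src_vecs])
    T = np.stack([v / np.linalg.norm(v) for v in tgt_vecs])
    return S @ T.T      # rows: source words, columns: target words

src = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
tgt = [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 1.0])]
X = interaction_matrix(src, tgt)    # shape (2, 3)
```

A downstream network (convolutional or otherwise) then reads X to detect match patterns, rather than comparing two pooled sentence vectors as a Siamese model does.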

SLIDE 16

Outline

Morning program
  Preliminaries
  Text matching I
    Semantic matching
    Lexical matching
    Lexical and Semantic Duet
  Text matching II
Afternoon program
  Learning to rank
  Modeling user behavior
  Generating responses
Wrap up

SLIDE 17

Text matching I

Lexical matching

Query: “rosario trainer”. The rare term “rosario” may never have been seen during training and is unlikely to have a meaningful representation. But the pattern of lexical matches of rare terms in a document may be very informative for estimating relevance

SLIDE 18

Text matching I

Lexical matching

Guo et al. [2016] train a DNN model using features derived from frequency histograms of query-term matches in the document. Mitra et al. [2017] convolve over a binary interaction matrix to learn interesting patterns of lexical term matches
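The histogram features can be sketched as below, in the spirit of Guo et al. [2016]; the bin count, the similarity range, and the use of cosine similarity here are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def match_histogram(query_term_vec, doc_term_vecs, bins=5):
    """Histogram of similarities between one query term and every
    document term; the query term feeds one such histogram into the DNN."""
    q = query_term_vec / np.linalg.norm(query_term_vec)
    sims = [float(q @ (d / np.linalg.norm(d))) for d in doc_term_vecs]
    counts, _ = np.histogram(sims, bins=bins, range=(-1.0, 1.0))
    return counts

rng = np.random.default_rng(1)
doc = [rng.normal(size=8) for _ in range(20)]     # 20 hypothetical doc terms
h = match_histogram(rng.normal(size=8), doc)      # 5 bin counts, summing to 20
```

The rightmost bin counts (near-)exact matches, so a rare term like "rosario" still produces a strong, usable signal even when its embedding is poor.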

SLIDE 19

Outline

Morning program
  Preliminaries
  Text matching I
    Semantic matching
    Lexical matching
    Lexical and Semantic Duet
  Text matching II
Afternoon program
  Learning to rank
  Modeling user behavior
  Generating responses
Wrap up

SLIDE 20

Text matching I

Duet

Jointly train two sub-networks focused on lexical and semantic matching [Mitra et al., 2017, Nanni et al., 2017]

Training sample: q, d+, d1, d2, d3, d4 (one relevant document and four sampled negatives)

p(d+ | q) = e^(ndrm(q, d+)) / Σ_{d ∈ D−} e^(ndrm(q, d))    (1)

Implementation on GitHub: https://github.com/bmitra-msft/NDRM/blob/master/notebooks/Duet.ipynb
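A sketch of duet-style scoring and the softmax in equation (1). Summing the two sub-network outputs follows the duet idea of combining lexical and semantic evidence; the raw score values below are made up:

```python
import numpy as np

def duet_score(score_local, score_distributed):
    """Duet combines the lexical (local) and semantic (distributed)
    sub-network scores additively."""
    return score_local + score_distributed

def p_positive(scores):
    """Softmax probability of the positive document (index 0)
    against the sampled negatives."""
    e = np.exp(np.array(scores) - max(scores))  # shift for numerical stability
    return float(e[0] / e.sum())

# Index 0: the relevant document d+; the rest: four sampled negatives.
scores = [duet_score(2.0, 1.5)] + [duet_score(0.5, s)
                                   for s in (0.2, -0.1, 0.3, 0.0)]
p = p_positive(scores)
```

Because the gradient flows through the sum, training distributes credit between the two sub-networks: queries decided by rare-term matches train the local model, the rest train the distributed one.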

SLIDE 21

Text matching I

Distributed model

SLIDE 22

Text matching I

SLIDE 23

Text matching I

Duet

The biggest impact of training-data size is on the performance of the representation-learning sub-model. Important: if you want to learn effective representations for semantic matching, you need large-scale training data!

SLIDE 24

Text matching I

Duet

SLIDE 25

Text matching I

Duet

SLIDE 26

Text matching I

Duet

If we classify models by query-level performance, there is a clear clustering of lexical and semantic matching models