MPII at the NTCIR-14 WWW-2 Task
Andrew Yates
Max Planck Institute for Informatics

Motivation
Opportunity to evaluate NIR model (participating in pool)
• Previously evaluated on TREC Web Track 09-14 (WSDM '18, EMNLP '17)
• With long queries (TREC description)
• Re-ranking results from unsupervised model
Significant improvement with a strong signal from WSDM '18?
How does it compare to BM25 with short queries (& pool)?

Outline
• Model summary (PACRR & Co-PACRR)
• Parameters varied
• Experimental setup
• Results

Input Representation
[Figure: matrix with axes labeled Document and Query; terms shown: bayern, beats, dortmund]
Query-document similarity matrix
• word2vec similarity
• One matrix for each document
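A minimal sketch of how such a query-document similarity matrix could be built. The 2-d vectors in `toy_embeddings` are hypothetical stand-ins for word2vec embeddings, and the helper names `cosine` and `similarity_matrix` are illustrative, not taken from the PACRR code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_matrix(query_terms, doc_terms, embeddings):
    """Rows follow the query terms, columns follow the document terms."""
    return [[cosine(embeddings[q], embeddings[d]) for d in doc_terms]
            for q in query_terms]

# Hypothetical 2-d vectors standing in for word2vec embeddings.
toy_embeddings = {
    "bayern": [1.0, 0.0],
    "beats": [0.0, 1.0],
    "dortmund": [0.8, 0.6],
    "wins": [0.0, 2.0],
}
query = ["bayern", "beats", "dortmund"]
document = ["dortmund", "wins", "bayern"]
sim = similarity_matrix(query, document, toy_embeddings)
```

Each document gets its own matrix, so a ranking run would call `similarity_matrix` once per query-document pair.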

Using Positional Information
[Figure: three panels with axes labeled Document window and Query; terms shown: bayern, beats, dortmund]
Match patterns (convolutional kernels)
PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17.

Using Positional Information
[Figure: three panels with axes labeled Document window and Query; terms shown: bayern, beats, dortmund]
Panel labels: Partial match, Ordered match, Reversed ordered match
PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17.

Using Positional Information
[Figure: two panels; terms shown: bayern, beats, dortmund]
Matches are local: consider N x N regions of the matrix
PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17.

Using Positional Information
[Figure: candidate patterns for one region; terms shown: bayern, beats, dortmund; one pattern marked ✓]
Patterns are exclusive: each region is best matched by a single pattern
PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17.

PACRR: Position-Aware Convolutional Recurrent Relevance Matching
w: kernel
(1) CNN kernels capture patterns
PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17.

PACRR: Position-Aware Convolutional Recurrent Relevance Matching
w: kernel
[Figure: 3 x 3 region of the matrix covering rows 1, 2, 3 and columns 6, 7, 8]
(1) CNN kernels capture patterns
Signal for this region: $w_{1,1} x_{1,6} + w_{1,2} x_{1,7} + w_{1,3} x_{1,8} + \dots + w_{2,1} x_{2,6} + \dots + w_{3,3} x_{3,8}$
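The region signal above is an element-wise product of the kernel and one region of the similarity matrix, summed. The sketch below illustrates that computation in plain Python with 0-based indices; `ordered_kernel` and the toy matrix are hypothetical values, and the sliding window uses no padding, which is a simplifying assumption rather than the model's actual configuration.

```python
def region_signal(kernel, matrix, top, left):
    """Sum of kernel[i][j] * matrix[top + i][left + j] over the kernel."""
    return sum(
        kernel[i][j] * matrix[top + i][left + j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )

def convolve(kernel, matrix):
    """Slide the kernel over every fully covered region (no padding)."""
    n_rows = len(matrix) - len(kernel) + 1
    n_cols = len(matrix[0]) - len(kernel[0]) + 1
    return [[region_signal(kernel, matrix, r, c) for c in range(n_cols)]
            for r in range(n_rows)]

# Hypothetical 3x3 kernel that rewards an ordered (diagonal) match.
ordered_kernel = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Toy 3x5 similarity matrix: query terms as rows, document terms as columns.
toy_matrix = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]
signals = convolve(ordered_kernel, toy_matrix)
```

Here the diagonal run of ones starting at column 1 lines up with the kernel, so only that region yields a non-zero signal.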

PACRR: Position-Aware Convolutional Recurrent Relevance Matching
(1) CNN kernels capture patterns
(2) Max pool kernels
[Figure: kernel signals for one region: 1.0, 0, 0.3; best-matching pattern marked ✓]
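Max pooling across kernels keeps only the strongest pattern per region. A minimal sketch, assuming each kernel has already produced one signal per document region; the numbers are hypothetical, with the first region mirroring the 1.0 / 0 / 0.3 example.

```python
def max_pool_kernels(signals_per_kernel):
    """Keep, for each region, the strongest signal across all kernels."""
    return [max(region) for region in zip(*signals_per_kernel)]

# Hypothetical signals from three kernels over three document regions.
signals_per_kernel = [
    [1.0, 0.2, 0.0],  # kernel A
    [0.0, 0.5, 0.1],  # kernel B
    [0.3, 0.4, 0.6],  # kernel C
]
pooled = max_pool_kernels(signals_per_kernel)
```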

PACRR: Position-Aware Convolutional Recurrent Relevance Matching
(1) CNN kernels capture patterns
(2) Max pool kernels
(3) K-max pool query signals from doc regions (K=2 in the figure)
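K-max pooling can be sketched with the standard library's `heapq.nlargest`. The input list is a hypothetical set of per-region signals for one query term; handling of documents with fewer than k regions is left out of this sketch.

```python
import heapq

def k_max_pool(signals, k):
    """Return the k largest signals in descending order."""
    return heapq.nlargest(k, signals)

# Hypothetical per-region signals for one query term.
region_signals = [0.1, 0.9, 0.3, 0.7, 0.2]
top2 = k_max_pool(region_signals, 2)
```

With K=2, as in the slide's figure, only the two strongest regions survive, and their positions in the document are discarded.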

PACRR: Position-Aware Convolutional Recurrent Relevance Matching
(1) CNN kernels capture patterns
(2) Max pool kernels
(3) K-max pool query signals from doc regions
For each query term, we now have:
• K-max match signals for unigrams
• K-max match signals for bigrams
• …
• K-max match signals for n-grams

PACRR: Position-Aware Convolutional Recurrent Relevance Matching
(1) CNN kernels capture patterns
(2) Max pool kernels
(3) K-max pool query signals from doc regions
(4) Combination function (FC layers) produce a score for each query term
(5) Document score is the summation
[Steps 4 & 5 differ from original papers]
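Steps 4 and 5 can be illustrated as follows. This sketch assumes the simplest possible combination function, a single linear layer without activation shared by all query terms; the weights and features are made up, and the actual layer sizes and training are not shown.

```python
def score_query_term(features, weights, bias=0.0):
    """One linear layer with no activation: weighted sum of match signals."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def score_document(features_per_term, weights, bias=0.0):
    """Document score is the sum of the per-query-term scores."""
    return sum(score_query_term(f, weights, bias) for f in features_per_term)

# Hypothetical k=2 features for three query terms, with made-up weights.
features_per_term = [[1.0, 0.5], [0.5, 0.25], [0.0, 0.0]]
weights = [2.0, 4.0]
doc_score = score_document(features_per_term, weights)
```

In practice the feature vector for a term would concatenate the k-max signals from every n-gram size, so `weights` would be correspondingly longer.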

PACRR: Position-Aware Convolutional Recurrent Relevance Matching
Related to MatchPyramid, but e.g., different pooling strategies
A Study of MatchPyramid Models on Ad-hoc Retrieval. L. Pang, Y. Lan, J. Guo, J. Xu, Z. Cheng. Neu-IR '16 SIGIR Workshop.
(1) CNN kernels capture patterns
(2) Max pool kernels
(3) K-max pool query signals from doc regions
(4) Combination function (FC layers) produce a score for each query term
(5) Document score is the summation
[Steps 4 & 5 differ from original papers]

Variant: Cascade Pooling
• Inspired by cascade model (An experimental comparison of click position-bias models. Craswell et al. WSDM '08.)
• Prefer document with earlier relevant information
• One of several improvements in Co-PACRR (WSDM '18)
[Figure: Document A > Document B]

Variant: Cascade Pooling
For each query term, PACRR retains top k match signals
• Cascade Pooling: repeat for different document cutoffs
• Top k signals from the first 50% of the document
• Top k signals from the entire document
Query term FC receives match signals from different cutoffs
Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval. K Hui, A Yates, K Berberich, G de Melo. In: WSDM '18.
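A minimal sketch of cascade pooling for a single query term, using the two cutoffs named on the slide (first 50% and the entire document). The function name `cascade_k_max`, the zero padding, and the rounding of the cutoff position are assumptions made for illustration.

```python
import heapq
import math

def cascade_k_max(signals, k, cutoffs=(0.5, 1.0)):
    """Concatenate the top-k signals found within each document prefix."""
    features = []
    for cutoff in cutoffs:
        prefix = signals[:math.ceil(len(signals) * cutoff)]
        top = heapq.nlargest(k, prefix)
        # Pad with zeros so every cutoff contributes exactly k values.
        features.extend(top + [0.0] * (k - len(top)))
    return features

# Hypothetical match signals for one query term, in document order.
term_signals = [0.2, 0.9, 0.1, 0.4, 0.8, 0.95]
cascade_features = cascade_k_max(term_signals, k=2)
```

A document whose strong matches fall in the first half produces high values at both cutoffs, which is how the combination layer can learn to prefer earlier relevant information.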

Parameters Varied
1. Cascade pooling used? (3 with, 2 without)
2. Size of k-max pooling (top 5 vs. 15)
3. Size of fully connected layers that score query term (2x8 or 1)

Experimental Setup
• Train on TREC WT09-13 judgments
• WT14 and WWW-1 used for validation
• Using best weights on WWW-1 (after sanity checking on WT14), re-rank BM25 run provided by organizers
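The re-ranking step can be sketched as sorting each query's baseline candidates by model score. The run layout (a dict from query id to document ids) and the toy scores are hypothetical; reading the organizers' run file and computing real model scores are not shown.

```python
def rerank(run, score_fn):
    """Reorder each query's candidate list by descending model score."""
    return {
        qid: sorted(doc_ids, key=lambda d: score_fn(qid, d), reverse=True)
        for qid, doc_ids in run.items()
    }

# Hypothetical baseline run and model scores, for illustration only.
baseline_run = {"q1": ["d1", "d2", "d3"]}
toy_scores = {("q1", "d1"): 0.2, ("q1", "d2"): 0.9, ("q1", "d3"): 0.5}
reranked = rerank(baseline_run, lambda q, d: toy_scores[(q, d)])
```

Because only the baseline's candidates are reordered, the re-ranked run can never retrieve a document that the baseline missed.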

Results & Conclusion
• No significant improvement between any pair of runs
• No significant improvement over BM25
• Given past results, minD >= 0.1 seems large
Recent work building on PACRR (and other NIR models):
CEDR: Contextualized Embeddings for Document Ranking. S. MacAvaney, A. Yates, A. Cohan, N. Goharian. SIGIR '19.
Thanks!
