MPII at the NTCIR-14 WWW-2 Task. Andrew Yates, Max Planck Institute for Informatics.


SLIDE 1

MPII at the NTCIR-14 WWW-2 Task

Andrew Yates, Max Planck Institute for Informatics

SLIDE 2

Motivation

Opportunity to evaluate NIR model (participating in pool)

  • Previously evaluated on TREC Web Track 09-14 (WSDM '18, EMNLP '17)
  • With long queries (TREC description)
  • Re-ranking results from unsupervised model

Will the strong result from WSDM '18 carry over as a significant improvement here? How does it compare to BM25 with short queries (& the pool)?

SLIDE 3

Outline

  • Model summary (PACRR & Co-PACRR)
  • Parameters varied
  • Experimental setup
  • Results


SLIDE 4

Input Representation

Query-document similarity matrix
  • word2vec similarity
  • One matrix for each document

[Figure: similarity matrix between the query terms (bayern, beats, dortmund) and the document terms]
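
As a sketch of this input representation, the following builds one query-document similarity matrix with NumPy; the 3-dimensional vectors stand in for real word2vec embeddings (illustrative values, not from the talk):

```python
# Toy sketch of PACRR's input: a |query| x |doc| similarity matrix.
# The 3-d vectors below stand in for real word2vec embeddings.
import numpy as np

emb = {
    "bayern":   np.array([0.9, 0.1, 0.0]),
    "beats":    np.array([0.1, 0.9, 0.1]),
    "dortmund": np.array([0.0, 0.2, 0.9]),
    "munich":   np.array([0.8, 0.2, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = ["bayern", "beats", "dortmund"]
doc = ["bayern", "beats", "dortmund", "munich"]

# One similarity matrix per document: rows = query terms, cols = doc terms.
sim = np.array([[cosine(emb[q], emb[d]) for d in doc] for q in query])
print(sim.shape)  # (3, 4)
```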

SLIDE 5

Using Positional Information

Match patterns (Convolutional kernels)

[Figure: a document window around "bayern beats dortmund" aligned with the query "bayern beats dortmund"]

PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17.

SLIDE 6

Using Positional Information

Match pattern types: ordered match, partial match, reversed ordered match

[Figure: document windows over "bayern beats dortmund" illustrating an ordered match, a partial match, and a reversed ordered match]

SLIDE 7

Using Positional Information

Matches are local: consider NxN regions of the similarity matrix

[Figure: an NxN region covering the document window "beats dortmund bayern"]

SLIDE 8

Using Positional Information

Patterns are exclusive: each region is best matched by a single pattern

[Figure: the best-matching pattern (✓) chosen for the region "beats dortmund bayern"]

SLIDE 9

PACRR: Position-Aware Convolutional Recurrent Relevance Matching

(1) CNN kernels capture patterns

[Figure: a convolutional kernel w applied over the similarity matrix]

SLIDE 10

PACRR: Position-Aware Convolutional Recurrent Relevance Matching

(1) CNN kernels capture patterns

Signal for this region: w1,1·x1,6 + w1,2·x1,7 + w1,3·x1,8 + … + w2,1·x2,6 + … + w3,3·x3,8 (w: the 3x3 kernel; x: the similarity matrix, rows 1-3, columns 6-8)
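
The region-signal formula above can be sketched directly: a minimal NumPy example computing one kernel's signal for a single 3x3 region, with random toy values standing in for the real kernel weights and similarities:

```python
# One convolutional match signal for a single 3x3 region:
# sum over i,j of w[i,j] * x[i,j] within the region (toy random values).
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((3, 8))   # similarity matrix: 3 query terms x 8 doc terms
w = rng.random((3, 3))   # one 3x3 kernel

region = x[0:3, 5:8]     # rows 1-3, columns 6-8 in the slide's 1-indexing
signal = float(np.sum(w * region))

# The same value via the slide's explicit term-by-term expansion.
check = sum(w[i, j] * region[i, j] for i in range(3) for j in range(3))
```

In practice this is a standard 2-D convolution; the elementwise product plus sum shown here is just one output position of it.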

SLIDE 11

PACRR: Position-Aware Convolutional Recurrent Relevance Matching

(1) CNN kernels capture patterns
(2) Max pool kernels

[Figure: max pooling keeps each region's best-matching pattern, e.g. signals 1.0, 0.3, 0 reduce to 1.0]
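
A minimal sketch of this max-pooling step, with made-up signal values for three kernels over four regions:

```python
# Max-pooling across kernels: each document region keeps only the
# signal of its best-matching pattern (toy values).
import numpy as np

signals = np.array([
    [1.0, 0.2, 0.0, 0.5],   # kernel 1's signal per region
    [0.3, 0.8, 0.1, 0.4],   # kernel 2
    [0.0, 0.1, 0.6, 0.9],   # kernel 3
])
best = signals.max(axis=0)  # one signal per region
print(best.tolist())  # [1.0, 0.8, 0.6, 0.9]
```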

SLIDE 12

PACRR: Position-Aware Convolutional Recurrent Relevance Matching

(1) CNN kernels capture patterns
(2) Max pool kernels
(3) K-max pool query signals from doc regions (K=2)

SLIDE 13

PACRR: Position-Aware Convolutional Recurrent Relevance Matching

(1) CNN kernels capture patterns
(2) Max pool kernels
(3) K-max pool query signals from doc regions

For each query term, we now have:

  • K-max match signals for unigrams
  • K-max match signals for bigrams
  • K-max match signals for n-grams
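
The K-max pooling step above can be sketched with NumPy (toy signal values, K=2 as on the earlier slide, one n-gram size shown):

```python
# K-max pooling (K=2): for each query term, keep the K strongest match
# signals across document positions (toy values, one n-gram size).
import numpy as np

K = 2
signals = np.array([
    [0.1, 0.9, 0.3, 0.0, 0.7, 0.2, 0.4, 0.1],
    [0.5, 0.1, 0.8, 0.2, 0.1, 0.6, 0.0, 0.3],
    [0.2, 0.2, 0.1, 0.9, 0.4, 0.1, 0.5, 0.0],
])
# Sort each row descending, keep the top K per query term.
kmax = -np.sort(-signals, axis=1)[:, :K]
print(kmax.tolist())  # [[0.9, 0.7], [0.8, 0.6], [0.9, 0.5]]
```

Repeating this per n-gram size yields the per-term feature vectors listed above.
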
SLIDE 14

PACRR: Position-Aware Convolutional Recurrent Relevance Matching

(1) CNN kernels capture patterns
(2) Max pool kernels
(3) K-max pool query signals from doc regions
(4) Combination function (FC layers) produces a score for each query term
(5) Document score is the summation over query terms

[Steps 4 & 5 differ from the original papers]
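
A rough sketch of steps (4)-(5), assuming a small two-layer feed-forward scorer; the layer sizes, ReLU nonlinearity, and random weights are illustrative stand-ins, not the paper's actual configuration:

```python
# Steps (4)-(5) sketched: a small feed-forward combination function scores
# each query term from its pooled signals; the document score is the sum.
# Layer sizes and the random weights are illustrative, not the paper's.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(1)
n_terms, n_signals, hidden = 3, 6, 8

pooled = rng.random((n_terms, n_signals))    # pooled signals per query term
W1 = rng.standard_normal((n_signals, hidden))
W2 = rng.standard_normal((hidden, 1))

term_scores = relu(pooled @ W1) @ W2         # (4) one score per query term
doc_score = float(term_scores.sum())         # (5) sum over query terms
```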

SLIDE 15

PACRR: Position-Aware Convolutional Recurrent Relevance Matching

(1) CNN kernels capture patterns
(2) Max pool kernels
(3) K-max pool query signals from doc regions
(4) Combination function (FC layers) produces a score for each query term
(5) Document score is the summation over query terms

[Steps 4 & 5 differ from the original papers]

Related to MatchPyramid, but e.g., different pooling strategies

A Study of MatchPyramid Models on Ad-hoc Retrieval. L. Pang, Y. Lan, J. Guo, J. Xu, Z. Cheng. Neu-IR '16 SIGIR Workshop.
SLIDE 16

Variant: Cascade Pooling

  • Inspired by cascade model

An experimental comparison of click position-bias models. Craswell et al. WSDM '08.

  • Prefer document with earlier relevant information
  • One of several improvements in Co-PACRR (WSDM '18)

[Figure: Document A, with earlier relevant information, is preferred (>) over Document B]

SLIDE 17

Variant: Cascade Pooling

For each query term, PACRR retains top k match signals

  • Cascade Pooling: repeat for different document cutoffs
  • Top k signals from the first 50% of the document
  • Top k signals from the entire document

The query-term FC layers receive match signals from each cutoff

Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval. K Hui, A Yates, K Berberich, G de Melo. In: WSDM '18.
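
Cascade pooling can be sketched as K-max pooling repeated at several cutoffs; the cutoff fractions and signal values below are illustrative:

```python
# Cascade pooling sketch: K-max pooling repeated at several document
# cutoffs, so documents with early matches yield extra strong signals.
# Cutoff fractions and signal values are illustrative.
import numpy as np

K = 2
signals = np.array([0.1, 0.9, 0.3, 0.0, 0.7, 0.2, 0.4, 0.8])  # one query term

def kmax(v, k):
    return np.sort(v)[::-1][:k]

cutoffs = [0.5, 1.0]  # first 50% of the document, then the whole document
pooled = np.concatenate([kmax(signals[: int(len(signals) * c)], K)
                         for c in cutoffs])
print(pooled.tolist())  # [0.9, 0.3, 0.9, 0.8]
```

The concatenated vector is what the per-term FC layers would consume, so a document whose strong matches appear early contributes strong signals at every cutoff.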

SLIDE 18

Parameters Varied

  1. Cascade pooling used? (3 runs with, 2 without)
  2. Size of k-max pooling (top 5 vs. top 15)
  3. Size of fully connected layers that score each query term (2x8 or 1)
SLIDE 19

Experimental Setup

  • Train on TREC WT09-13 judgments
  • WT14 and WWW-1 used for validation
  • Using the best weights on WWW-1 (after sanity checking on WT14), re-rank the BM25 run provided by the organizers
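
The re-ranking step might be sketched as follows; `model_score` and the doc IDs are hypothetical stand-ins for the trained (Co-)PACRR scorer and the organizers' run, not the actual system:

```python
# Re-ranking sketch: rescore the BM25 run's documents with the trained
# model and sort by the new score. `model_score` and the doc IDs are
# hypothetical stand-ins, not the actual system.
def model_score(query, doc_id):
    scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5}  # pretend neural scores
    return scores[doc_id]

bm25_run = ["d1", "d2", "d3"]  # one query's docs in BM25 order
reranked = sorted(bm25_run, key=lambda d: model_score("q", d), reverse=True)
print(reranked)  # ['d2', 'd3', 'd1']
```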

SLIDE 20

Results & Conclusion

  • No significant improvement between any pair of runs
  • No significant improvement over BM25
  • Given past results, minD >= 0.1 seems large
SLIDE 21

Results & Conclusion

  • No significant improvement between any pair of runs
  • No significant improvement over BM25
  • Given past results, minD >= 0.1 seems large

Recent work building on PACRR (and other NIR models):

CEDR: Contextualized Embeddings for Document Ranking. S. MacAvaney, A. Yates, A. Cohan, N. Goharian. SIGIR '19.

Thanks!