Query Operations Query Operations Berlin Chen 2004 Reference: 1. - PowerPoint PPT Presentation

Query Operations Query Operations Berlin Chen 2004 Reference: 1. Modern Information Retrieval . chapter 5

Introduction • Users have no detailed knowledge of – The collection makeup Difficult to formulate queries – The retrieval environment • Scenario of (Web) IR 1. An initial (naive) query posed to retrieve relevant docs 2. Docs retrieved are examined for relevance and a new improved query formulation is constructed and posed again Expand the original query with new terms ( query expansion ) and rewight the terms in the expanded query ( term weighting ) 2

Query Reformulation • Approaches through query expansion (QE) and terming weighting – Feedback information from the user • Relevance feedback – With vector, probabilistic models et al. – Information derived from the set of documents initially retrieved (called local set of documents) • Local analysis – Local clustering, local context analysis – Global information derived from document collection • Global analysis – Similar thesaurus or statistical thesaurus 3

Relevance Feedback • User (or Automatic ) Relevance Feedback – The most popular query reformation strategy • Process for user relevance feedback – A list of retrieved docs is presented – User or system exam them (e.g. the top 10 or 20 docs) and marked the relevant ones – Important terms are selected from the docs marked as relevant, and the importance of them are enhanced in the new query formulation relevant docs irrelevant docs query 4

User Relevance Feedback • Advantages – Shield users from details of query reformulation • User only have to provide a relevance judgment on docs – Break down the whole searching task into a sequence of small steps – Provide a controlled process designed to emphasize some terms (relevant ones) and de-emphasize others (non-relevant ones) For automatic relevance feedback , the whole process is done in an implicit manner 5

Query Expansion and Term Reweighting for the Vector Model • Assumptions – Relevant docs have term-weight vectors that resemble each other – Non-relevant docs have term-weight vectors which are dissimilar from the ones for the relevant docs – The reformulated query gets to closer to the term- weight vector space of relevant docs relevant docs irrelevant docs query term-weight vectors 6

Query Expansion and Term Reweighting for the Vector Model (cont.) • Terminology Answer Set Relevant Docs C r D n D r Non-relevant Docs Relevant Docs identified by the user identified by the user Doc Collection with size N 7

Query Expansion and Term Reweighting for the Vector Model (cont.) • Optimal Condition – The complete set of relevant docs C r to a given query q is known in advance r r 1 1 r ∑ ∑ = − q d d − opt i j | C | N | C | r r ∀ ∈ ∀ ∉ r d C r d C i r j r – Problem : the complete set of relevant docs C r are not known a priori • Solution : formulate an initial query and incrementally change the initial query vector based on the known relevant/non-relevant docs – User or automatic judgments 8

Query Expansion and Term Reweighting for the Vector Model (cont.) • In Practice 1. Standard_Rocchio Rocchio 1965 r r β γ ∑ ∑ r r = α ⋅ + ⋅ − ⋅ q q d d m i j modified query | D | | D | r r ∀ ∈ ∀ ∈ r d Dr n d Dn i j initial/original query 2. Ide_Regular r r r r ∑ ∑ = α ⋅ + β ⋅ − γ ⋅ q q d d m i j r r ∀ ∈ ∀ ∈ d Dr d Dn i j 3. Ide_Dec_Hi The highest ranked non-relevant doc ( ) r r ∑ r r = α ⋅ + β ⋅ − γ ⋅ q q d max d − m i non relevant j r ∀ ∈ d Dr i 9

Query Expansion and Term Reweighting for the Vector Model (cont.) • Some Observations – Similar results were achieved for the above three approach (Dec-Hi slightly better in the past ) – Usually, constant β is bigger than γ (why?) • In Practice (cont.) – More about the constants • Rocchio, 1971: α =1 • Ide, 1971: α = β = γ =1 • Positive feedback strategy : γ =0 10

Query Expansion and Term Reweighting for the Vector Model (cont.) • Advantages – Simple, good results • Modified term weights are computed directly from the retrieved docs • Disadvantages – No optimality criterion • Empirical and heuristic query 11

Term Reweighting for the Probabilistic Model Roberston & Sparck Jones 1976 • Similarity Measure ) ∑ ⎡ ⎤ − ( t P ( k | R ) 1 P ( k | R ) ≈ × × + sim d , q w w log i log i ⎢ ⎥ − j i , q i , j 1 P ( k | R ) P ( k | R ) ⎣ ⎦ = i 1 i i prob. of observing term k i in the set of relevant docs Binary weights (0 or 1) are used • Initial Search (with some assumptions) = P ( k | R ) 0 . 5 – :is constant for all indexing terms i n = P ( k | R ) i – :approx. by doc freq. of index terms i N ⎡ ⎤ n − 1 i ⎢ ⎥ ( ) t 0 . 5 ∑ N ≈ × × + sim d , q w w log log ⎢ ⎥ j i , q i , j n − 1 0 . 5 ⎢ ⎥ = i i 1 ⎣ ⎦ N − t N n ∑ = × × w w log i i , q i , j n 12 = i 1 i

Term Reweighting for the Probabilistic Model (cont.) • Relevance feedback (term reweighting alone) + D 0 . 5 Relevant docs D = r , i Approach 1 P ( k | R ) i + r , i = D 1 P ( k | R ) containing term k i r − + n D 0 . 5 i D = i r , i Relevant docs P ( k | R ) r i − + N D 1 r − n D n + D i i r , i = P ( k | R ) r , i N = P ( k | R ) i − N D + i D 1 r n r Approach 2 − + n D i i r , i N = P ( k | R ) i − + N D 1 r ⎡ ⎤ − D n D r , i − i r , i ⎢ ⎥ 1 − D N D ( ) t ⎢ ⎥ ∑ ≈ × × r + r sim d , q w w log log ⎢ ⎥ j i , q i , j − D n D ⎢ ⎥ = i 1 − r , i i r , i 1 ⎢ ⎥ − D N D ⎣ ⎦ r r ⎡ ⎤ − − + D N D n D t ∑ r , i r i r , i = × × ⎢ ⋅ ⎥ w w log i , q i , j − − ⎢ D D n D ⎥ ⎣ ⎦ = i 1 r r , i i r , i 13

Term Reweighting for the Probabilistic Model (cont.) • Advantages – Feedback process is directly related to the derivation of new weights for query terms – The term reweighting is optimal under the assumptions of term independence and binary doc indexing • Disadvantages – Document term weights are not taken into considered – Weights of terms in previous query formulations are disregarded – No query expansion is used • The same set of index terms in the original query is reweighted over and over again 14

A Variant of Probabilistic Term Reweighting Croft 1983 • Differences http://ciir.cs.umass.edu/ – Distinct initial search assumptions – Within-document frequency weight included • Initial search (assumptions) t ∑ ∝ sim ( d , q ) w w F j i , q i , j i , j , q = i 1 f = + + i , j = + f K ( 1 K ) F ( C idf ) f i , j max( f ) i , j , q i i , j i , j ~ Inversed document frequency ~ Term frequency (normalized with the maximum within-document frequency) • C and K are adjusted with respect to the doc collection 15

A Variant of Probabilistic Term Reweighting (cont.) • Advantages – The within-doc frequencies are considered – A normalized version of these frequencies is adopted – Constants C and K are introduced for greater flexibility • Disadvantages – More complex formulation – No query expansion (just reweighting of index terms) 17

Evaluation of Relevance Feedback Strategies • Recall-precision figures of user reference feedback is unrealistic – Since the user has seen the docs during reference feedback • A significant part of the improvement results from the higher ranks assigned to the set R of docs r r β γ r r ∑ ∑ = α ⋅ + ⋅ − ⋅ q q d d m i j | D | | D | r r ∀ ∈ r n d Dn ∀ ∈ j d Dr i original query modified query – The real gains in retrieval performance should be measured based on the docs not seen by the user yet 18

Evaluation of Relevance Feedback Strategies (cont.) • Recall-precision figures relative to the residual collection – Residual collection • The set of all docs minus the set of feedback docs provided by the user – Evaluate the retrieval performance of the modified query q m considering only the residual collection – The recall-precision figures for q m tend to be lower than the figures for the original query q • It’s OK ! If we just want to compare the performance of different relevance feedback strategies 19

Query Operations Query Operations Berlin Chen 2004 Reference: 1. - PowerPoint PPT Presentation

Query Operations Query Operations Berlin Chen 2004 Reference: 1. Modern Information Retrieval . chapter 5 Introduction Users have no detailed knowledge of The collection makeup Difficult to formulate queries The retrieval

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

A Generic Mapping-based Query Translation A Generic Mapping-based Query Translation from SPARQL

Information Retrieval > Query Us User er Query Words Query Words Search Personalization

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query

CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query

Query Execuon Declarave Query (SQL) We start from

Database Systems II Query Compiler CMPT 454, Simon Fraser University, Fall 2009, Martin Ester

Parallel Query Execution in POLARDB for MySQL ystein Grvlen Benny Wang Alibaba Cloud Agenda

Relational Query Languages (2) SQL and QBE Walid G. Aref Query Languages For The Relational

Analysis and performance of morphological query expansion and language-filtering words on Basque

Relevance Feedback and Query Expansion Debapriyo Majumdar

Query Expansion & Passage Reranking NLP Systems & Applications LING 573 April 17, 2014

Welcome Dr Bob Brown Chair of CNEN Trustee of the Queens Nursing Institute Director of

Going Beyond the Document-Query Lexical Match Oren Kurland Faculty of Industrial Engineering and

RelevanceFeedback CISC489/689010,Lecture#15 Monday,April13 th

SLWWW at the NTCIR-13 WWW Task Peng XIAO , Yimeng FAN , Lingtao Li, Tetsuya Sakai Waseda

Announcements Coordinating with other presenters Presentation length: ~20 minutes HW1

Query Operations Query Operations Berlin Chen 2004 Reference: 1. - PowerPoint PPT Presentation

Query Operations Query Operations Berlin Chen 2004 Reference: 1. Modern Information Retrieval . chapter 5 Introduction Users have no detailed knowledge of The collection makeup Difficult to formulate queries The retrieval

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

A Generic Mapping-based Query Translation A Generic Mapping-based Query Translation from SPARQL

Information Retrieval &gt; Query Us User er Query Words Query Words Search Personalization

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query

CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query

Query Execu*on Declara*ve Query (SQL) We start from

Database Systems II Query Compiler CMPT 454, Simon Fraser University, Fall 2009, Martin Ester

Parallel Query Execution in POLARDB for MySQL ystein Grvlen Benny Wang Alibaba Cloud Agenda

Relational Query Languages (2) SQL and QBE Walid G. Aref Query Languages For The Relational

Analysis and performance of morphological query expansion and language-filtering words on Basque

Relevance Feedback and Query Expansion Debapriyo Majumdar

Query Expansion &amp; Passage Reranking NLP Systems &amp; Applications LING 573 April 17, 2014

Welcome Dr Bob Brown Chair of CNEN Trustee of the Queens Nursing Institute Director of

Going Beyond the Document-Query Lexical Match Oren Kurland Faculty of Industrial Engineering and

RelevanceFeedback CISC489/689010,Lecture#15 Monday,April13 th

SLWWW at the NTCIR-13 WWW Task Peng XIAO , Yimeng FAN , Lingtao Li, Tetsuya Sakai Waseda

Announcements Coordinating with other presenters Presentation length: ~20 minutes HW1

Information Retrieval > Query Us User er Query Words Query Words Search Personalization

Query Execuon Declarave Query (SQL) We start from

Query Expansion & Passage Reranking NLP Systems & Applications LING 573 April 17, 2014