Comparing weighting models for monolingual information retrieval
Gianni Amati, Claudio Carpineto, and Gianni Romano
Fondazione Ugo Bordoni, Roma
romano@fub.it
Overview
• Three weighting models
• Retrieval feedback
• Experimental settings
• Results
• Conclusions
Document ranking

Sim(q, d) = \sum_{t \in q \cap d} w_{t,q} \cdot w_{t,d}

where q is the query, d a document, t a term; w_{t,q} is the query term weight and w_{t,d} the document term weight.
Okapi

w_{t,q} = \frac{(k_3 + 1) \cdot f_{t,q}}{k_3 + f_{t,q}} \cdot \log_2\frac{D - n_t + 0.5}{n_t + 0.5}

w_{t,d} = \frac{(k_1 + 1) \cdot f_{t,d}}{k_1 \cdot \left\{ (1 - b) + b \cdot \frac{W_d}{avr\_W_d} \right\} + f_{t,d}}
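The Okapi weights can be sketched in Python; the function name and the default parameter values (k1, k3, b) are illustrative assumptions, not the settings used in the experiments:

```python
from math import log2

def okapi_weights(f_tq, f_td, n_t, D, W_d, avr_W_d,
                  k1=1.2, k3=1000.0, b=0.75):
    """Okapi query/document term weights (parameter defaults are assumed)."""
    # Query term weight: saturated query frequency times an idf component
    w_tq = ((k3 + 1) * f_tq / (k3 + f_tq)) * log2((D - n_t + 0.5) / (n_t + 0.5))
    # Document term weight: term-frequency saturation with length normalisation
    K = k1 * ((1 - b) + b * W_d / avr_W_d)
    w_td = (k1 + 1) * f_td / (K + f_td)
    return w_tq, w_td
```

The product w_tq * w_td is then summed over the terms shared by query and document, as in the ranking formula above.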
Statistical Language Modeling (SLM)

w_{t,q} = f_{t,q}

w_{t,d} = \log_2\left(1 + \frac{f_{t,d}}{\mu \cdot \lambda_t}\right)

with the document-length term W_q \cdot \log_2\frac{\mu}{W_d + \mu} added once to Sim(q, d); \mu is the Dirichlet smoothing parameter and \lambda_t the relative frequency of t in the collection.
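A sketch of the SLM score under the Dirichlet-smoothing reading of the formula above; the function name, the dictionary-based inputs, and the default mu are assumptions:

```python
from math import log2

def slm_score(query_tf, doc_tf, coll_prob, W_d, W_q, mu=1000.0):
    """Dirichlet-smoothed language-model score (sketch; mu is an assumed default).

    query_tf:  {term: f_tq}     query term frequencies
    doc_tf:    {term: f_td}     document term frequencies
    coll_prob: {term: lambda_t} relative collection frequencies
    """
    score = 0.0
    for t, f_tq in query_tf.items():
        f_td = doc_tf.get(t, 0)
        if f_td > 0:
            # w_td = log2(1 + f_td / (mu * lambda_t))
            score += f_tq * log2(1 + f_td / (mu * coll_prob[t]))
    # document-length correction, added once per document
    score += W_q * log2(mu / (W_d + mu))
    return score
```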
Deviation from randomness (DFR)

w_{t,q} = f_{t,q}

w_{t,d} = \left\{ \log_2(1 + \lambda_t) + f^{*}_{t,d} \cdot \log_2\frac{1 + \lambda_t}{\lambda_t} \right\} \cdot \frac{f_t + 1}{n_t \cdot (f^{*}_{t,d} + 1)}

f^{*}_{t,d} = f_{t,d} \cdot \log_2\left(1 + c \cdot \frac{avr\_W_d}{W_d}\right)
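The DFR document weight can likewise be sketched; the function name and the default value of c are assumptions, and \lambda_t is taken as the mean term frequency f_t / D:

```python
from math import log2

def dfr_weight(f_td, f_t, n_t, D, W_d, avr_W_d, c=1.0):
    """DFR document term weight (sketch of the slide formula; c=1 assumed)."""
    lam = f_t / D                                 # lambda_t: mean collection frequency
    tfn = f_td * log2(1 + c * avr_W_d / W_d)      # normalised term frequency f*_td
    gain = tfn * log2((1 + lam) / lam) + log2(1 + lam)  # information content
    return gain * (f_t + 1) / (n_t * (tfn + 1))   # first-normalisation factor
```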
Query expansion

[Diagram: the query is run against the inverted file to rank the documents; the top D documents are selected; term scores s(w) are computed and normalised; the top E terms are selected; a weighted expanded query is formed and used to re-rank the documents]
Retrieval feedback

Sim(q_{exp}, d) = \sum_{t \in q_{exp} \cap d} w_{t,q_{exp}} \cdot w_{t,d}

w_{t,q_{exp}} = \alpha \cdot \frac{w_{t,q}}{\max_q w_{t,q}} + \beta \cdot \frac{KLD_{t,d}}{\max_d KLD_{t,d}}

KLD_{t,d} = f_{t,d} \cdot \log_2\frac{f_{t,d}}{f_t}
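The term-reweighting step can be sketched as a Rocchio-style merge of the normalised original query weights with normalised KLD scores from the pseudo-relevant documents; the function name, alpha, beta, and the number of expansion terms are assumed values:

```python
def expand_query(query_w, kld, alpha=1.0, beta=0.5, n_terms=10):
    """Merge original query weights with KLD term scores (sketch; defaults assumed).

    query_w: {term: w_tq}   original query term weights
    kld:     {term: KLD_t}  KLD scores from the top-ranked documents
    """
    max_q = max(query_w.values())
    max_k = max(kld.values())
    expanded = {}
    for t in set(query_w) | set(kld):
        expanded[t] = (alpha * query_w.get(t, 0.0) / max_q
                       + beta * kld.get(t, 0.0) / max_k)
    # keep the n_terms highest-weighted terms for the reformulated query
    return dict(sorted(expanded.items(), key=lambda kv: -kv[1])[:n_terms])
```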
Test collections
• Languages: French, Italian, Spanish (monolingual)
• Query: title + description
• Stemming: Porter algorithms (snowball.tartarus.org)
• Stop list: Savoy
French       AvPrec   Prec-at-5   Prec-at-10
SLM          0.4753   0.4538      0.3635
SLM+RF       0.4372   0.4192      0.3462
Okapi        0.5030   0.4385      0.3654
Okapi+RF     0.5054   0.4769      0.3942
DFR          0.5116   0.4577      0.3654
DFR+RF       0.5238   0.4885      0.3981
Italian      AvPrec   Prec-at-5   Prec-at-10
SLM          0.5027   0.4941      0.3824
SLM+RF       0.5095   0.4824      0.3863
Okapi        0.4762   0.4588      0.3510
Okapi+RF     0.5238   0.4824      0.3902
DFR          0.5046   0.4824      0.3725
DFR+RF       0.5364   0.5255      0.4137
Spanish      AvPrec   Prec-at-5   Prec-at-10
SLM          0.4720   0.6140      0.5175
SLM+RF       0.5112   0.5825      0.5316
Okapi        0.4606   0.5684      0.5175
Okapi+RF     0.5093   0.6105      0.5491
DFR          0.4907   0.6035      0.5386
DFR+RF       0.5510   0.6140      0.5825
[Plot: French AvPrec variation per topic, topics 141–199; y-axis 0–1]
[Plot: Italian AvPrec variation per topic, topics 141–200; y-axis 0–1]
[Plot: Spanish AvPrec variation per topic, topics 141–200; y-axis 0–1]
Average delta AvPrec
            delta    max      best
French      0.2047   0.5796   0.5238
Italian     0.1596   0.5978   0.5364
Spanish     0.1050   0.5732   0.5510
Ranked performance
            French           Italian          Spanish
         1st  2nd  3rd    1st  2nd  3rd    1st  2nd  3rd
SLM       11   11   30     10    9   32     16   10   31
Okapi     20   17   15     21   16   14     16   22   19
DFR       21   24    7     20   26    5     25   25    7
Conclusions
• DFR outperforms both Okapi and SLM
• Retrieval feedback is effective in most cases
• Performance is largely language-independent
• Future experiments with a wider range of factors: query length, model parameters, expansion parameters