Proximity Language Model: A Language Model beyond Bag of Words through Proximity (PowerPoint PPT Presentation)



SLIDE 1

Proximity Language Model

A Language Model beyond Bag of Words through Proximity

Jinglei Zhao 1, Yeogirl Yun 1,2

1 iZENEsoft, Inc.
2 Wisenut, Inc.

SLIDE 2

Outline

Introduction

The proposed model

Proximity Language Model

Modeling Proximate Centrality of Terms

Experiment and Result

SLIDE 3

Introduction

Background

Probabilistic models are prevalent in IR.

Documents are represented as a “bag of words” (BOW).

Statistics usually exploited under BOW: term frequency, inverse document frequency, document length, etc.

Merits: simplicity in modeling; effectiveness in parameter estimation.

Model more under the BOW assumption.

BOW is criticized for not capturing the relatedness between terms. Could we model term relatedness while retaining the simplicity of probabilistic modeling under BOW?

SLIDE 4

Introduction

Background

Proximity information.

Represents the closeness or compactness of the query terms appearing in a document.

Underlying intuition of using proximity in ranking: the more compact the terms, the more likely they are topically related; the closer the query terms appear, the more likely the document is relevant.

It can be seen as a kind of indirect measure of term relatedness or dependence.

SLIDE 5

Introduction

Objective

Integrate proximity information into unigram language modeling.

Language modeling has become a very promising direction in IR: solid theoretical background and good empirical performance.

This paper’s focus: develop a systematic way to integrate term proximity information into the unigram language model.

SLIDE 6

Introduction

Related Work

Dependency Modeling

General language model, dependency language model, etc. Shortcoming: parameter estimation becomes much more difficult to compute and is sensitive to data sparseness and noise.

Phrase Indexing

Incorporate units bigger than words, such as phrases or loose phrases, in the text representation. Shortcoming: the improvement from using phrases is not consistent.

Previous Proximity Modeling

Span-based, pair-based. Shortcoming: the proximity score is combined with the relevance score at the document level, which is intuitive but without theoretical grounding.

SLIDE 7

Introduction

Our Approach

Integrate Proximity with Unigram Language Model

View query terms’ proximate centrality as Dirichlet hyper-parameters.

Combine the scores at the term level.

Boost a term’s score contribution when the term is at a central place in the proximity structure.

Merits

A uniform ranking formula. Mathematically grounded. Performs better empirically.

SLIDE 8

Proximity Language Model

Unigram Language Model

Represent the query q and the document d as vectors of term counts.

Query and document are generated by multinomial distributions.

The relevance of d to q is measured by the probability of generating q by the language model estimated from d.
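The generative scoring above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name, the toy document, and the unsmoothed maximum-likelihood estimate are all mine.

```python
import math
from collections import Counter

def query_likelihood(query_terms, doc_terms):
    """Log-probability of generating the query from the document's
    maximum-likelihood unigram language model (no smoothing yet)."""
    counts = Counter(doc_terms)
    n = len(doc_terms)
    score = 0.0
    for w in query_terms:
        p = counts[w] / n              # ML estimate of p(w | d)
        if p == 0.0:
            return float("-inf")       # unseen query term; motivates smoothing
        score += math.log(p)
    return score

doc = "proximity language model beyond bag of words language model".split()
print(query_likelihood(["language", "model"], doc))
```

An unseen query term zeroes the whole product, which is exactly why the collection-smoothing step on a later slide is needed.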

SLIDE 9

Proximity Language Model

Integration with Proximity

Our belief and expectation

Given d_a and d_b, supposing all others being equal, if the query terms appear more proximate in d_a than in d_b, we believe that d_a should be more relevant to the query than d_b.

In other words, if θ_a and θ_b represent the language models estimated from d_a and d_b respectively, we believe that the probability that q is generated from θ_a should be higher than from θ_b.

Express our expectation

A term w_i’s emission probability should be proportional to its proximity centrality score Prox(w_i) with respect to the other query terms. View Prox(w_i) as the weight of w_i in estimating θ_d.

Express the above two points by using a conjugate (Dirichlet) prior on θ_d.

SLIDE 10

Proximity Language Model

Integration with Proximity

Dirichlet prior on θ_d.

The posterior estimation of θ_d.

The proximity-integrated estimation of the word emission probability.
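A minimal sketch of this estimate, assuming the Dirichlet prior contributes λ·Prox_d(w) pseudo-counts on top of the observed count c(w, d); this matches the later slide describing λ as weighting the prior proximity factor against the observed word counts. The function name and signature are mine, not the paper's.

```python
def plm_emission_prob(w, counts, doc_len, prox, lam):
    """Proximity-integrated emission probability: the Dirichlet prior
    adds lam * prox[w] pseudo-counts to the observed count, so query
    terms sitting in a proximate area get boosted.  `prox` maps query
    terms to Prox_d(w); non-query terms get the constant score zero."""
    total_prox = sum(prox.values())
    return (counts.get(w, 0) + lam * prox.get(w, 0.0)) / (doc_len + lam * total_prox)

counts = {"language": 2, "model": 1}
print(plm_emission_prob("language", counts, doc_len=10,
                        prox={"language": 1.0, "model": 0.5}, lam=2.0))
```

Setting lam = 0 recovers the plain maximum-likelihood estimate, which is why the slide can interpret the prior as transforming proximity into pseudo word counts.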

SLIDE 11

Proximity Language Model

Integration with Proximity

Interpretation of the proximity document model

Transform proximity information into word count information.

Boost a term’s likelihood when it is proximate to other terms.

From the original bag of words to a pseudo “bag of words”.

More generally, a way of modeling term relatedness under BOW?

Relation with smoothing.

The proximity factor mainly functions to adjust the parameters for seen matching terms with respect to a query in a document. Smoothing is motivated to weight the unseen words in the document.

SLIDE 12

Proximity Language Model

Integration with Proximity

Further smoothing with the collection language model.

The ranking formula under the KL-divergence framework.
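The slide does not spell out the combined formula, so the following is only a plausible sketch: it assumes collection smoothing adds μ·p(w|C) pseudo-counts alongside the proximity prior, and it scores by query log-likelihood, which is rank-equivalent to negative KL divergence when the query model is the maximum-likelihood one. All names here are mine.

```python
import math
from collections import Counter

def kl_rank_score(query, doc, coll_prob, prox, lam=1.0, mu=2000.0):
    """Rank-equivalent KL score: sum over query terms of
    c(w, q) * log p(w | d), where the document model mixes observed
    counts, the proximity prior (lam * Prox_d) and collection
    smoothing (mu * p(w | C)).  The combination is an assumption,
    not the paper's exact formula."""
    counts = Counter(doc)
    denom = len(doc) + lam * sum(prox.values()) + mu
    score = 0.0
    for w, qc in Counter(query).items():
        p = (counts[w] + lam * prox.get(w, 0.0) + mu * coll_prob(w)) / denom
        score += qc * math.log(p)
    return score
```

The default mu=2000.0 mirrors the prior collection sample size μ = 2000 reported in the parameter-setting slide.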

SLIDE 13

Modeling Proximate Centrality of Terms

Term Proximity Measure

Term’s Proximate Centrality

A key notion in PLM is the estimation of term proximity, Prox(w_i). For a query term, it should be computed according to a proximity measure that reflects the term’s closeness to the other query terms. Non-query terms are assumed to have a constant score of zero.

Measuring Proximity via Pair Distance

Represent a term’s proximity by measuring its distance to the other query terms in the document. Two questions arise: how to define a term’s distance to the other terms in a document, and how to map term distance to the term’s proximate centrality score?

SLIDE 14

Modeling Proximate Centrality of Terms

Term Proximity Measure

Pairwise term distance

Represented as the distance between the closest occurring positions of the two terms in the document.

Pairwise proximity
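Both notions can be sketched as follows. The function names are mine; the f^(−dist) form and the value f = 1.5 come from the example on slide 16.

```python
def pair_distance(term_a, term_b, doc):
    """Distance between the closest occurring positions of the two
    terms in the document; infinite if either term is absent."""
    pos_a = [i for i, t in enumerate(doc) if t == term_a]
    pos_b = [i for i, t in enumerate(doc) if t == term_b]
    if not pos_a or not pos_b:
        return float("inf")
    return min(abs(i - j) for i in pos_a for j in pos_b)

def pair_proximity(term_a, term_b, doc, f=1.5):
    """Pairwise proximity f ** (-dist); an absent pair decays to 0.0."""
    return f ** (-pair_distance(term_a, term_b, doc))

doc = "proximity language model beyond bag of words language model".split()
print(pair_distance("language", "model", doc))
```

Using the closest pair of positions means a repeated term only needs one occurrence near the other term to count as proximate.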

SLIDE 15

Modeling Proximate Centrality of Terms

Computation of Term’s Proximate Centrality

Term Proximity based on Minimum Distance (P_MinDist)

Term Proximity based on Average Distance (P_AveDist)

Term Proximity Summed over Pair Proximity (P_SumProx)
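A sketch of the three measures. The function, its signature, and the mode strings are my own framing; it assumes P_MinDist and P_AveDist map a distance to a score via f^(−dist), while P_SumProx sums the pairwise proximities directly.

```python
def term_centrality(term, query_terms, doc, f=1.5, mode="sum_prox"):
    """Proximate centrality of a query term via one of the three
    measures on this slide (names and layout are assumptions)."""
    def pair_distance(a, b):
        pos_a = [i for i, t in enumerate(doc) if t == a]
        pos_b = [i for i, t in enumerate(doc) if t == b]
        if not pos_a or not pos_b:
            return float("inf")        # absent term: f**(-inf) -> 0.0
        return min(abs(i - j) for i in pos_a for j in pos_b)

    others = [q for q in query_terms if q != term]
    if not others:
        return 0.0
    dists = [pair_distance(term, q) for q in others]
    if mode == "min_dist":             # P_MinDist
        return f ** (-min(dists))
    if mode == "ave_dist":             # P_AveDist
        return f ** (-sum(dists) / len(dists))
    return sum(f ** (-d) for d in dists)   # P_SumProx
```

P_SumProx degrades gracefully when one query term is missing (that pair simply contributes 0), which may be one reason it holds up best in the stop-word experiments later.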

SLIDE 16

Modeling Proximate Centrality of Terms

An example

Proximity computed by different measures (pairwise proximity = f^(−dist), f = 1.5).

SLIDE 17

Experiment and Result

Experimental Setting

Data Set

Experimental platform

Lemur toolkit. A naive tokenizer. A very small stopword list.

SLIDE 18

Experiment and Result

Experimental Setting

Baselines

Basic KL-divergence language model (LM).

Tao’s document-level linear score combination (LLM).

SLIDE 19

Experiment and Result

Parameter Setting

LM

The prior collection sample size μ is set to 2000 across all the experiments; the same value is used in LLM and PLM.

LLM

The combination parameter is optimized by searching over 0.1, 0.2, ..., 1.0.

PLM

Proximity argument λ: controls the proportional weight of the prior proximity factor relative to the observed word count information.

Exponential weight parameter: controls the proportional ratio of proximity scores between different query terms.

Optimization space: exponential weight parameter in 1.1, 1.2, ..., 2.0; λ in 0.1, 1, 2, 3, ..., 10.
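The exhaustive search over this optimization space can be sketched as follows. `evaluate` is a hypothetical stand-in for running retrieval and measuring effectiveness, and naming the exponential weight `f` follows the slide-16 example; both are assumptions, not the paper's API.

```python
from itertools import product

def grid_search(evaluate):
    """Exhaustive search over the optimization space on this slide:
    exponential weight f in 1.1 .. 2.0 (step 0.1) and lambda in
    {0.1, 1, 2, ..., 10}.  `evaluate(f, lam)` is a hypothetical
    callback returning a quality score (e.g. MAP) to maximize."""
    f_grid = [round(1.1 + 0.1 * i, 1) for i in range(10)]    # 1.1 .. 2.0
    lam_grid = [0.1] + list(range(1, 11))                    # 0.1, 1 .. 10
    best = max(product(f_grid, lam_grid), key=lambda p: evaluate(*p))
    return best
```

The grid has only 110 points, so exhaustive search is cheap here; the Future Work slide still asks for something more efficient.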

SLIDE 20

Experiment and Result

PLM’s parameter sensitivity using P_MinDist.

SLIDE 21

Experiment and Result

Comparison of Best Performance

SLIDE 22

Experiment and Result

Main Observations

PLM performs empirically better than LM and LLM.

LLM fails on the Ohsumed collection (whose queries are more verbose).

PLM performs very well on verbose queries.

Of the three proposed term proximity measures used in PLM, P_SumProx and P_MinDist perform better than P_AveDist.

SLIDE 23

Experiment and Result

The Influence of Stop Words

Considering stop words in the query

A good ranking function should also perform well when stop words are considered. A stop word usually has many occurrences, giving it a great chance of being proximate to other words in the document. This puts the proximity mechanism at risk of losing its effect.

Test setting

All the queries from topics 251-300 that contain at least one word in the stop word list used.

SLIDE 24

Experiment and Result

The Influence of Stop Words

Observations

LLM fails when stop words are considered.

PLM can still improve on the basic language model.

P_SumProx is the best choice of the three term proximity measures.

Underlying Reason

Stop words affect LLM globally, but affect PLM only locally.

SLIDE 25

Conclusions

Main Contribution

Propose a novel way to integrate a proximity factor into unigram language modeling.

The model views query terms’ proximate centrality as Dirichlet hyper-parameters.

A term’s score contribution is boosted when it lies in a highly proximate area among the query terms.

This integration method is mathematically grounded and shows better empirical performance.

Besides simple keyword queries, the model also performs well on verbose queries and on queries containing stop words.

It illustrates a way to model more beyond BOW while staying under BOW.

SLIDE 26

Conclusions

Future Work

Develop a more efficient way to set the parameters of the PLM model instead of using exhaustive search.

Study how to normalize the term proximity centrality to a given scale, or even to a probability: see the effect on parameter tuning and on the ranking result.

Study how to combine proximity with other document information, such as prior document strength, to further improve the effectiveness of language modeling.

SLIDE 27

Thank you! Any questions?