CS54701: Information Retrieval CS-54701 Information Retrieval - PowerPoint PPT Presentation

CS54701: Information Retrieval CS-54701 Information Retrieval Retrieval Models: Language models Luo Si Department of Computer Science Purdue University

Retrieval Model: Language Model  Introduction to language model  Unigram language model  Document language model estimation  Maximum Likelihood estimation  Maximum a posterior estimation  Jelinek Mercer Smoothing  Model-based feedback

Language Models: Motivation  Vector space model for information retrieval  Documents and queries are vectors in the term space  Relevance is measure by the similarity between document vectors and query vector  Problems for vector space model  Ad-hoc term weighting schemes  Ad-hoc similarity measurement No justification of relationship between relevance and similarity  We need more principled retrieval models …

Introduction to Language Models:  Language model can be created for any language sample  A document  A collection of documents  Sentence, paragraph, chapter, query…  The size of language sample affects the quality of language model  Long documents have more accurate model  Short documents have less accurate model  Model for sentence, paragraph or query may not be reliable

Introduction to Language Models:  A document language model defines a probability distribution over indexed terms  E.g., the probability of generating a term  Sum of the probabilities is 1  A query can be seen as observed data from unknown models  Query also defines a language model (more on this later)  How might the models be used for IR?  Rank documents by Pr( | ) q d i  Rank documents by language models of and based on q d i kullback-Leibler (KL) divergence between the models (come later)

Language Model for IR: Example Generate retrieval results q Estimate the generation probability of Pr( | ) q d sport, basketball i Language Language Language Model for Model for d d Model for d 3 2 1 Estimating language model for each document d d d 2 3 1 basketball, ticket, stock, finance, sport, basketball, finance, ticket, sport finance, stock ticket, sport

Language Models Three basic problems for language models  What type of probabilistic distribution can be used to construct language models?  How to estimate the parameters of the distribution of the language models?  How to compute the likelihood of generating queries given the language modes of documents?

Multinomial/Unigram Language Models  Language model built by multinomial distribution on single terms (i.e., unigram) in the vocabulary Examples: Five words in vocabulary (sport, basketball, ticket, finance, stock) For a document , its language mode is: d i {P i (“ sport ”), P i (“ basketball ”), P i (“ ticket ”), P i (“ finance ”), P i (“ stock ”)} Formally: The language model is: {P i (w) for any word w in vocabulary V}     ( ) 1 0 ( ) 1 P w P w i k i k k

Multinomial/Unigram Language Models Multinomial Multinomial Multinomial Model for Model for 2 Model for 1 d d 3 d Estimating language model for each document d d d 2 3 1 basketball, ticket, stock, finance, sport, basketball, finance, ticket, sport finance, stock ticket, sport

Maximum Likelihood Estimation (MLE) Maximum Likelihood Estimation:  Find model parameters that make generation likelihood reach maximum: 1 I d ,...,d M*=argmax M Pr(D|M) There are K words in vocabulary, w 1 ...w K (e.g., 5) Data: one document with counts tf i (w 1 ), …, tf i (w K ), 1 d ,...,d I d i and length | | d i Model: multinomial M with parameters {p i (w k )} Likelihood: Pr( | M) d i d ,...,d I 1 M*=argmax M Pr( |M) d i

Maximum Likelihood Estimation (MLE)   K K | | d   i  tf ( w )  tf ( w )   ( | ) ( ) ( ) p d M p w i k p w i k   i i k i k ( )... ( ) tf w tf w     1 1 k k i 1 i K    ( | ) lo g ( | ) ( ) lo g ( ) l d M p d M tf w p w i i i k i k k   '     ( | ) ( ) lo g ( ) ( ( ) 1) l d M tf w p w p w i i k i k i k k k  ' ( ) ( ) l tf w t f w        i k i k 0 ( ) p w i k   ( ) ( ) p w p w i k i k ( ) c w         i k ( ) 1, ( ) | | , ( ) S in ce p w tf w d S o p w i i k i k i k | | d k k i Use Lagrange multiplier approach Set partial derivatives to zero Get maximum likelihood estimate

Maximum Likelihood Estimation (MLE) (p sp , p b , p t , p f , p st ) = (p sp , p b , p t , p f , p st ) = (p sp , p b , p t , p f , p st ) = (0.5,0.25,0.25,0,0) (0.2,0.2,0.4,0.2,0) (0,0,0,0.5,0.5) Estimating language model for each document d d d 2 3 1 basketball, ticket, stock, finance, sport, basketball, finance, ticket, sport finance, stock ticket, sport

Maximum Likelihood Estimation (MLE) Maximum Likelihood Estimation: Assign zero probabilities to unseen words in small sample  A specific example: Only two words in vocabulary (w 1 =sport, w 2 =business) like (head, tail) for a coin; A document generates sequence of two words or draw a d i coin for many times   d i ( ) ( )  tf w  tf w   P r( | ) ( ) 1 (1 ( )) 2 d M p w i p w i   i i 1 i 1 1 d ,...,d I ( ) ( ) tf w tf w   1 2 i i Only observe two words (flip the coin twice) and MLE estimators are: “business sport” P i (w 1 )=0.5 “sport sport” P i (w 1 )=1 ? “business business” P i (w 1 )=0 ?

Maximum Likelihood Estimation (MLE) A specific example: Only observe two words (flip the coin twice) and MLE estimators are: “business sport” P i (w 1 )*=0.5 “sport sport” P i (w 1 )*=1 ? “business business” P i (w 1 )*=0 ? Data sparseness problem

Solution to Sparse Data Problems  Maximum a posterior (MAP) estimation  Shrinkage  Bayesian ensemble approach

Maximum A Posterior (MAP) Estimation Maximum A Posterior Estimation: Select a model that maximizes the probability of model given  observed data M*=argmax M Pr(M|D)=argmax M Pr(D|M)Pr(M)  Pr(M): Prior belief/knowledge  Use prior Pr(M) to avoid zero probabilities A specific examples: Only two words in vocabulary (sport, business) For a document : Prior Distribution d i   d   i tf ( w ) tf ( w )    P r( | ) ( ) 1 ( ) 2 P r M d p w i p w i M i   1 2 i i ( ) ( )  tf w tf w  i 1 i 2

Maximum A Posterior (MAP) Estimation Maximum A Posterior Estimation: Introduce prior on the multinomial distribution   Use prior Pr(M) to avoid zero probabilities, most of coins are more or less unbiased  Use Dirichlet prior on p(w)      ( )     1       1 K ( | , , ) ( ) k , ( ) 1, 0 ( ) 1 D ir p p w p w p w 1 K i k i k i k i     ( ) ( ) k k 1 K Hyper-parameters Constant for p K     t x 1   ( ) x e t d x  ( x ) is gamma function 0     ( 1) ! if n n n

Maximum A Posterior (MAP) Estimation For the two word example: 2 2 a Dirichlet prior   P r( ) ( ) (1 ( )) M p w p w 1 1 P(w 1 ) 2 (1-P(w 1 ) 2 )

Maximum A Posterior (MAP) Estimation Maximum A Posterior: M*=argmax M Pr(M|D)=argmax M Pr(D|M)Pr(M)     tf ( w ) tf ( w ) 1 1   P r( | ) P r( ) ( ) 1 (1 ( )) 2 ( ) ( ) d M M p w i p w i p w 1 p w 2 i 1 1 1 1 i i i i       tf ( w ) 1 tf ( w ) 1   ( ) 1 1 (1 ( )) 2 2 p w i p w i 1 1 i i Pseudo Counts d ,...,d 1 I       ( ) 1 ( ) 1 *  tf w  tf w arg m ax ( ) i 1 1 (1 ( )) i 2 2 M p w p w i 1 i 1 ( ) p w i 1

Maximum A Posterior (MAP) Estimation A specific example: Only observe two words (flip a coin twice): “sport sport” P i (w 1 )*=1 ? P(w 1 ) 2 (1-P(w 1 ) 2 ) times

Maximum A Posterior (MAP) Estimation A specific example: Only observe two words (flip a coin twice): “sport sport” P i (w 1 )*=1 ?    ( ) 1 tf w 1 1 i  ( )* p w        1 ( ) 1 ( ) 1 tf w tf w 1 1 2 2 i i   2 3 1 4 2         2 3 1 0 3 1 6 3

MAP Estimation Unigram Language Model Maximum A Posterior Estimation: Use Dirichlet prior for multinomial distribution  How to set the parameters for Dirichlet prior 

MAP Estimation Unigram Language Model Maximum A Posterior Estimation: Use Dirichlet prior for multinomial distribution  There are K terms in the vocabulary:      : { ( ),...., ( )}, ( ) 1, 0 ( ) 1 M ultinom ial p p w p w p w p w i i 1 K i i k i k k      ( )     1       1 K ( | , , ) ( ) k , ( ) 1, 0 ( ) 1 D ir p p w p w p w 1 K i k i k i k i     ( ) ( ) k k 1 K Hyper-parameters Constant for p K

CS54701: Information Retrieval CS-54701 Information Retrieval - PowerPoint PPT Presentation

CS54701: Information Retrieval CS-54701 Information Retrieval Retrieval Models: Language models Luo Si Department of Computer Science Purdue University Retrieval Model: Language Model Introduction to language model Unigram language

CS54701: Information Retrieval CS-54701 Information Retrieval Luo Si Department of Computer

CS54701: Information Retrieval CS-54701 Information Retrieval Course Review Luo Si Department

Luo Si Department of Computer Science Purdue University Retrieval Models Information Need

Information Retrieval Introducing Information Retrieval and Web Search Information Retrieval

XML Retrieval XML Retrieval XML Retrieval XML Retrieval DB/IR in DB/IR in Theory Theory Web

Retrieval by Content Part 2: Text Retrieval Term Frequency and Inverse Document Frequency

Model Divergence Retrieval LM, session 10 CS6200: Information Retrieval Slides by: Jesse

Information Retrieval CS276: Information Retrieval and Web Search Pandu Nayak and Prabhakar

Information Retrieval Introducing Information Retrieval and Web Search

Information Retrieval CS276: Information Retrieval and Web Search Text Classification 1 Chris

Retrieval Models: Outline CS490W: Web I nformation Search & Management Retrieval Models

Retrieval by Content Image Retrieval Image Retrieval Problem Large Image and video data sets

CS54701 Federated Text Search Luo Si Department of Computer Science Purdue University Abstract

Accessing XML content: An information retrieval perspective Mounia Lalmas mounia@acm.org 1

Information Retrieval CS-7961: Topics in Information retrieval (IR) is finding material (usually

INFORMATION RETRIEVAL USING NEURAL NETWORKS VINEETH REDDY ANUGU CMSC 676 INFORMATION RETRIEVAL

Sound Event Detection in Multisource Environments Using Source Separation Toni Heittola 1 ,

CSSE 220 Software Engineering Techniques Design Principles Encapsulation Todays Agenda

Click for Video COACH LIKE A CHAMPION BUILDING A CHAMPIONSHIP PROGRAM LARRY MCKENZIE A COACH

Danielle Richards, Bessemer Park Advisory Council Agenda Learn how 2 Park Advisory Councils

Feedback Loops and Balance CSC430/HCI530 Feedback loops: a special dynamic Positive:

Understand Basketball Games 2018.6.15 Sports Videos Large quantity, high

Strategy optimization in beachvolleyball applying a two scale approach to the olympic games

Computer Graphics Seminar um olhar sobre os dados olmpicos perceptions of olympic data Jlia

CS54701: Information Retrieval CS-54701 Information Retrieval - PowerPoint PPT Presentation

CS54701: Information Retrieval CS-54701 Information Retrieval Retrieval Models: Language models Luo Si Department of Computer Science Purdue University Retrieval Model: Language Model Introduction to language model Unigram language

CS54701: Information Retrieval CS-54701 Information Retrieval Luo Si Department of Computer

CS54701: Information Retrieval CS-54701 Information Retrieval Course Review Luo Si Department

Luo Si Department of Computer Science Purdue University Retrieval Models Information Need

Information Retrieval Introducing Information Retrieval and Web Search Information Retrieval

XML Retrieval XML Retrieval XML Retrieval XML Retrieval DB/IR in DB/IR in Theory Theory Web

Retrieval by Content Part 2: Text Retrieval Term Frequency and Inverse Document Frequency

Model Divergence Retrieval LM, session 10 CS6200: Information Retrieval Slides by: Jesse

Information Retrieval CS276: Information Retrieval and Web Search Pandu Nayak and Prabhakar

Information Retrieval Introducing Information Retrieval and Web Search

Information Retrieval CS276: Information Retrieval and Web Search Text Classification 1 Chris

Retrieval Models: Outline CS490W: Web I nformation Search &amp; Management Retrieval Models

Retrieval by Content Image Retrieval Image Retrieval Problem Large Image and video data sets

CS54701 Federated Text Search Luo Si Department of Computer Science Purdue University Abstract

Accessing XML content: An information retrieval perspective Mounia Lalmas mounia@acm.org 1

Information Retrieval CS-7961: Topics in Information retrieval (IR) is finding material (usually

INFORMATION RETRIEVAL USING NEURAL NETWORKS VINEETH REDDY ANUGU CMSC 676 INFORMATION RETRIEVAL

Sound Event Detection in Multisource Environments Using Source Separation Toni Heittola 1 ,

CSSE 220 Software Engineering Techniques Design Principles Encapsulation Todays Agenda

Click for Video COACH LIKE A CHAMPION BUILDING A CHAMPIONSHIP PROGRAM LARRY MCKENZIE A COACH

Danielle Richards, Bessemer Park Advisory Council Agenda Learn how 2 Park Advisory Councils

Feedback Loops and Balance CSC430/HCI530 Feedback loops: a special dynamic Positive:

Understand Basketball Games 2018.6.15 Sports Videos Large quantity, high

Strategy optimization in beachvolleyball applying a two scale approach to the olympic games

Computer Graphics Seminar um olhar sobre os dados olmpicos perceptions of olympic data Jlia

Retrieval Models: Outline CS490W: Web I nformation Search & Management Retrieval Models