CS6200: Information Retrieval
Slides by: Jesse Anderton
Background Smoothing
LM, session 8
Limits of Uniform Smoothing
Uniform smoothing assigns the same probability to all unseen words, which isn't realistic. This is easiest to see for n-gram models: we strongly believe that "house" is more likely to follow "the white" than "effortless" is, even if neither trigram appears in our training data. Our bigram counts should help: "white house" probably appears more often than "white effortless," so we can use the bigram model as a background distribution to help smooth our trigram probabilities.
P(house|the, white) > P(effortless|the, white)
One way to combine foreground and background distributions is to take their linear combination. This is the simplest form of Jelinek-Mercer Smoothing. For instance, you can smooth n-grams with (n-1)-gram probabilities. You can also smooth document estimates with corpus-wide estimates.
$$\hat{p}(e) = \lambda\, p_{fg}(e) + (1 - \lambda)\, p_{bg}(e), \qquad 0 < \lambda < 1$$

$$\hat{p}(w_n \mid w_1, \ldots, w_{n-1}) = \lambda\, p(w_n \mid w_1, \ldots, w_{n-1}) + (1 - \lambda)\, p(w_n \mid w_2, \ldots, w_{n-1})$$

$$\hat{p}(w \mid d) = \lambda \frac{tf_{w,d}}{|d|} + (1 - \lambda) \frac{cf_w}{|c|}$$
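To make the document-level case concrete, here is a minimal Python sketch of Jelinek-Mercer smoothing with a corpus background; the function and variable names are illustrative, not from the slides:

```python
from collections import Counter

def jelinek_mercer(word, doc_counts, corpus_counts, lam=0.5):
    """p(w|d) = lam * tf_{w,d}/|d| + (1 - lam) * cf_w/|c|."""
    doc_len = sum(doc_counts.values())        # |d|
    corpus_len = sum(corpus_counts.values())  # |c|
    p_fg = doc_counts[word] / doc_len         # foreground: document estimate
    p_bg = corpus_counts[word] / corpus_len   # background: corpus estimate
    return lam * p_fg + (1 - lam) * p_bg

doc = Counter("the white house is a big white building".split())
corpus = Counter("the white house stands near the white monument".split())

# Unseen words get a nonzero probability from the background term,
# and more common corpus words receive more of the smoothing mass.
print(jelinek_mercer("monument", doc, corpus))  # > 0 despite tf = 0
```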
Most smoothing techniques amount to finding a particular value for λ in Jelinek-Mercer smoothing. For instance, add-one smoothing is Jelinek-Mercer smoothing with a uniform background distribution and a particular value of λ.
Pick $\lambda = \frac{|d|}{|d| + |V|}$:

$$\hat{p}(w \mid d) = \lambda \frac{tf_{w,d}}{|d|} + (1 - \lambda) \frac{1}{|V|} = \frac{|d|}{|d| + |V|} \cdot \frac{tf_{w,d}}{|d|} + \frac{|V|}{|d| + |V|} \cdot \frac{1}{|V|} = \frac{tf_{w,d}}{|d| + |V|} + \frac{1}{|d| + |V|} = \frac{tf_{w,d} + 1}{|d| + |V|}$$
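This identity is easy to verify numerically; the counts below are made up for illustration:

```python
# Jelinek-Mercer with a uniform background and lam = |d| / (|d| + |V|)
# reproduces add-one smoothing.
tf, doc_len, vocab_size = 3, 100, 50  # tf_{w,d}, |d|, |V|: illustrative values

lam = doc_len / (doc_len + vocab_size)
jm = lam * tf / doc_len + (1 - lam) * (1 / vocab_size)
add_one = (tf + 1) / (doc_len + vocab_size)

print(jm, add_one)  # both 0.02666...
assert abs(jm - add_one) < 1e-12
```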
TF-IDF is also closely related to Jelinek-Mercer smoothing. If you smooth the query likelihood model with a corpus-wide background probability, the resulting scoring function is proportional to TF and inversely proportional to DF.
$$\log p(q \mid d) = \sum_{w \in q} \log\left(\lambda \frac{tf_{w,d}}{|d|} + (1 - \lambda)\frac{df_w}{|c|}\right)$$

$$= \sum_{w \in q} \log \frac{\lambda \frac{tf_{w,d}}{|d|} + (1 - \lambda)\frac{df_w}{|c|}}{(1 - \lambda)\frac{df_w}{|c|}} + \sum_{w \in q} \log\left((1 - \lambda)\frac{df_w}{|c|}\right)$$

The second sum does not depend on the document, so it can be dropped without changing the ranking:

$$\stackrel{rank}{=} \sum_{w \in q} \log\left(\frac{\lambda\,\frac{tf_{w,d}}{|d|}}{(1 - \lambda)\,\frac{df_w}{|c|}} + 1\right)$$
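As a sanity check, here is a small Python sketch, with made-up counts and a hypothetical two-term query, confirming that the full score and the rank-equivalent form order documents identically:

```python
from math import log

lam, c_len = 0.5, 1000           # lambda and |c|; illustrative values
df = {"white": 80, "house": 20}  # hypothetical document frequencies
docs = {                         # per-document term frequencies and length |d|
    "d1": {"white": 4, "house": 2, "len": 100},
    "d2": {"white": 1, "house": 5, "len": 150},
}

def full_score(d):
    # log p(q|d) with Jelinek-Mercer smoothing, summed over query terms
    return sum(log(lam * d[w] / d["len"] + (1 - lam) * df[w] / c_len)
               for w in df)

def rank_equivalent(d):
    # TF-IDF-style form: the document-independent sum has been dropped
    return sum(log((lam * d[w] / d["len"]) / ((1 - lam) * df[w] / c_len) + 1)
               for w in df)

ranking_full = sorted(docs, key=lambda n: full_score(docs[n]), reverse=True)
ranking_eqv = sorted(docs, key=lambda n: rank_equivalent(docs[n]), reverse=True)
assert ranking_full == ranking_eqv  # same ordering: ['d1', 'd2']
print(ranking_full, ranking_eqv)
```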
Dirichlet smoothing is Jelinek-Mercer smoothing with λ picked based on the document length and a parameter μ, an estimate of the average document length:

$$\lambda = 1 - \frac{\mu}{|d| + \mu}$$

The scoring function below is the Bayesian posterior using a Dirichlet prior with parameters:

$$\left(\mu \frac{cf_{w_1}}{|c|},\ \ldots,\ \mu \frac{cf_{w_n}}{|c|}\right)$$
$$p(w \mid d) = \frac{tf_{w,d} + \mu\,\frac{cf_w}{|c|}}{|d| + \mu}$$

$$\log p(q \mid d) = \sum_{w \in q} \log \frac{tf_{w,d} + \mu\,\frac{cf_w}{|c|}}{|d| + \mu}$$
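Below is a minimal Python sketch of this scoring function; the function name and arguments are illustrative rather than from the slides:

```python
from math import log

def dirichlet_score(query_terms, tf, doc_len, cf, corpus_len, mu=2000):
    """log p(q|d) = sum over w in q of log((tf_{w,d} + mu*cf_w/|c|) / (|d| + mu)).

    Assumes every query term occurs somewhere in the corpus (cf[w] > 0),
    so no document is ever assigned a zero probability for a term.
    """
    return sum(
        log((tf.get(w, 0) + mu * cf[w] / corpus_len) / (doc_len + mu))
        for w in query_terms
    )
```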
Query: "president lincoln"

tf_{president,d} = 15    cf_president = 160,000
tf_{lincoln,d} = 25      cf_lincoln = 2,400
|d| = 1,800              Σ_w cf_w = 10^9          μ = 2,000

$$\log p(q \mid d) = \sum_{w \in q} \log \frac{tf_{w,d} + \mu\,\frac{cf_w}{|c|}}{|d| + \mu} = \log \frac{15 + 2{,}000 \times (160{,}000 / 10^9)}{1{,}800 + 2{,}000} + \log \frac{25 + 2{,}000 \times (2{,}400 / 10^9)}{1{,}800 + 2{,}000}$$

$$= \log(15.32 / 3{,}800) + \log(25.005 / 3{,}800) = -5.51 + (-5.02) = -10.53$$
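This arithmetic is easy to reproduce; the snippet below recomputes the example with natural logarithms:

```python
from math import log

mu, doc_len, corpus_len = 2000, 1800, 10**9
tf = {"president": 15, "lincoln": 25}
cf = {"president": 160_000, "lincoln": 2_400}

terms = {w: log((tf[w] + mu * cf[w] / corpus_len) / (doc_len + mu)) for w in tf}
print({w: round(s, 2) for w, s in terms.items()})
# {'president': -5.51, 'lincoln': -5.02}
print(round(sum(terms.values()), 2))
# -10.54 at full precision; rounding each term first gives -5.51 + -5.02 = -10.53
```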
Dirichlet smoothing is a good choice for many IR tasks:
- It never assigns zero probability to a term.
- Its scores reflect how the document differs from the corpus.
- Because λ depends on document length, estimates from short documents and long documents are comparable.
- It performs about as well as more exotic smoothing techniques.
For the "president lincoln" example, here is how maximum likelihood (ML) and Dirichlet-smoothed scores compare as the term frequencies vary (scores computed with the formulas above, using natural logs; the ML score is undefined when a term is missing from the document):

tf_president   tf_lincoln   ML Score   Smoothed Score
15             25           -9.06      -10.53
15             1            -12.28     -13.75
15             0            N/A        -19.10
1              25           -11.77     -12.99
0              25           N/A        -14.41
Much of this information about smoothing is discussed in more detail, and with empirical analysis, in the following paper:
Chengxiang Zhai and John Lafferty. 2004. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst. 22, 2 (April 2004), 179-214.
There are many other smoothing techniques. We have focused on those most often used in document scoring for IR. Next, we’ll look at scoring documents using the query’s language model, instead of the document’s.