Term Co-Occurrence (VSM, session 11), CS6200: Information Retrieval - PowerPoint PPT Presentation



SLIDE 1

Term Co-Occurrence

VSM, session 11

CS6200: Information Retrieval

Slides by: Jesse Anderton

SLIDE 2

Query Expansion

We can add words with similar meanings to query terms, e.g. from stem classes or a thesaurus. We can also add words which commonly co-occur with query terms, on the assumption that they must be related to the same topic.

[Image: Medical Subject Headings Thesaurus (NIH)]

SLIDE 3

Term Association Measures

There are many measures of term co-occurrence. We'll summarize them here, and then examine what each means and how they differ. Throughout, n_a and n_b are the numbers of documents containing terms a and b, n_ab is the number containing both, and N is the number of documents in the collection.

Measures of co-occurrence:*

  Mutual Information (MIM):            n_ab / (n_a · n_b)
  Expected Mutual Information (EMIM):  n_ab · log [ N · n_ab / (n_a · n_b) ]
  Chi-square (χ²):                     (n_ab − (1/N) · n_a · n_b)² / (n_a · n_b)
  Dice's coefficient (Dice):           n_ab / (n_a + n_b)

* These formulas are partial, but rank-equivalent to the full formulas.
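The four rank-equivalent formulas above can be computed directly from the document counts. A minimal sketch, using hypothetical counts (the function name and the example numbers are illustrative, not from the slides):

```python
from math import log

def association_measures(n_a, n_b, n_ab, N):
    """Rank-equivalent term association measures for terms a and b.

    n_a, n_b : number of documents containing a, resp. b
    n_ab     : number of documents containing both
    N        : total number of documents in the collection
    """
    mim = n_ab / (n_a * n_b)
    emim = n_ab * log(N * n_ab / (n_a * n_b)) if n_ab > 0 else 0.0
    chi2 = (n_ab - n_a * n_b / N) ** 2 / (n_a * n_b)
    dice = n_ab / (n_a + n_b)
    return {"MIM": mim, "EMIM": emim, "chi2": chi2, "Dice": dice}

# Hypothetical counts: term a in 1,000 docs, term b in 2,000 docs,
# co-occurring in 300, out of N = 100,000 documents.
scores = association_measures(1000, 2000, 300, 100000)
```

Because each formula is only rank-equivalent to its full form, the absolute values are not comparable across measures; only the ordering they induce over term pairs matters.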

SLIDE 4

Dice's Coefficient

Dice's coefficient, aka the Sørensen index, is used to compare two random samples. In this case, we compare the population of documents containing terms a and b to the populations containing a and containing b.

  dice(a, b) = 2 · n_ab / (n_a + n_b)

which is rank-equivalent to n_ab / (n_a + n_b).

SLIDE 5

Pointwise Mutual Information

Pointwise mutual information is a measure of correlation from information theory.

  pmi(a, b) := log [ p(a, b) / (p(a) · p(b)) ]
             = log [ (n_ab / N) / ((n_a / N) · (n_b / N)) ]
             = log N + log [ n_ab / (n_a · n_b) ]

which is rank-equivalent to n_ab / (n_a · n_b).

SLIDE 6

Expected Mutual Information

Expected mutual information corrects a bias of pointwise mutual information toward low-frequency terms.

  emim(a, b) ∝ P(a, b) · log [ P(a, b) / (P(a) · P(b)) ]
             = (n_ab / N) · log [ N · n_ab / (n_a · n_b) ]

which is rank-equivalent to n_ab · log [ N · n_ab / (n_a · n_b) ].

SLIDE 7

Pearson's Chi-squared Measure

Pearson's Chi-squared test is a test of statistical significance which compares the number of term co-occurrences to the number we'd expect if the terms were independent. (This is also not the full form of this measure.)

  chi2(a, b) = (n_ab − N · (n_a / N) · (n_b / N))² / (N · (n_a / N) · (n_b / N))

which is rank-equivalent to (n_ab − (1/N) · n_a · n_b)² / (n_a · n_b).
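The rank-equivalence can be checked numerically: the full form is exactly N times the partial form, so both induce the same ordering over term pairs. A sketch with made-up counts (all numbers below are hypothetical):

```python
def chi2_full(n_a, n_b, n_ab, N):
    # Full form: compare observed co-occurrences to the expected
    # count N * (n_a/N) * (n_b/N) under independence.
    expected = N * (n_a / N) * (n_b / N)
    return (n_ab - expected) ** 2 / expected

def chi2_partial(n_a, n_b, n_ab, N):
    # Partial form from the slide: drops the constant factor N,
    # which does not change the ranking.
    return (n_ab - n_a * n_b / N) ** 2 / (n_a * n_b)

# Made-up (n_a, n_b, n_ab) counts for three term pairs, N = 10,000 docs.
pairs = [(50, 80, 10), (500, 400, 30), (120, 60, 25)]
N = 10000
rank_full = sorted(range(3), key=lambda i: chi2_full(*pairs[i], N), reverse=True)
rank_partial = sorted(range(3), key=lambda i: chi2_partial(*pairs[i], N), reverse=True)
# The two rankings agree because chi2_full == N * chi2_partial.
```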

SLIDE 8

Association Measure Example

[Table: Most associated terms for "tropical" in a collection of TREC news stories.]

[Table: Most associated terms for "fish" in the same collection.]

SLIDE 9

Improving the Results

Instead of counting co-occurrences in the entire document, count those that occur within a smaller window.

Look for new terms associated with multiple query terms instead of just one.

Using Dice with "tropical fish" gives the following list: goldfish, reptile, aquarium, coral, frog, exotic, stripe, regent, pet, wet.

[Table: Most associated terms for "fish" with co-occurrences measured in a window of 5 terms.]
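Window-based counting can be sketched as follows: slide a fixed-size window over the token stream, count the windows in which each term and each term pair appears, and feed those counts into Dice. This is a toy single-document version under assumed helper names; the toy text is invented, and a real system would aggregate counts over the whole collection:

```python
from collections import Counter
from itertools import combinations

def windowed_counts(tokens, window=5):
    """Count, per term and per term pair, the number of
    length-`window` windows in which they appear together."""
    term_windows = Counter()
    pair_windows = Counter()
    n_windows = max(1, len(tokens) - window + 1)
    for i in range(n_windows):
        seen = set(tokens[i:i + window])
        term_windows.update(seen)
        for a, b in combinations(sorted(seen), 2):
            pair_windows[(a, b)] += 1
    return term_windows, pair_windows

def dice(a, b, term_windows, pair_windows):
    # Dice over window counts: n_ab / (n_a + n_b).
    n_ab = pair_windows[tuple(sorted((a, b)))]
    return n_ab / (term_windows[a] + term_windows[b])

# Hypothetical toy text; not from the TREC collection on the slide.
tokens = "tropical fish live in a tropical aquarium with exotic fish".split()
tw, pw = windowed_counts(tokens, window=5)
score = dice("tropical", "fish", tw, pw)
```

The window size trades precision for recall: a small window favors terms in tight syntactic relationships with the query terms, while a whole-document "window" admits looser topical associations.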

SLIDE 10

Wrapping Up

Using term association measures to select words for query expansion can help improve retrieval performance. However, it can also worsen performance if care is not taken to provide meaningful context, and the approach can suffer from "topic drift." In our next session we'll look at relevance feedback, which finds terms for expansion based on information about which documents are relevant to the query.