CS6200: Information Retrieval
Slides by: Jesse Anderton
Course Content
IR, session 8
Big Questions in IR
Here are some questions we’ll discuss:
What’s the most effective way to perform semantic matching?
What else can improve a [...] besides semantic matching?
malicious web content (e.g. spam)?
so queries require fewer resources?
search?
The next module covers Vector Space Models in more depth. It addresses three big questions:
represent the query?
we use to improve on TF?
should we use instead of the dot product?
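As a rough illustration of why a similarity function other than the dot product might be wanted, here is a minimal sketch that compares the dot product with cosine similarity on raw term-frequency (TF) vectors. Cosine similarity is only one possible alternative and is an assumption here, not necessarily the one the module settles on. The helper names (tf_vector, dot, cosine) and the toy texts are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    """Raw term-frequency vector over lowercase whitespace tokens (an assumed tokenizer)."""
    return Counter(text.lower().split())

def dot(u, v):
    """Dot product of two sparse TF vectors."""
    return sum(u[t] * v[t] for t in u if t in v)

def cosine(u, v):
    """Dot product divided by the vector lengths; 0.0 if either vector is empty."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

query = tf_vector("information retrieval")
short_doc = tf_vector("information retrieval course")
# The same text repeated three times: same topic, just longer.
long_doc = tf_vector("information retrieval course " * 3)

print(dot(query, short_doc), dot(query, long_doc))
print(cosine(query, short_doc), cosine(query, long_doc))
```

In this toy case the dot product triples for the longer document even though its content is identical, while the cosine score stays the same. Length sensitivity of this kind is one reason to look beyond the plain dot product.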
This module performs probabilistic semantic matching using NLP-style language models:
word usage
the likelihood that the query and document are on the same subject
don’t have enough data to train it (e.g. for short documents, or queries)
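One way to picture the points above is query-likelihood scoring: estimate a word-usage model from each document and score the document by the probability its model assigns to the query. The sketch below assumes a unigram model with Jelinek-Mercer smoothing, which mixes in collection-wide statistics so that a short document with little data does not assign zero probability to an unseen query term. The smoothing method, the function name query_log_likelihood, the weight lam, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(query | doc) under a unigram model with Jelinek-Mercer smoothing.

    query, doc, collection are lists of tokens; lam is the (assumed) weight
    placed on the collection-wide model.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_coll = coll_tf[term] / len(collection)
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # The term never occurs anywhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score

docs = [
    "the cat sat on the mat".split(),
    "dogs chase the cat".split(),
]
collection = [t for d in docs for t in d]
query = "cat mat".split()
scores = [query_log_likelihood(query, d, collection) for d in docs]
print(scores)
```

The second document never mentions "mat", yet it still receives a finite score because the collection model backs it off. The first document, which contains both query terms, ranks higher.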
Here we discuss improving a ranking by adding extra information to the semantic matching scores:
document
Machine Learning
relevance information to produce a final ranking
This module discusses ways to get a stronger signal of the document’s topic:
document’s structure
nouns) mentioned in the document
topics
together
Here we move to the mechanics of search, and discuss how to find documents on the Internet:
crawl (because you can’t crawl everything)
documents you’ve already crawled
pitfalls of crawling the web
This module discusses the inverted index in depth:
raw documents
corpus level content in your index
search time
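For orientation, here is a minimal sketch of an inverted index built from raw documents and queried at search time. It assumes whitespace tokenization, an in-memory dictionary, and boolean AND queries; the names build_index and search_and and the toy documents are hypothetical, and a real index would add compression, positions, and on-disk storage.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: term frequency} from raw document text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search_and(index, query):
    """Return sorted ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # index.get avoids creating empty entries for unseen terms.
    postings = [set(index.get(t, {})) for t in terms]
    return sorted(set.intersection(*postings))

docs = {
    1: "web crawling basics",
    2: "inverted index basics",
    3: "index compression",
}
index = build_index(docs)
print(search_and(index, "index basics"))
# A corpus-level statistic stored alongside the postings: document frequency.
print(len(index["basics"]))
```

The length of a term's posting list is its document frequency, one example of corpus-level content that can be read directly from the index at search time.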
Here, we cover ways to improve the user interface and use recorded user interaction to improve search quality:
documents, so they can decide what to click on
documents are relevant
them to customize search
How can you tell whether your search engine is good, whether it’s improving, and whether it can get better?
interaction to compare rankings
to compare rankings
approach for your specific task
We explore interesting query types that move beyond keyword search:
natural language
available information in the collection
information on the Internet, and performing logical inference on its contents
This module discusses searching for non-textual content:
music
Many users on the web seek to exploit IR systems to make money at the expense of search quality. This module covers:
malicious web users
the web
farms
Search engines are expensive. How can we make money with them without sacrificing search quality? This module covers:
queries
pages
experience by managing ad quality
This module discusses modern Machine Learning approaches to IR ranking:
Learning problem
by Learning to Rank algorithms
Our final module covers advanced and experimental approaches to semantic matching:
semantic matching problem
into a latent space
Machine Learning problem (with applications far beyond ranking!)