
Text Mining in Search Engines By: DJ Ambler With special thanks to - PowerPoint PPT Presentation



  1. Text Mining in Search Engines By: DJ Ambler With special thanks to the Internet

  2. Overview ● What is text mining? ● How is it used in search engines?

  3. Text Mining Definition ● A way to extract meaning from text ● Structuring, deriving patterns, then evaluating ● “High quality” in text mining

  4. Text Mining Tasks ● Text categorization ● Text clustering ● Concept/entity extraction ● Production of granular taxonomies ● Sentiment analysis ● Document summarization ● Entity relation modeling

  5. Parts of a Search Engine ● Crawler ● Indexer ● Ranker

  6. Crawler (Spider) Issues in crawling: 1. What to crawl? 2. How much to crawl? 3. How often to crawl?

  7. Indexer ● Stop words ● Stemming ● Issues
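
The indexing steps named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presentation's own method: the stop-word list, the `naive_stem` suffix stripper, and the `build_index` helper are all hypothetical names chosen for the example, and a production indexer would use a proper stemmer (for instance a Porter-style one) and a much larger stop-word list.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def naive_stem(word):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase the text, drop stop words, and stem what is left."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]

def build_index(pages):
    """Build an inverted index: stemmed term -> set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    return index

index = build_index({1: "crawling the web", 2: "web search"})
print(sorted(index["web"]))  # pages that contain the term "web"
```

Note that this crude stemmer reduces "pages" to "pag" rather than "page", which is one example of the kind of problem a stemming step can introduce.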

  8. Ranker ● Receives query ● Searches index ● Ranks the pages based on complex algorithms
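
The three ranker steps above (receive a query, search the index, rank the pages) can be illustrated with a small sketch. It assumes a hypothetical inverted index given as a plain dict from term to a set of page ids, and it ranks only by how many distinct query terms a page contains, which is far simpler than the "complex algorithms" the slide refers to.

```python
from collections import Counter

def search(query, index):
    """Return page ids ordered by how many distinct query terms each page contains."""
    hits = Counter()
    for term in set(query.lower().split()):
        # Look the term up in the inverted index; unknown terms match nothing.
        for page_id in index.get(term, ()):
            hits[page_id] += 1
    # Sort by descending hit count, then by page id so ties have a stable order.
    return sorted(hits, key=lambda p: (-hits[p], p))

# Hypothetical toy index: term -> set of page ids.
toy_index = {"text": {"a", "b"}, "mining": {"a"}, "search": {"b", "c"}}
print(search("text mining", toy_index))
```

Page "a" contains both query terms and page "b" only one, so "a" is listed first.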

  9. Ranking Criteria ● Number of matching query words in the page ● Proximity of matching words to one another ● Location of terms within the page ● Location of terms within tags, e.g. <title>, <h1>, link text, body text, etc. ● Frequency of terms on the page and in general ● How “fresh” the page is
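
A few of these criteria can be combined into a single score, as in the sketch below. It is only an illustration under stated assumptions: the function name `score_page`, the `title_weight` value, and the way the three signals are added together are all invented for the example, and term frequency and freshness are left out to keep it short.

```python
def score_page(query_terms, title, body, title_weight=2.0):
    """Combine three ranking signals into one number (weights are arbitrary)."""
    terms = {t.lower() for t in query_terms}
    title_words = title.lower().split()
    body_words = body.lower().split()

    # Criterion: number of distinct query words that appear anywhere in the page.
    match_score = len(terms & (set(title_words) | set(body_words)))

    # Criterion: location within tags; terms in the title count extra.
    title_score = title_weight * len(terms & set(title_words))

    # Criterion: proximity; find the smallest gap between two different
    # query terms in the body and reward small gaps.
    positions = [i for i, w in enumerate(body_words) if w in terms]
    best_gap = None
    for i in positions:
        for j in positions:
            if i < j and body_words[i] != body_words[j]:
                if best_gap is None or j - i < best_gap:
                    best_gap = j - i
    proximity_score = 1.0 / best_gap if best_gap is not None else 0.0

    return match_score + title_score + proximity_score

close = score_page(["text", "mining"], "Text Mining", "text mining in search engines")
far = score_page(["text", "mining"], "Search", "text about data mining")
print(close > far)
```

The first page scores higher because the query terms appear in its title and sit next to each other in its body.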

  10. Sources ● Cong, G. (n.d.). Introduction to Text Mining and Web Search. Retrieved November 3, 2017. ● Joshi, H. (n.d.). Search Engines - Text Mining in Action. Retrieved November 3, 2017, from https://www.scribd.com/document/176948623/Search-Engines-Text-Mining-in-Action
