TimeRank Personalising Web-search Results Using Time And Topic - PowerPoint PPT Presentation

  1. TimeRank: Personalising Web-search Results Using Time And Topic
     Gina Horscroft, Jordan Kadish, Tashiv Sewpersad
     Supervisor: Jivashi Nagar
     Co-Supervisor: Hussein Suleman

  2. Project Context
     What problem do we aim to address in the field of Information Retrieval?

  3. Context: Information Retrieval (IR)
     Field within Computer Science which aims to:
     ● Maximise relevance of search results for a given user’s query.
     ● Satisfy user search intentions quickly.
     Current Problem in IR: “Short User Queries”
     ● In the context of the Internet.
     ● Users are unwilling to state search intentions explicitly.
     ● Difficult to retrieve relevant results due to ambiguities [1].
     ● Example: “Java”
       ○ Related to programming?
       ○ Related to coffee?

  4. Context: Web Search Personalization
     Web search personalization as a solution:
     ● Uses implicit user information like search query history to improve the relevance of search results.
     ● Often done through re-ranking or query extension.
     ● Shown to improve retrieval quality of IR systems [3].
     ● Used by Google - same results for same query from same user.
     Problem in Web search personalization:
     ● To our knowledge, no approach can model the diverging interests of users at different times as observed by Mandl [4].
     ● Example: work versus leisure interests.

  5. Same Result for Same Query from Same User
     User Objectives:
     ● Coffee @ 11PM?
     ● Programming @ 8AM?
     [Figure with items labelled 1 to 4; image not recoverable]

  6. Project Overview
     What components make up Web search personalization?

  7. Project Aim
     By the end of the project, we want to:
     ● Provide a more personalised overall search experience
       ○ Using time as well as topic
     ● Improve the search results returned by a user submitted query through re-ranking.
     ● Do so without being obtrusive

  8. Web Search Personalization Components

  9. Work Allocation
     Project Parts:
     ● Browser Plugin for User Data Collection (Jordan Kadish)
     ● User Modelling (Tashiv Sewpersad)
     ● Web-Search Ranking Algorithm (Gina Horscroft)
     Project Progression:
     ● One part engineering, two parts research...
     ● Parts are dependent on one another,
     ● But will be developed in parallel...

  10. Legal & Ethical Issues
      User privacy
      ● Informed user consent
      ● Secure storage of user information
      Result censorship
      ● Re-ranking of results may be viewed as censorship
      ● Allow user to disable re-ranking and view original results

  11. Browser Plugin
      A framework to support data collection and re-ranking

  12. Browser Plugin - Related Works
      Why Queries were chosen:
      ● Queries represent user’s general interests [5] [2].
      ● Shown to improve retrieval quality when used in a user’s profile [3].
      ● Especially beneficial for modelling short term (within session) user behaviour which becomes more useful as a search session progresses [5].

  13. Browser Plugin - Related Works
      Other Data Collection Sources:
      ● Bookmarks shown to be an insufficient information source [2]
        ○ Not all internet users use bookmarks
      ● Internet History beneficial for modelling long term user behaviour
        ○ Useful at start of browsing session [5][9].
        ○ Already being used by Google (amongst others)
      ● Web Server logs difficult to access publicly [12]
        ○ May be difficult to track user IPs

  14. Browser Plugin - Overview
      Engineering Goal: “Build unobtrusive browser plugin that collects, stores, and encrypts user queries.”
      The feature requirements:
      ● Must automatically collect user information
        ○ After consent has been given
      ● Encrypt queries
      ● Re-rank results displayed to user
        ○ Using the methods provided by team members
      Assumptions:
      ● One user per PC

  15. Browser Plugin - Overview
      Scope:
      ● Currently Chrome, could be expanded to other browsers
      ● Chrome offers great support for developers
      Choice: Plugin vs. Toolbar?
      ● Toolbars are outdated, intrusive
      ● Don’t mess with the user’s normal flow of searching

  16. Browser Plugin - Methodology
      ● Queries need to be stored and manipulated
        ○ Client-side storage
        ○ AES-256 file encryption
        ○ Data not transferred to server (prevent server leaks)
      ● Allow users the choice to opt in
        ○ Plugin can display alert on install
      ● Unobtrusive
        ○ Run in the background
      ● Combination of JavaScript, Java & Python
        ○ Compilation issues? Libraries convert Python/Java to JavaScript (Jiphy, Transcrypt)

  17. Browser Plugin - Evaluation
      Unit test evaluation:
      ● Robust test cases need to be developed
        ○ Are queries written to file? Encrypted? Etc.

  18. User Modelling
      To what degree can a time and topic-based user-profile be used to predict future user searches?

  19. User Modelling - Overview
      What is User Modelling?
      ● Building a user profile based on query topics
      ● A representation of user interests
      ● Can be used to personalize Web search results
      Research Question:
      ● To what degree can a time and topic-based user-profile be used to predict future user searches?
      Novelty:
      ● Associating query times with the topics that represent them.
      ● Investigating 24-hour, 1-week and fortnight encodings.

  20. User Modelling - Topic Modelling Approaches
      Latent Semantic Indexing (LSI) [6]
      ● Assumes one topic per query.
      Probabilistic Latent Semantic Indexing (PLSI) [7]
      ● Limited to the number of topics detected during training.
      ● I.e. cannot set the number of topics
      Latent Dirichlet Allocation (LDA) [8]
      ● Queries related to multiple topics.
      ● Not limited to a set number of topics.
      ● I.e. can set the number of topics

  21. User Modelling - Methodology
      Building a User Profile:
      1. Query Log:
         23:00 - “java”
         23:02 - “coffee houses near me”
         08:00 - “java”
         08:01 - “java programming guides”
         ...
      (Preprocess)
      2. Pre-Processed Query Log:
         23:00 - “java”
         23:02 - “coffee house”
         08:00 - “java”
         08:01 - “java program guide”
         ...
      (Topic Modelling)
      3. Topics:
         “java”
         “coffee house”
         “program”
         ...
      (Associate Time)
      4. User Profile:
         0.2 - “java”
         0.1 - “coffee house”
         0.6 - “program”
         ...
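
The pipeline on this slide (query log, preprocessing, topics, time association) can be sketched roughly as follows. This is only an illustration under loud assumptions: plain term counting stands in for the topic-modelling step, the stopword list and the plural-stripping rule are crude stand-ins for real preprocessing, and the names `preprocess` and `build_profile` are hypothetical, not taken from the project.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (assumption, not from the slides).
STOPWORDS = {"near", "me", "the", "a", "for", "of"}


def preprocess(query):
    """Lowercase, drop stopwords, strip a trailing plural 's' (crude stand-in for stemming)."""
    terms = []
    for word in query.lower().split():
        if word in STOPWORDS:
            continue
        if word.endswith("s") and len(word) > 3:
            word = word[:-1]
        terms.append(word)
    return terms


def build_profile(query_log):
    """Map hour of day -> {term: relative weight} from ("HH:MM", query) pairs.

    Term counting replaces topic modelling here purely for illustration.
    """
    counts = defaultdict(Counter)
    for timestamp, query in query_log:
        hour = int(timestamp.split(":")[0])
        counts[hour].update(preprocess(query))
    profile = {}
    for hour, counter in counts.items():
        total = sum(counter.values())
        profile[hour] = {term: n / total for term, n in counter.items()}
    return profile


# The example log from the slide.
log = [
    ("23:00", "java"),
    ("23:02", "coffee houses near me"),
    ("08:00", "java"),
    ("08:01", "java programming guides"),
]
profile = build_profile(log)
```

Keying the profile by hour keeps the late-night "java" queries apart from the morning ones, which is the separation the slide's example relies on.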

  22. User Modelling - Evaluation
      Step 1 - Prepare AOL query log:
      1. AOL Query Log:
         1 - 23:02 - “coffee houses”
         2 - 08:01 - “java programming”
         ...
      (Split)
      2. Training Set (80%):
         1 - 23:02 - “coffee houses”
         ...
      3. Testing Set (20%):
         2 - 08:01 - “java programming”
         ...
      (Topic Modelling)
      4. Testing Set Topics:
         1 - “Java”, “Programming”
         ...

  23. User Modelling - Evaluation
      Step 2 - Build Different Profiles:
      2. Training Set (80%):
         23:02 - “coffee houses near me”
         ...
      (Build Profiles)
      ● 24 Hour Profile
      ● 1 Week Profile
      ● 2 Week Profile
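
One way the three profile types could differ is in how a query timestamp is mapped to a profile slot. The sketch below is an assumption about what such encodings might look like, not the project's definition: it uses hourly slots throughout, and the function name `time_slot` and the encoding labels `"24h"`, `"1w"`, `"2w"` are hypothetical.

```python
from datetime import datetime


def time_slot(ts, encoding):
    """Return the profile slot index for a timestamp under the given encoding."""
    if encoding == "24h":
        return ts.hour                       # 0..23
    if encoding == "1w":
        return ts.weekday() * 24 + ts.hour   # 0..167, Monday = 0
    if encoding == "2w":
        # Assumption: alternate on ISO week parity. This is not a true
        # fortnight across year boundaries (e.g. week 53 followed by week 1).
        week = ts.isocalendar()[1] % 2
        return week * 168 + ts.weekday() * 24 + ts.hour  # 0..335
    raise ValueError(f"unknown encoding: {encoding}")


ts = datetime(2006, 3, 1, 23, 2)  # an arbitrary Wednesday, 23:02
slots = {enc: time_slot(ts, enc) for enc in ("24h", "1w", "2w")}
```

Longer encodings can separate weekday from weekend habits, at the cost of fewer training queries per slot.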

  24. User Modelling - Evaluation
      Step 3 - Check prediction ability:
      4. Testing Set Topics:
         1 - “Java”, “Programming”
         ...
      (Test Profiles)
      ● 24 Hour Profile: “Should be looking for Java”
      ● 1 Week Profile: “Should be looking for Cats”
      * Sliding Window Approach
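
Checking prediction ability could be approximated with a simple hit rate: for each test query, does the profile's top-weighted topics for that time slot include any of the query's actual topics? The sketch below assumes a profile shaped as `{slot: {topic: weight}}` and does not model the sliding window mentioned on the slide; `top_topics`, `hit_rate` and the cutoff `k` are hypothetical.

```python
def top_topics(profile, slot, k=3):
    """Return the k highest-weighted topics the profile holds for a time slot."""
    weights = profile.get(slot, {})
    return sorted(weights, key=weights.get, reverse=True)[:k]


def hit_rate(profile, test_entries, k=3):
    """Fraction of test entries whose topics overlap the profile's top-k prediction."""
    if not test_entries:
        return 0.0
    hits = 0
    for slot, topics in test_entries:
        predicted = set(top_topics(profile, slot, k))
        if predicted & set(topics):
            hits += 1
    return hits / len(test_entries)


# Illustrative data only (hour-of-day slots, made-up weights).
profile = {
    8: {"java": 0.5, "program": 0.3, "guide": 0.2},
    23: {"coffee": 0.6, "java": 0.4},
}
test_entries = [(8, ["java", "program"]), (23, ["cats"])]
rate = hit_rate(profile, test_entries)
```

Running the same check against the 24-hour, 1-week and 2-week profiles would give one comparable number per encoding.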

  25. User Modelling - Evaluation
      AOL query logs are Controversial:
      ● Poor anonymisation
      ● Now redacted by AOL
      ● Terms of Use: for non-commercial research use only
      AOL Query Log Snippet: [image not recoverable]

  26. Re-ranking Algorithm
      A means for improving search result relevance

  27. Re-ranking Algorithm - Related Works
      ● Teevan et al. [1] suggested an issue in providing more personalised search results to users
        ○ User unlikely to specify their intentions
          ■ Though different users have different intentions
      ● Solution - use implicit data about the user to improve results
        ○ Re-rank returned results for a query based on this implicit data to improve relevance
      ● Efficient client-side computation is able to provide improvements in search rankings
        ○ Scalable

  28. Re-ranking Algorithm - Related Works
      ● Mandl [4] observed that while different personalisation algorithms can improve results, no such algorithm takes the diverging interests of users at different times into account
      ● Why re-ranking?
        ○ Search engines already provide results that satisfy a wide range of interests
          ■ Re-ranking to make these results more individualised
        ○ Mandl [4] showed that re-ranking is more effective than query modification
          ■ Adapts to users’ interests

  29. Re-ranking Algorithm - Aim
      Can a Web-search ranking algorithm that personalizes results on the basis of time-sensitivity return results that are more relevant to a user than an algorithm that does not?
      ● Currently, no Web-search personalisation methods factor in time as implicit information to determine rankings
      Goal: determine if re-ordering search results factoring in time can improve the relevance of results
      ● Produce a solution to re-rank search results based on a user’s habits over time

  30. Re-ranking Algorithm - Methodology
      1. Create 10-12 “Dummy” profiles of ideal users
         ○ Create independence from other stages of the project
         ○ Use user profiles generated in step 2 as input
         ○ Dummy profiles as temporary stand-ins
      2. Retrieve top ~20 results for a query
         ○ Using JSoup to extract HTML data

  31. Re-ranking Algorithm - Methodology
      3. Analyse snippets of results for topics
         - Dictionary analysis

  32. Re-ranking Algorithm - Methodology
      4. Search user history for overlap with topics
      5. Re-rank list of returned documents based on each document and the user profile
         ● With respect to topic and time
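
Steps 3 to 5 can be illustrated with a small re-ranking sketch. It assumes a profile shaped as `{time slot: {topic term: weight}}` and scores each result by summing the weights of profile terms found in its snippet. The scoring rule, the function names `score` and `rerank`, and the example URLs are all hypothetical; the slides do not specify the actual algorithm.

```python
def score(snippet, profile, slot):
    """Sum the profile weights of the distinct snippet words for this time slot."""
    weights = profile.get(slot, {})
    return sum(weights.get(word, 0.0) for word in set(snippet.lower().split()))


def rerank(results, profile, slot):
    """Order results by descending score; ties keep the search engine's original order."""
    return sorted(
        results,
        key=lambda r: score(r["snippet"], profile, slot),
        reverse=True,
    )


# Illustrative profile: programming interests at 08:00, coffee at 23:00.
profile = {
    8: {"java": 0.5, "program": 0.3},
    23: {"coffee": 0.6, "house": 0.4},
}
results = [
    {"url": "https://example.com/coffee", "snippet": "Java coffee house menu"},
    {"url": "https://example.com/code", "snippet": "Java program tutorial"},
]

morning = [r["url"] for r in rerank(results, profile, 8)]
night = [r["url"] for r in rerank(results, profile, 23)]
```

In this toy case the same query "java" puts the programming result first at 08:00 and the coffee result first at 23:00. Because the sort is stable, a user with no profile for a slot sees the original order unchanged.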
