TimeRank
Personalising Web-search Results Using Time And Topic
Gina Horscroft, Jordan Kadish, Tashiv Sewpersad
Supervisor: Jivashi Nagar Co-Supervisor: Hussein Suleman
Project Context
What problem do we aim to address in the field of Information Retrieval (IR)?
Field within Computer Science which aims to:
Current Problem in IR: “Short User Queries”
○ Related to programming?
○ Related to coffee?
Web search personalization as a solution:
the relevance of search results.
Problem in Web search personalization:
User Objectives:
What components make up Web search personalization?
By the end of the project, we want to:
○ Using time as well as topic
re-ranking.
Project Parts:
(Jordan Kadish)
(Tashiv Sewpersad)
(Gina Horscroft)
Project Progression:
User privacy
Result censorship
A framework to support data collection and re-ranking
Why queries were chosen:
behaviour which becomes more useful as a search session progresses [5].
Other Data Collection Sources:
○ Not all internet users use bookmarks
○ Useful at the start of a browsing session [5][9]
○ Already being used by Google (amongst others)
○ May be difficult to track users' IPs
Engineering Goal: “Build an unobtrusive browser plugin that collects, stores, and encrypts user queries.”
The feature requirements:
○ After consent has been given
○ Using the methods provided by team members
Assumptions:
Scope:
Choice: Plugin vs. Toolbar?
○ Client-side storage
○ AES-256 file encryption
○ Data not transferred to server (prevent server leaks)
○ Plugin can display alert on install
○ run in the background
○ Compilation issues? Libraries (e.g. Jiphy, Transcrypt) convert Python/Java to JavaScript
Unit test evaluation:
○ Are queries written to file? Are they encrypted? Etc.
To what degree can a time- and topic-based user profile be used to predict future user searches?
What is User Modeling?
Research Question:
To what degree can a time- and topic-based user profile be used to predict future user searches?
Novelty:
Latent Semantic Indexing (LSI) [6]
Probabilistic Latent Semantic Indexing (PLSI) [7]
Latent Dirichlet Allocation (LDA) [8]
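To make the LDA option [8] concrete, here is a minimal, stdlib-only sketch of LDA fitted by collapsed Gibbs sampling, treating each user's (preprocessed) queries as short documents. It is an illustration under assumptions, not the project's implementation: the function name `lda_gibbs` and the defaults for `alpha`, `beta`, `iterations` and `seed` are hypothetical, and a real build would more likely rely on an existing topic-modelling library.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists.
    Returns (doc_topic, topic_word): doc_topic[d][k] estimates P(topic k | doc d),
    topic_word[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]                 # tokens in doc d assigned to topic k
    n_kw = [defaultdict(int) for _ in range(num_topics)]    # times word w is assigned to topic k
    n_k = [0] * num_topics                                  # total tokens assigned to topic k
    z = []                                                  # current topic of every token

    # Random initial assignment of a topic to every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token from the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates of the two distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return doc_topic, topic_word
```

For example, `lda_gibbs([["java", "program", "guide"], ["coffee", "house"]], num_topics=2)` returns a topic mixture per query document and a word distribution per topic; the per-topic weights are the kind of numbers a user profile could store.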
Building a User Profile:
○ Raw queries: 23:00 - “java”, 23:02 - “coffee houses near me”, 08:00 - “java”, 08:01 - “java programming guides”, ...
○ Preprocess: 23:00 - “java”, 23:02 - “coffee house”, 08:00 - “java”, 08:01 - “java program guide”, ...
○ Terms: “java”, “coffee house”, “program”, ...
○ Topic Modelling: 0.2 - “java”, 0.1 - “coffee house”, 0.6 - “program”, ...
○ Associate Time
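The preprocess and associate-time steps above can be sketched as follows. This is a hypothetical illustration, not the project's code: the stopword list is a tiny made-up one, `stem` is a light suffix-stripping rule rather than a full stemmer, and normalised term frequency per hour of day stands in for the topic-modelling weights.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "in", "near", "me"}


def stem(word):
    """Light suffix stripping (an assumption; not Porter stemming)."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo a doubled final consonant: "programm" -> "program".
        if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]  # "houses" -> "house", "guides" -> "guide"
    return word


def preprocess(query):
    """Lower-case, drop stopwords and stem each remaining token."""
    return [stem(t) for t in query.lower().split() if t not in STOPWORDS]


def build_profile(log):
    """log: list of ("HH:MM", query) pairs.

    Returns {hour: {term: weight}}, where weights are term frequencies
    normalised within each hour of the day.
    """
    by_hour = defaultdict(Counter)
    for time_str, query in log:
        hour = int(time_str.split(":")[0])
        by_hour[hour].update(preprocess(query))
    profile = {}
    for hour, counts in by_hour.items():
        total = sum(counts.values())
        profile[hour] = {term: n / total for term, n in counts.items()}
    return profile
```

With the example log from the slide, the 08:00 queries yield a profile where “java” weighs 0.5, while the 23:00 queries spread weight over “java”, “coffee” and “house”, so the same query can be associated with different interests at different times.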
Step 1 - Prepare AOL query log:
○ Query log: 1 - 23:02 - “coffee houses”, 2 - 08:01 - “java programming”, ...
○ Per user: 1 - 23:02 - “coffee houses” ...; 2 - 08:01 - “java programming” ...
○ Split: 1 - “Java”, “Programming” ...
Step 2 - Build different profiles:
○ Topic Modeling over each user's queries (e.g. 23:02 - “coffee houses near me” ...)
○ 24 Hour Profile, 1 Week Profile, 2 Week Profile
Step 3 - Check prediction ability:
○ Test profiles: 1 - “Java”, “Programming” ...
○ Compare profiles against what the user should be looking for, e.g. “Should be looking for Java”, “Should be looking for Cats”
○ * Sliding window approach
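A sliding-window check of prediction ability could look roughly like this sketch. It is hypothetical: it assumes each user's history is a list of (datetime, preprocessed terms) pairs, builds a term-count profile from the window preceding each query, and counts a "hit" when any of that query's terms is among the profile's `top_n` terms. The names `window_profile`, `hit_rate` and `top_n`, and the hit-rate measure itself, are illustrative assumptions rather than the project's metric.

```python
from collections import Counter
from datetime import timedelta

# Window lengths matching the 24 Hour, 1 Week and 2 Week profiles.
WINDOWS = {
    "24 Hour": timedelta(hours=24),
    "1 Week": timedelta(weeks=1),
    "2 Week": timedelta(weeks=2),
}


def window_profile(history, end, window):
    """Term counts over queries issued in the interval [end - window, end)."""
    counts = Counter()
    for when, terms in history:
        if end - window <= when < end:
            counts.update(terms)
    return counts


def hit_rate(history, window, top_n=5):
    """Slide over a user's queries in time order; for each query, build a
    profile from the preceding window and score a hit if the query shares
    a term with the profile's top_n terms."""
    history = sorted(history)
    hits = trials = 0
    for when, terms in history:
        profile = window_profile(history, when, window)
        if not profile:
            continue  # no earlier queries in the window to predict from
        top_terms = {term for term, _ in profile.most_common(top_n)}
        trials += 1
        hits += bool(top_terms & set(terms))
    return hits / trials if trials else 0.0
```

Running `hit_rate` once per entry in `WINDOWS` gives one number per profile length, which makes the 24 hour, 1 week and 2 week profiles directly comparable. The nested loop is quadratic in the number of queries, which is acceptable for a sketch but would need indexing for a full query log.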
AOL query logs are controversial:
AOL Query Log Snippet:
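Loading the log and grouping it by user (Step 1) can be sketched as below. This assumes the tab-separated layout of the public AOL release, with a header row of `AnonID`, `Query`, `QueryTime`, `ItemRank` and `ClickURL` and timestamps like `2006-03-01 07:17:12`; the sample rows are made up for illustration, and `load_aol_log` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def load_aol_log(fileobj):
    """Group an AOL-style tab-separated log into {AnonID: [(time, query), ...]},
    with each user's queries sorted by time."""
    by_user = defaultdict(list)
    # QUOTE_NONE: queries may contain quote characters that are not CSV quoting.
    reader = csv.DictReader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        when = datetime.strptime(row["QueryTime"], "%Y-%m-%d %H:%M:%S")
        by_user[row["AnonID"]].append((when, row["Query"]))
    for queries in by_user.values():
        queries.sort()
    return dict(by_user)


# Made-up sample rows in the assumed format (the second row includes a click).
sample = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1\tcoffee houses\t2006-03-01 23:02:00\t\t\n"
    "2\tjava programming\t2006-03-01 08:01:00\t1\thttp://example.com\n"
)
log = load_aol_log(io.StringIO(sample))
```

Here `log["1"]` holds user 1's “coffee houses” query and `log["2"]` holds user 2's “java programming” query, mirroring the per-user split in Step 1.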
A means for improving search result relevance
search results to users
○ Users are unlikely to specify their intentions
■ Though different users have different intentions
○ Re-rank returned results for a query based on this implicit data to improve relevance
search rankings
○ scalable
improve results, no such algorithm takes the diverging interests of users at different times into account
○ Search engines already provide results that satisfy a wide range of interests
■ Re-ranking makes these results more individualised
○ Mandl [4] showed that re-ranking is more effective than query modification
■ Adapts to users' interests
Can a Web-search ranking algorithm that personalizes results on the basis of time-sensitivity return results that are more relevant to a user than an algorithm that does not?
implicit information to determine rankings
Goal: determine whether re-ordering search results to factor in time can improve the relevance of results
time
1. Create 10-12 “Dummy” profiles of ideal users
○ Create independence from other stages of the project
○ Use user profiles generated in Step 2 as input
○ Dummy profiles as temporary stand-ins
2. Retrieve top ~20 results for a query
○ Using JSoup to extract HTML data
3. Analyse snippets of results for topics - Dictionary analysis
4. Search user history for overlap with topics
5. Re-rank the list of returned documents based on how well each document matches the user profile
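Steps 3 to 5 could be sketched roughly as follows. Everything here is an assumption for illustration: a hypothetical `TOPIC_DICTIONARY` for the dictionary analysis, a profile shaped as `{hour: {topic: weight}}`, and a score equal to the summed profile weight of the topics found in a result's snippet. Retrieving results with JSoup (step 2) is out of scope, so results are passed in directly.

```python
from dataclasses import dataclass

# Hypothetical topic dictionary used for the dictionary-analysis step.
TOPIC_DICTIONARY = {
    "programming": {"code", "program", "tutorial", "jdk", "developer"},
    "coffee": {"coffee", "cafe", "espresso", "roast", "beans"},
}


@dataclass
class Result:
    title: str
    snippet: str


def snippet_topics(snippet):
    """Topics whose dictionary words appear in the snippet."""
    words = set(snippet.lower().split())
    return {topic for topic, vocab in TOPIC_DICTIONARY.items() if words & vocab}


def rerank(results, profile, hour):
    """Order results by the summed weight of their topics in the user's
    profile for the given hour of day. sorted() is stable (also with
    reverse=True), so ties keep the search engine's original order."""
    weights = profile.get(hour, {})

    def score(result):
        return sum(weights.get(t, 0.0) for t in snippet_topics(result.snippet))

    return sorted(results, key=score, reverse=True)
```

Using the 11PM coffee / 8AM programming example, a profile such as `{23: {"coffee": 0.8, "programming": 0.2}, 8: {"programming": 0.9, "coffee": 0.1}}` pushes a coffee result to the top at hour 23 and a Java tutorial to the top at hour 8 for the same query. Because the scoring is deterministic and the sort is stable, the same results and profile always produce the same order.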
Offline User Evaluation
○ TimeRank vs. Google search results prior to re-ranking
○ relevant/not relevant
NDCG metric (Normalised Discounted Cumulative Gain)
○ Measures the usefulness of a document based on its position in a list
○ Defines information gain based on the relevance score assigned to a document [10]
■ multi-level (graded) relevance
○ Considers order of documents
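NDCG, as commonly defined, can be sketched as follows: DCG@k = Σ_{i=1..k} rel_i / log2(i + 1), and NDCG@k = DCG@k / IDCG@k, where IDCG@k is the DCG of the same relevance scores sorted in descending order. Binary judgments (relevant = 1, not relevant = 0) and multi-level grades both fit this form. Some formulations use a gain of 2^rel - 1 instead of rel; which variant TimeRank uses is not stated, so the linear gain here is an assumption.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for ranks i = 1..k."""
    rels = relevances[:k]  # k=None keeps the whole list
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))


def ndcg(relevances, k=None):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

Given the relevance judgments of one result list in ranked order, e.g. `ndcg([1, 0, 1, 1, 0])`, the TimeRank ordering and the original Google ordering of the same results can each be scored and compared on a 0 to 1 scale.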
○ The same results should be produced for the same query and user profile
stored in the user profile
Risks (Risk Description, Mitigation Strategy, Probability / Severity):
1. Should a team member fail to deliver a usable piece, this will impact the rest of the project, as the 3 sections are not easily [...] the group.
○ Mitigation: Make use of temporary data in the sections that rely on one another. For example, the ranking algorithm will make use of “dummy” user profiles in the absence of real ones.
○ Probability / Severity: Low / High - Will prevent the delivery of the final working plugin.
2. Falling behind on schedule.
○ Mitigation: Follow the Gantt Chart deadlines and have regular meetings. Plan for 1st and 2nd drafts of deliverables.
○ Probability / Severity: Medium / High - Will add delays to the project's progress.
3. Progress of the project is halted due to lack [...] the project.
○ Mitigation: Hold regular meetings with the supervisor and co-supervisor.
○ Probability / Severity: Medium / Low - Will add a minor delay to the project's progress.
Software:
1. Google Chrome plugin
○ Gather user search queries
2. Utility program to build a user profile
○ Based on a collection of queries
3. Utility program that re-ranks search results
○ Based on a user profile
Answers to Research Questions
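○ To what degree can a time- and topic-based user profile be used to predict future user searches?
○ Can a Web-search ranking algorithm that personalizes results on the basis of time-sensitivity return results that are more relevant to a user than an algorithm that does not?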
User Objective at 11PM: Coffee
User Objective at 8AM: Programming
[1] Teevan, J., Dumais, S. T. and Horvitz, E. Personalizing search via automated analysis of interests and activities. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (2005), ACM, 449-456.
[2] Lee, H. C. and Borodin, A. Cluster based personalized search. In International Workshop on Algorithms and Models for the Web-Graph (2009), Springer, 167-183.
[3] Liu, F., Yu, C. and Meng, W. Personalized web search by mapping user queries to categories. In Proceedings of the eleventh international conference on Information and knowledge management (2002), ACM, 558-565.
[4] Mandl, T. Artificial Intelligence for Information Retrieval. IGI Global (2009).
[5] Bennett, P. N., White, R. W., Chu, W., Dumais, S. T., Bailey, P., Borisyuk, F. and Cui, X. Modeling the impact of short- and long-term behavior on search
[6] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K. and Harshman, R. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 6 (1990), 391.
[7] Hofmann, T. Probabilistic latent semantic indexing. ACM (1999), 50-57.
[8] Wei, X. and Croft, W. B. LDA-based document models for ad-hoc
[9] Vu, T., Willis, A., Kruschwitz, U. and Song, D. Personalised query suggestion for intranet search with temporal user profiling. arXiv preprint arXiv:1701.02050 (2017).
[10] Kekäläinen, J. Binary and graded relevance in IR evaluations: Comparison of the effects on ranking of IR systems. Information Processing & Management, 41, 5 (2005), 1019-1033. http://dx.doi.org/10.1016/j.ipm.2005.01.004
[11] Personalized Search for everyone. Official Google Blog (2017). Retrieved 12 June 2017, from https://googleblog.blogspot.co.za/2009/12/personalized-search-for-everyone.html
[12] Sheng, H., Goker, A. S. and He, D. Web user search pattern analysis for modelling query topic changes. Lecture Notes in Computer Science, 2109 (2001).