TimeRank: Personalising Web-search Results Using Time And Topic - PowerPoint PPT Presentation



SLIDE 1

TimeRank

Personalising Web-search Results Using Time And Topic Gina Horscroft, Jordan Kadish, Tashiv Sewpersad

Supervisor: Jivashi Nagar Co-Supervisor: Hussein Suleman

SLIDE 2

Project Context

What problem do we aim to address in the field of Information Retrieval?

SLIDE 3

IR is a field within Computer Science which aims to:

  • Maximise relevance of search results for a given user’s query.
  • Satisfy user search intentions quickly.

Current Problem in IR: “Short User Queries”

  • In the context of the Internet.
  • Users are unwilling to state search intentions explicitly.
  • Difficult to retrieve relevant results due to ambiguities [1].
  • Example: “Java”

○ Related to programming?
○ Related to coffee?

Context: Information Retrieval (IR)

SLIDE 4

Web search personalization as a solution:

  • Uses implicit user information, like search query history, to improve the relevance of search results.
  • Often done through re-ranking or query extension.
  • Shown to improve retrieval quality of IR systems [3].
  • Used by Google - same results for same query from same user.

Problem in Web search personalization:

  • To our knowledge, no approach can model the diverging interests of users at different times, as observed by Mandl [4].
  • Example: work versus leisure interests.

Context: Web Search Personalization

SLIDE 5

Same Result for Same Query from Same User

User Objectives:

  • Coffee @ 11PM?
  • Programming @ 8AM?


SLIDE 6

Project Overview

What components make up Web search personalization?

SLIDE 7

Project Aim

By the end of the project, we want to:

  • Provide a more personalised overall search experience

○ Using time as well as topic

  • Improve the search results returned by a user-submitted query through re-ranking.
  • Do so without being obtrusive.
SLIDE 8

Web Search Personalization Components

SLIDE 9

Work Allocation

Project Parts:

  • Browser Plugin for User Data Collection

(Jordan Kadish)

  • User Modelling

(Tashiv Sewpersad)

  • Web-Search Ranking Algorithm

(Gina Horscroft)

Project Progression:

  • One part engineering, two parts research...
  • Parts are dependent on one another,
  • But will be developed in parallel...
SLIDE 10

Legal & Ethical Issues

User privacy

  • Informed user consent
  • Secure storage of user information

Result censorship

  • Re-ranking of results may be viewed as censorship
  • Allow user to disable re-ranking and view original results
SLIDE 11

A framework to support data collection and re-ranking

Browser Plugin

SLIDE 12

Why Queries were chosen:

  • Queries represent user’s general interests [5] [2].
  • Shown to improve retrieval quality when used in a user’s profile [3].
  • Especially beneficial for modelling short-term (within-session) user behaviour, which becomes more useful as a search session progresses [5].

Browser Plugin - Related Works

SLIDE 13

Other Data Collection Sources:

  • Bookmarks shown to be an insufficient information source [2]

○ Not all internet users use bookmarks

  • Internet History beneficial for modelling long-term user behaviour

○ Useful at start of browsing session [5][9].
○ Already being used by Google (amongst others).

  • Web server logs difficult to access publicly [12]

○ May be difficult to track user IPs

Browser Plugin - Related Works

SLIDE 14

Browser Plugin - Overview

Engineering Goal: “Build an unobtrusive browser plugin that collects, stores, and encrypts user queries.”

Feature requirements:

  • Must automatically collect user information

○ After consent has been given

  • Encrypt queries
  • Re-rank results displayed to user

○ Using the methods provided by team members

Assumptions:

  • One user per PC
SLIDE 15

Browser Plugin - Overview

Scope:

  • Currently Chrome, could be expanded to other browsers
  • Chrome offers great support for developers

Choice: Plugin vs. Toolbar?

  • Toolbars are outdated, intrusive
  • Don’t mess with the user’s normal flow of searching
SLIDE 16

Browser Plugin - Methodology

  • Queries need to be stored and manipulated

○ Client-side storage
○ AES-256 file encryption
○ Data not transferred to server (prevents server leaks)

  • Allow users the choice to opt in

○ Plugin can display an alert on install

  • Unobtrusive

○ Runs in the background

  • Combination of JavaScript, Java & Python

○ Compilation issues? Libraries convert Python/Java to JavaScript (Jiphy, Transcrypt)
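As a rough sketch of the storage requirements above (client-side only, opt-in consent, unobtrusive), the collection logic might look as follows. Python is used here purely for illustration; the class name and JSON-lines file format are assumptions, and the AES-256 encryption named on this slide is left as a marked stub rather than implemented:

```python
import json
import time
from pathlib import Path

class QueryStore:
    """Illustrative client-side query log: nothing leaves the machine.

    Sketch only - the real plugin would apply AES-256 encryption to each
    entry before it is written to disk.
    """

    def __init__(self, path, consented=False):
        self.path = Path(path)
        # Consent is off by default; it is set True only after the
        # install-time alert has been accepted by the user.
        self.consented = consented

    def record(self, query):
        """Append a timestamped query if (and only if) the user opted in."""
        if not self.consented:
            return False  # opt-in: silently drop queries without consent
        entry = {"t": time.time(), "q": query}
        # <-- AES-256 encryption of `entry` would happen here
        with self.path.open("a") as f:
            f.write(json.dumps(entry) + "\n")
        return True
```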

SLIDE 17

Browser Plugin - Evaluation

Unit test evaluation:

  • Robust test cases need to be developed

○ Are queries written to file? Are they encrypted? Etc.

SLIDE 18

To what degree can a time and topic-based user-profile be used to predict future user searches?

User Modelling

SLIDE 19

What is User Modelling?

  • Building a user profile based on query topics
  • A representation of user interests
  • Can be used to personalize Web search results

Research Question:

  • To what degree can a time and topic-based user-profile be used to predict future user searches?

Novelty:

  • Associating query times to the topics that represent them.
  • Investigating 24-hour, 1-week and fortnight encodings.

User Modelling - Overview
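The three encodings under investigation can be pictured as time-bucket functions over a query's timestamp. The Python below is a hedged sketch: the hour-level granularity and the use of ISO-week parity for the fortnight encoding are illustrative assumptions, not the project's final design.

```python
from datetime import datetime

def time_buckets(ts):
    """Map a query timestamp onto the three candidate encodings:
    hour-of-day (24 bins), hour-of-week (168 bins) and
    hour-of-fortnight (336 bins)."""
    hour_of_week = ts.weekday() * 24 + ts.hour
    # Fortnight position from ISO-week parity (an assumption here):
    # even ISO weeks land in the first half, odd weeks in the second.
    hour_of_fortnight = (ts.isocalendar()[1] % 2) * 168 + hour_of_week
    return {
        "day": ts.hour,
        "week": hour_of_week,
        "fortnight": hour_of_fortnight,
    }
```

A coarser encoding needs less data to fill its bins but cannot separate, say, weekday from weekend habits; the fortnight encoding is the most expressive and the most data-hungry.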

SLIDE 20

Latent Semantic Indexing (LSI) [6]

  • Assumes one topic per query.

Probabilistic Latent Semantic Indexing (PLSI) [7]

  • Limited to the number of topics detected during training.
  • I.e. cannot set the number of topics

Latent Dirichlet Allocation (LDA) [8]

  • Queries related to multiple topics.
  • Not limited to a set number of topics.
  • I.e. can set the number of topics

User Modelling - Topic Modelling Approaches

SLIDE 21

Preprocess

User Modelling - Methodology

Building a User Profile:

1. Query Log:
   23:00 - “java”
   23:02 - “coffee houses near me”
   08:00 - “java”
   08:01 - “java programming guides”
   ...

2. Pre-Processed Query Log:
   23:00 - “java”
   23:02 - “coffee house”
   08:00 - “java”
   08:01 - “java program guide”
   ...

3. Topics (via topic modelling):
   “java”, “coffee house”, “program”, ...

4. User Profile (topic weights, with associated times):
   0.2 - “java”
   0.1 - “coffee house”
   0.6 - “program”
   ...
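The pipeline above can be sketched in a few lines of Python. This is a deliberately simplified stand-in: real topic modelling (e.g. LDA) is replaced by treating each distinct preprocessed query as its own "topic", and the stopword list and trailing-'s' stemming rule are toy assumptions.

```python
from collections import defaultdict

STOPWORDS = {"near", "me", "the", "a"}  # toy list for illustration

def preprocess(query):
    """Step 2: lowercase, drop stopwords, crude stemming (strip a plural 's')."""
    terms = [t.rstrip("s") if len(t) > 3 else t
             for t in query.lower().split() if t not in STOPWORDS]
    return " ".join(terms)

def build_profile(log):
    """Steps 3-4: treat each distinct preprocessed query as a 'topic'
    (a stand-in for LDA) and associate the hours at which it was issued."""
    profile = defaultdict(list)
    for hour, query in log:
        profile[preprocess(query)].append(hour)
    return dict(profile)
```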

SLIDE 22

User Modelling - Evaluation

Step 1 - Prepare AOL query log:

1. AOL Query Log:
   1 - 23:02 - “coffee houses”
   2 - 08:01 - “java programming”
   ...

2. Training Set (80%):
   1 - 23:02 - “coffee houses”
   ...

3. Testing Set (20%):
   2 - 08:01 - “java programming”
   ...

4. Testing Set Topics (via topic modelling):
   1 - “Java”, “Programming”
   ...
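Step 1's split can be sketched as below. Splitting chronologically (train on a user's earlier queries, test on the later ones) is an assumption that fits the prediction task; the slide itself only specifies the 80/20 ratio.

```python
def split_log(entries, train_frac=0.8):
    """Chronological train/test split of a (timestamp, query) log:
    the profile is built from earlier queries and evaluated on whether
    it predicts the later ones."""
    entries = sorted(entries, key=lambda e: e[0])  # order by timestamp
    cut = int(len(entries) * train_frac)
    return entries[:cut], entries[cut:]
```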

SLIDE 23

User Modelling - Evaluation

Step 2 - Build profiles from the training set:

2. Training Set (80%):
   23:02 - “coffee houses near me”
   ...

Build three profiles from it:
  • 24 Hour Profile
  • 1 Week Profile
  • 2 Week Profile

SLIDE 24

User Modelling - Evaluation

Step 3 - Check prediction ability (sliding window approach):

4. Testing Set Topics:
   1 - “Java”, “Programming”
   ...

Each profile (24 Hour, 1 Week, ...) is tested on whether it predicts what the user should be looking for at a given time (e.g. “should be looking for Java” vs. “should be looking for Cats”).
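A minimal version of the sliding-window check might look like this; the one-hour window width and the circular distance on the 24-hour clock are illustrative assumptions.

```python
def predict_topic(profile, hour, window=1):
    """Score each topic by how many of its recorded hours fall within
    +/-window hours of the query time (wrapping around midnight), and
    return the best-scoring topic, or None if nothing is nearby."""
    def near(h):
        d = abs(h - hour) % 24
        return min(d, 24 - d) <= window  # circular 24-hour distance
    scores = {t: sum(near(h) for h in hours) for t, hours in profile.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Prediction accuracy is then the fraction of testing-set queries whose topic matches the profile's prediction at that query's time.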

SLIDE 25

User Modelling - Evaluation

AOL query logs are controversial:

  • Poor anonymisation
  • Now redacted by AOL
  • Terms of Use: for non-commercial research use only

AOL Query Log Snippet:

SLIDE 26

A means for improving search result relevance

Re-ranking Algorithm

SLIDE 27

Re-ranking Algorithm - Related Works

  • Teevan et al. [1] identified an issue in providing more personalised search results to users

○ Users are unlikely to specify their intentions
■ Though different users have different intentions

  • Solution - use implicit data about the user to improve results

○ Re-rank returned results for a query based on this implicit data to improve relevance

  • Efficient client-side computation can improve search rankings

○ Scalable

SLIDE 28

Re-ranking Algorithm - Related Works

  • Mandl [4] observed that while different personalisation algorithms can improve results, no such algorithm takes the diverging interests of users at different times into account.

  • Why re-ranking?

○ Search engines already provide results that satisfy a wide range of interests
■ Re-ranking makes these results more individualised
○ Mandl [4] showed that re-ranking is more effective than query modification
■ Adapts to users' interests

SLIDE 29

Re-ranking Algorithm - Aim

Can a Web-search ranking algorithm that personalizes results on the basis of time-sensitivity return results that are more relevant to a user than an algorithm that does not?

  • Currently, no Web-search personalisation methods factor in time as implicit information to determine rankings.

Goal: determine if re-ordering search results, factoring in time, can improve the relevance of results.

  • Produce a solution to re-rank search results based on a user's habits over time.

SLIDE 30

Re-ranking Algorithm - Methodology

1. Create 10-12 “dummy” profiles of ideal users

○ Creates independence from other stages of the project
○ Use user profiles generated in step 2 as input
○ Dummy profiles as temporary stand-ins

2. Retrieve top ~20 results for a query

○ Using JSoup to extract HTML data

SLIDE 31

Re-ranking Algorithm - Methodology

3. Analyse snippets of results for topics - Dictionary analysis
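Dictionary analysis of a result snippet can be sketched as term counting against hand-built topic dictionaries. The dictionaries below and the length normalisation are hypothetical, chosen only to make the step concrete; Python stands in for the project's actual implementation language.

```python
# Hypothetical hand-built dictionaries, for illustration only.
TOPIC_DICTIONARIES = {
    "programming": {"code", "java", "api", "compiler", "tutorial"},
    "coffee": {"coffee", "espresso", "roast", "cafe", "brew"},
}

def snippet_topics(snippet):
    """Score a snippet against each topic: the fraction of its words
    (lowercased, punctuation-stripped) found in that topic's dictionary."""
    words = snippet.lower().split()
    return {topic: sum(w.strip(".,!?") in terms for w in words) / max(len(words), 1)
            for topic, terms in TOPIC_DICTIONARIES.items()}
```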

SLIDE 32

Re-ranking Algorithm - Methodology

4. Search user history for overlap with topics

5. Re-rank the list of returned documents based on each document and the user profile

  • With respect to topic and time
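Steps 4-5 can be sketched as a linear blend of the engine's original ranking with the profile's topic affinity at the current time. The weighting scheme below is an assumption for illustration, not the project's final algorithm; note that with an empty profile the original order is preserved, which matches the stability factor on the Key Success Factors slide.

```python
def rerank(results, profile_scores, original_weight=0.5):
    """Blend original rank position with time-and-topic profile overlap.

    results        -- list of (doc, topic) pairs in the engine's order
    profile_scores -- topic -> user affinity at the current time, in [0, 1]
    """
    n = len(results)

    def score(item):
        rank, (doc, topic) = item
        rank_score = (n - rank) / n  # engine position, normalised to (0, 1]
        personal = profile_scores.get(topic, 0.0)
        return original_weight * rank_score + (1 - original_weight) * personal

    ranked = sorted(enumerate(results), key=score, reverse=True)
    return [doc for _, (doc, _) in ranked]
```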
SLIDE 33

Re-ranking Algorithm - Evaluation

Offline User Evaluation

  • Users make judgements from a predefined list of ranked documents

○ TimeRank vs. Google search results prior to re-ranking

  • Multi-level document scores [10]
  • Binary ranking score

○ relevant/not relevant

SLIDE 34

Re-ranking Algorithm - Evaluation

  • Relevance judgements of the ranking algorithms will be scored using the NDCG metric (Normalised Discounted Cumulative Gain)

○ Measures the usefulness of a document based on its position in a list
○ Defines information gain based on the relevance score assigned to a document [10]
■ Multi-level

  • MAP score (Mean Average Precision)

○ Considers order of documents
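Both metrics have standard formulations; a compact Python version, to make the evaluation on this slide concrete:

```python
import math

def ndcg(relevances, k=None):
    """Normalised Discounted Cumulative Gain over graded relevance
    scores listed in ranked order: DCG of the ranking divided by the
    DCG of the ideal (descending) ordering."""
    rels = relevances[:k] if k else relevances
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))
    ideal = sorted(relevances, reverse=True)[:len(rels)]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def average_precision(binary_rels):
    """Average precision for one query over binary judgements;
    MAP is the mean of this value over all test queries."""
    hits, total = 0, 0.0
    for i, rel in enumerate(binary_rels, start=1):
        if rel:
            hits += 1
            total += hits / i  # precision at each relevant position
    return total / hits if hits else 0.0
```

NDCG uses the multi-level document scores; average precision uses the binary relevant/not-relevant judgements from the previous slide.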

SLIDE 35

Project Planning

SLIDE 36

Key Success Factors

  • Rankings should be stable

○ Same results should be produced for the same query and user profile

  • Algorithm should not re-rank results if there are no topic preferences stored in the user profile

SLIDE 37

Project Timeline

SLIDE 38

Primary Risks

Risk 1: A team member fails to deliver a usable piece. This impacts the rest of the project, as the 3 sections are not easily separable, and can impact the work of others in the group.
  Mitigation: Make use of temporary data in the sections that rely on one another; for example, the ranking algorithm will use “dummy” user profiles in the absence of real ones.
  Probability / Severity: Low / High - would prevent delivery of the final working plugin.

Risk 2: Falling behind on schedule.
  Mitigation: Follow the Gantt chart deadlines, hold regular meetings, and plan 1st and 2nd drafts for deliverables.
  Probability / Severity: Medium / High - would add delays to the project's progress.

Risk 3: Progress is halted due to a lack of insight/knowledge in fully understanding the project.
  Mitigation: Hold regular meetings with the supervisor and co-supervisor.
  Probability / Severity: Medium / Low - would add a minor delay to the project's progress.

SLIDE 39

Anticipated Outcomes

Software

1. Google Chrome plugin

○ Gather user search queries

2. Utility Program to build a user profile

○ Based on a collection of queries

3. Utility program that re-ranks search results

○ Based on a user profile

SLIDE 40

Anticipated Outcomes

Answers to Research Questions

  • To what degree can a time and topic-based user-profile be used to predict future user searches?

  • Can a Web-search ranking algorithm that personalizes results on the basis of time-sensitivity return results that are more relevant to a user than an algorithm that does not?

SLIDE 41

Conclusion - Final Result

User Objective at 11PM: Coffee
User Objective at 8AM: Programming

SLIDE 42

Questions?

SLIDE 43

References - 1/4

[1] Teevan, Jaime, Dumais, Susan T., and Horvitz, Eric. Personalizing search via automated analysis of interests and activities. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2005), ACM, 449-456.
[2] Lee, Hyun Chul and Borodin, Allan. Cluster based personalized search. In International Workshop on Algorithms and Models for the Web-Graph (2009), Springer, 167-183.
[3] Liu, Fang, Yu, Clement, and Meng, Weiyi. Personalized web search by mapping user queries to categories. In Proceedings of the Eleventh International Conference on Information and Knowledge Management (2002), ACM, 558-565.

SLIDE 44

References - 2/4

[4] Mandl, Thomas. Artificial Intelligence for Information Retrieval. IGI Global, 2009.
[5] Bennett, P. N., White, R. W., Chu, W., Dumais, S. T., Bailey, P., Borisyuk, F. and Cui, X. Modeling the impact of short- and long-term behavior on search personalization. ACM, (2012), 185-194.
[6] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K. and Harshman, R. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 6 (1990), 391.
[7] Hofmann, T. Probabilistic latent semantic indexing. ACM, (1999), 50-57.

SLIDE 45

References - 3/4

[8] Wei, X. and Croft, W. B. LDA-based document models for ad-hoc retrieval. ACM, (2006), 178-185.
[9] Vu, T., Willis, A., Kruschwitz, U. and Song, D. Personalised query suggestion for intranet search with temporal user profiling. arXiv preprint arXiv:1701.02050, (2017).
[10] Kekäläinen, J. Binary and graded relevance in IR evaluations—Comparison of the effects on ranking of IR systems. Information Processing & Management, 41(5) (2005), 1019-1033. http://dx.doi.org/10.1016/j.ipm.2005.01.004

SLIDE 46

References - 4/4

[11] Personalized Search for everyone. (2017). Official Google Blog. Retrieved 12 June 2017, from https://googleblog.blogspot.co.za/2009/12/personalized-search-for-everyone.html
[12] Sheng, H., Goker, A. S. and He, D. Web user search pattern analysis for modelling query topic changes. Lecture Notes in Computer Science, 2109 (2001).