Time-dependent Similarity Measure - PowerPoint PPT Presentation


Time-dependent Similarity Measure of Queries Using Historical Click-through Data


SLIDE 1

Time-dependent Similarity Measure of Queries Using Historical Click-through Data

Qiankun Zhao*, Steven C. H. Hoi*, Tie-Yan Liu, et al. Presented by: Tie-Yan Liu

* This work was done when Zhao and Hoi were interns at Microsoft Research Asia

SLIDE 2

Outline

  • Background
  • Observations and motivation
  • Our approach
  • Empirical study
  • Future work

SLIDE 3

Background

  • A dilemma for Web search engines
  • Very short queries (~2.5 terms on average)
  • Inconsistency of term usages
  • The Web is not well-organized
  • Users express queries with their own vocabulary
SLIDE 4

Background (cont’d)

  • Solution: query expansion
  • Document-term-based expansion (KDD00, SIGIR05)
  • A query can be expanded with top keywords in the top-k relevant documents
  • Query-term-based expansion (WWW02, CIKM04)
  • A query can be expanded with similar queries (queries are similar if they lead to similar pages; pages are similar if they are visited by issuing similar queries)
  • Click-through data have been used for query expansion in much previous work

SLIDE 5

Background (cont’d)

  • Click-through data
  • Log data about the interactions between users and Web search engines
  • Typical click-through data representation
SLIDE 6

Observation 1

  • Accuracy of query similarity (plotted by month)
  • Similarity calculated only from the click-through data in that time interval vs. calculated from all the click-through data before that time point

SLIDE 7

Observation 2

  • Event-driven and dynamic character of query similarity
  • The keyword “firework” and related pages become more popular one week before the event and reach their peak on July 4th
  • “firework + injuries” and “firework + picture” show a slight delay in the number of times they are issued and visited
  • “firework + market” and “firework + show” become popular and reach their peaks a few days before July 4th

SLIDE 8

Motivations

  • Exploit the click-through data for semantic similarity of queries by incorporating temporal information
  • Combine explicit content similarity and implicit semantic similarity

SLIDE 9

Our Approach

SLIDE 10

Time-Dependent Concepts

  • Calendar schema and pattern
  • Example
  • Calendar schema: <day, month, year>
  • Calendar pattern: <15, *, *>
  • <15, 1, 2002> is contained in the pattern <15, *, *>
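As a quick illustration (a minimal sketch of our own, not from the paper), containment of a timestamp in a calendar pattern can be checked positionally, with "*" acting as a wildcard:

```python
# Minimal sketch of calendar-pattern containment: a pattern and a concrete
# timestamp are tuples over the same calendar schema (e.g. <day, month, year>),
# and "*" in the pattern matches any value at that position.

def matches(pattern, timestamp):
    """True if the timestamp is contained in the calendar pattern."""
    return all(p == "*" or p == t for p, t in zip(pattern, timestamp))

# Schema <day, month, year>: <15, 1, 2002> is contained in <15, *, *>
print(matches((15, "*", "*"), (15, 1, 2002)))  # True
print(matches((15, "*", "*"), (16, 1, 2002)))  # False
```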
SLIDE 11

Time-Dependent Concepts

  • Click-Through Subgroup
  • Example
  • Based on the schema <day, week> and the patterns <1, *>, <2, *>, …, <7, *>, we can partition the data into 7 groups, which correspond to Sun, Mon, Tue, …, Sat.
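The partitioning above can be sketched as follows (a toy example; the record fields and the log format are assumptions of ours, not the paper's):

```python
# Minimal sketch of click-through subgroups for the schema <day-of-week, week>:
# the pattern <d, *> collects every record issued on day-of-week d, whatever
# the week. Records are assumed to be (timestamp, query, url) triples.

from collections import defaultdict
from datetime import datetime

def partition_by_day_of_week(records):
    """Group (timestamp, query, url) records by ISO day of week (Mon=1..Sun=7)."""
    groups = defaultdict(list)
    for ts, query, url in records:
        groups[ts.isoweekday()].append((query, url))
    return groups

log = [
    (datetime(2005, 7, 4), "firework", "http://example.com/a"),       # Monday
    (datetime(2005, 7, 5), "firework show", "http://example.com/b"),  # Tuesday
    (datetime(2005, 7, 11), "weather", "http://example.com/c"),       # Monday
]
groups = partition_by_day_of_week(log)
print(len(groups[1]))  # 2 -- both Monday records fall in the <Mon, *> subgroup
```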

SLIDE 12

Similarity Measure

  • For efficiency and simplicity, we measure the query similarity in a certain time slot based only on the click-through data in that slot.
  • Vector representation of queries with respect to clicked documents.
  • wi is defined by Page Frequency (PF) and Inverted Query Frequency (IQF)
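A TF-IDF-style sketch of such a weighting (the normalization below is our own assumption and may differ in detail from the paper's PF/IQF definitions): PF is the fraction of a query's clicks that land on a page, and IQF discounts pages clicked for many different queries.

```python
# Hypothetical PF*IQF weighting, analogous to TF-IDF:
#   PF(q, p)  = clicks on p for query q / total clicks for q
#   IQF(p)    = log(#queries / #queries that clicked p)

import math
from collections import Counter, defaultdict

def pf_iqf_vectors(clicks):
    """clicks: list of (query, clicked_page) pairs -> {query: {page: weight}}."""
    per_query = defaultdict(Counter)
    for q, p in clicks:
        per_query[q][p] += 1
    n_queries = len(per_query)
    # number of distinct queries that clicked each page
    queries_per_page = Counter()
    for q, pages in per_query.items():
        for p in pages:
            queries_per_page[p] += 1
    vectors = {}
    for q, pages in per_query.items():
        total = sum(pages.values())
        vectors[q] = {
            p: (c / total) * math.log(n_queries / queries_per_page[p])
            for p, c in pages.items()
        }
    return vectors
```

Note that a page clicked for every query gets weight zero, just as a term in every document does under IDF.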

SLIDE 13

Similarity Measure

  • Query similarity measures
  • Cosine function
  • Marginalized kernel
  • By introducing query clusters, one can model the query similarity in a more semantic way.
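For the cosine function, a minimal sketch over sparse {page: weight} vectors (the weights would be the PF·IQF values; the marginalized kernel is omitted here):

```python
# Minimal sketch of cosine similarity between two queries, each represented as
# a sparse mapping from clicked page to weight.

import math

def cosine(u, v):
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

q1 = {"pageA": 0.8, "pageB": 0.2}
q2 = {"pageA": 0.5, "pageC": 0.5}
print(round(cosine(q1, q2), 3))  # shared clicks only on pageA, so similarity < 1
```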

SLIDE 14

Time-Dependent Similarity Measure

SLIDE 15

Empirical Evaluation

  • Dataset
  • Click-through log of a commercial search engine:
  • June 16, 2005 to July 17, 2005
  • Total size of 22GB
  • Only queries from US
  • Calendar schema and pattern
  • <hour, day, month>, <1, *, *>, <2, *, *>, …
  • Divide the data into 24 subgroups
  • Average subgroup size: 59,400,000 query-page pairs
SLIDE 16

Empirical Examples

  • Kids+toy, map+route
SLIDE 17

Empirical Examples

  • weather + forecast, fox + news
SLIDE 18

Quality Evaluation

  • Experimental Settings
  • Partition the 32-day dataset into two parts
  • First part for model construction
  • Second part for model evaluation
  • Accuracy is defined via the percentage difference between the actual similarity and the model-based prediction
  • 1000 representative query pairs, each with similarity larger than 0.3 over the entire dataset
  • Half of them are top queries of the month
  • Half are selected manually as related to real-world events such as “hurricane”
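One plausible reading of this accuracy definition (an assumption on our part; the slide does not give the exact formula) is one minus the relative difference between the actual and the predicted similarity:

```python
# Hypothetical sketch: accuracy as 1 minus the relative difference between the
# actual similarity and the model-based prediction. The exact definition used
# in the paper may differ.

def accuracy(actual, predicted):
    """Both arguments are similarity values in (0, 1]."""
    return 1.0 - abs(actual - predicted) / actual

print(round(accuracy(0.5, 0.45), 2))  # 0.9
```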

SLIDE 19

Experimental Results

SLIDE 20

Experimental Results

For example, when the distance is 1 and the training data size is 10, we summarize all the accuracy values that use days i to 10+i as training and use day 10+1+i as testing.

SLIDE 21

Experimental Results

SLIDE 22

Conclusion

  • Presented a preliminary study of the dynamic nature of query similarity using click-through data
  • Observed and verified with real data that query similarity is dynamic and event-driven
  • Proposed a time-dependent similarity model
  • Future work: investigate an adaptive way to determine the most suitable time granularity for two given queries

SLIDE 23

Thanks!

tyliu@microsoft.com http://research.microsoft.com/users/tyliu