

SLIDE 1

Modeling User Behavior and Interactions

Lecture 4: Search Personalization

Eugene Agichtein, Emory University

SLIDE 2

Lecture 4 Outline

  • 1. Approaches to Search Personalization
  • 2. Dimensions of Personalization
    – 1. Which queries to personalize?
    – 2. What input to use for personalization?
    – 3. Granularity: personalization vs. groupization
    – 4. Context: Geographical, search session

SLIDE 3

Approaches to Personalization

  • 1. Pitkow et al., 2002
  • 2. Qiu et al., 2006
  • 3. Jeh et al., 2003
  • 4. Teevan et al., 2005
  • 5. Das et al., 2007

[Figure adapted from: Personalized search on the World Wide Web, by Micarelli, A., Gasparetti, F., Sciarrone, F., and Gauch, S., LNCS 2007]

SLIDE 4

When to Personalize

[Figure adapted from: Personalized search on the World Wide Web, by Micarelli, A., Gasparetti, F., Sciarrone, F., and Gauch, S., LNCS 2007]

SLIDE 5

Example: Outride

From Pitkow et al., 2002

SLIDE 6

Outride (Results)

From Pitkow et al., 2002

SLIDE 7

Input to Personalization

  • Behavior (clicks): Qiu and Cho, 2006
    – Use clicks to tune a personalized (topic-sensitive) PageRank model for each user
    – Use personalized PageRank to re-rank web search results
  • Profile (user model): SeeSaw (Teevan et al., 2005)

SLIDE 8

PageRank Computation

I(p): set of pages linking to p (incoming links)
O(q): set of pages that q links to (outgoing links)
c: dampening factor (~0.15), or "teleportation probability"
E: some probability vector over the web pages

PR(p) = (1 − c) · Σ_{q ∈ I(p)} PR(q) / |O(q)| + c · E(p)

The E vector can be:

  • Uniformly distributed probabilities over all web pages (democratic)
  • Probabilities biased toward a number of important pages
    – Top levels of web servers
    – Hub/authority pages
  • Used for customization (personalization)
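The recurrence above maps directly onto power iteration. Below is a minimal Python sketch (mine, not from the lecture); the `links` encoding, iteration count, and toy graph are illustrative. Passing a biased E vector gives the customized ("personalized") PageRank the slide describes.

```python
import numpy as np

def pagerank(links, c=0.15, E=None, iters=100):
    """Power iteration for PR(p) = (1-c) * sum_{q in I(p)} PR(q)/|O(q)| + c*E(p).

    links: {page index: list of pages it links to} (the O sets).
    c: dampening factor / teleportation probability (~0.15, as on the slide).
    E: teleportation distribution; uniform ("democratic") if None.
    """
    n = len(links)
    E = np.full(n, 1.0 / n) if E is None else np.asarray(E, float) / sum(E)
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = c * E                       # teleportation term c * E(p)
        for q, outs in links.items():
            if outs:                      # spread PR(q) over its out-links O(q)
                new[list(outs)] += (1 - c) * pr[q] / len(outs)
            else:                         # dangling page: its mass follows E
                new = new + (1 - c) * pr[q] * E
        pr = new
    return pr

# Tiny 3-page web; biasing E toward page 0 personalizes the ranking.
links = {0: [1, 2], 1: [2], 2: [0]}
print(pagerank(links))                    # democratic E
print(pagerank(links, E=[1.0, 0.0, 0.0])) # E biased toward page 0
```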
SLIDE 9

Topic-Sensitive PageRank

  • Uninfluenced PageRank: "Page is important if many important pages point to it."
  • Influenced PageRank: "Page is important if many important pages point to it, and, by the way, the following are by definition important pages."

Main idea: assign multiple a-priori "importance" estimates to pages with respect to a set of topics, one PageRank score per basis topic.

  • Query-specific rank score (+)
  • Makes use of context (+)
  • Inexpensive at runtime (+)

SLIDE 10

PageRank vs. Topic-Sensitive PageRank

PageRank
  • Input: Web graph G
  • Offline: PageRank() computes one rank vector r: page → page importance
  • Query-time: the query processor applies r to rank the results of the query

Topic-Sensitive PageRank
  • Input: Web graph G, basis topics [c1, ..., c16], e.g. 16 categories (first level of ODP)
  • Offline: TSPageRank() computes a list of rank vectors [r1, ..., r16], where rj: page → page importance in topic cj
  • Query-time: a topic classifier maps the query and its context to topics, and the matching rank vectors score each (page, topic) pair
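A hedged sketch of the query-time side: assuming the per-topic rank vectors rj were computed offline (e.g. with pagerank() above, biasing E toward each ODP category's pages), scoring just mixes them by the classifier's topic probabilities. All names here are illustrative.

```python
def ts_pagerank_score(page, topic_probs, rank_vectors):
    """Query-time Topic-Sensitive PageRank: mix the offline per-topic
    rank vectors rj by P(cj | query, context).

    topic_probs: {topic j: P(cj | query)} from the topic classifier.
    rank_vectors: {topic j: rj}, where rj[page] is the page's importance
                  within topic cj (precomputed offline).
    """
    return sum(p * rank_vectors[j][page] for j, p in topic_probs.items())
```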

SLIDE 11

Input to Personalization

  • Behavior (clicks): Qiu and Cho, 2006
    – Use clicks to tune a personalized (topic-sensitive) PageRank model for each user
      • Map clicked results to ODP categories
    – Use personalized PageRank to re-rank web search results
  • Profile (user model): SeeSaw (Teevan et al., 2005)

SLIDE 12

PS Search Engine (Profile-Based)

[Teevan et al., 2005]

[Screenshot: results for the query "bellevue", re-ranked with a user profile built from content and interaction history]

SLIDE 13

Result Re-Ranking

  • Ensures privacy
  • Good evaluation framework
  • Can look at rich user profile
  • Look at lightweight user models
    – Collected on server side
    – Sent as query expansion

SLIDE 14

BM25 with Relevance Feedback

Score = Σ_i tf_i · w_i

w_i = log [ (r_i + 0.5)(N − n_i − R + r_i + 0.5) / ((n_i − r_i + 0.5)(R − r_i + 0.5)) ]

where N is the number of documents in the corpus, n_i the number containing term i, R the number of known relevant documents, and r_i the number of relevant documents containing term i.
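A minimal sketch of the weight and score under the symbol meanings just given (my code, not SeeSaw's; the dictionary encodings are illustrative):

```python
import math

def rf_weight(N, n_i, R, r_i):
    """w_i = log[(r_i+0.5)(N-n_i-R+r_i+0.5) / ((n_i-r_i+0.5)(R-r_i+0.5))]"""
    return math.log((r_i + 0.5) * (N - n_i - R + r_i + 0.5)
                    / ((n_i - r_i + 0.5) * (R - r_i + 0.5)))

def bm25_rf_score(doc_tfs, term_stats):
    """Score = sum_i tf_i * w_i over the document's query terms.

    doc_tfs: {term: tf_i in this document}.
    term_stats: {term: (N, n_i, R, r_i)} collection statistics.
    """
    return sum(tf * rf_weight(*term_stats[t]) for t, tf in doc_tfs.items())
```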

SLIDE 15

User Model as Relevance Feedback

Treat the R documents of the user's index as relevance-feedback evidence by augmenting the corpus statistics:

N' = N + R,  n_i' = n_i + r_i

Score = Σ_i tf_i · w_i

w_i = log [ (r_i + 0.5)(N' − n_i' − R + r_i + 0.5) / ((n_i' − r_i + 0.5)(R − r_i + 0.5)) ]
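Since only the statistics change, the variant reduces to a substitution into rf_weight() from the sketch above; one possible reading:

```python
def seesaw_weight(N, n_i, R, r_i):
    """User model as relevance feedback: substitute N' = N + R and
    n_i' = n_i + r_i into rf_weight(), treating the R documents in the
    user's personal index as the 'relevant' set."""
    return rf_weight(N + R, n_i + r_i, R, r_i)
```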

SLIDE 16

User Model as Relevance Feedback

[Figure: Venn diagrams relating the corpus (N, n_i), the user's index (R, r_i), and the query, with Score = Σ tf_i · w_i. World-focused matching computes the weights over the whole Web and user index; query-focused matching restricts N, n_i, R, r_i to documents related to the query.]

SLIDE 17

User Representation

  • Stuff I've Seen (SIS) index
    – MSR research project [Dumais et al.]
    – Index of everything a user has seen
  • Recently indexed documents
  • Web documents in SIS index
  • Query history
  • None

SLIDE 18

World Representation

  • Document representation
    – Full text
    – Title and snippet
  • Corpus representation
    – Web
    – Result set: title and snippet
    – Result set: full text

SLIDE 19

Parameters

  • Matching: query focused, world focused
  • User representation: all SIS, recent SIS, Web SIS, query history, none
  • World representation: full text, title and snippet
  • Query expansion: Web, result set (full text), result set (title and snippet)

SLIDE 20

Results: Seesaw Improves Retrieval

[Bar chart: DCG (roughly 0.1–0.6) for the conditions None, Rand, RF, SS, Web, and Combo; legend: no user model, random, relevance feedback, Seesaw]

SLIDE 21

Results: Feature Contribution

[Bar chart: DCG for the conditions None, Rand, RF, SS, Web, and Combo]

SLIDE 22

Summary

  • Rich user model important for search personalization
  • Seesaw improves text-based retrieval
  • Need other features to improve Web results; lots of room for future improvement

[Bar chart: None, SS, Web, Group, ?]

SLIDE 23

Evaluating Personalized Search

  • Explicit judgments (offline and in situ)
    – Evaluate components before system
    – NOTE: what's relevant for you
  • Deploy system
    – Verbatim feedback, questionnaires, etc.
    – Measure behavioral interactions (e.g., click, reformulation, abandonment, etc.)
    – Click biases: order, presentation, etc.
    – Interleaving for unbiased clicks (see the sketch below)
  • Link implicit and explicit (Curious Browser plugin)
  • Beyond a single query → sessions and beyond
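The "interleaving for unbiased clicks" bullet can be made concrete with team-draft interleaving, one common variant (the lecture does not specify which); a sketch with illustrative names:

```python
import random

def team_draft_interleave(rank_a, rank_b, k=10, rng=None):
    """Team-draft interleaving (sketch): in each round a coin decides which
    ranker contributes first; each then contributes its best not-yet-shown
    result. Clicks are credited to the contributing team, so the ranker
    whose team collects more clicks wins the comparison."""
    rng = rng or random.Random()
    shown, teams = [], []
    while len(shown) < k:
        first = rng.choice(["A", "B"])
        for team in (first, "B" if first == "A" else "A"):
            ranking = rank_a if team == "A" else rank_b
            doc = next((d for d in ranking if d not in shown), None)
            if doc is not None and len(shown) < k:
                shown.append(doc)
                teams.append(team)
        if all(d in shown for d in rank_a) and all(d in shown for d in rank_b):
            break                     # both rankers exhausted
    return list(zip(shown, teams))    # [(doc, contributing team), ...]
```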

SLIDE 24

User Control in Personalization (RF)

J.-S. Ahn, P. Brusilovsky, D. He, and S. Y. Syn. Open user profiles for adaptive news systems: Help or harm? WWW 2007

SLIDE 25

Study: Comparing Personalization Strategies

[Dou et al., 2007]

  • 10,000 users, 56,000 queries, and 94,000 clicks over 12 days
  • Used the first 11 days' worth of data to form user profiles and clicks
  • Simulated the application of five different personalization algorithms on the remaining 4,600 queries from the last day of the log
  • Retrieved top 50 results for each query from the comparison search engine and assumed that clicking a link indicated a relevance judgment for the query

SLIDE 26

Results: Which Strategy is Most Effective?

[Dou et al., 2007]

  • Compared two click-based (behavior) personalization strategies to three profile-based strategies
  • Click-based strategies appear more effective than profile-based (but carefully combining historical profile data helps slightly)
  • Search context crucial
  • Personalization effectiveness varies by query
  • Evaluated using naïve click models

SLIDE 27

Lecture 4 Outline

  • 1. Approaches to Search Personalization
  • 2. Dimensions of Personalization
    – 1. Which queries to personalize?
    – 2. What input to use for personalization?
    – 3. Granularity: personalization vs. groupization
    – 4. Context: Geographical, search session

SLIDE 28

Understanding Query Ambiguity

Jaime Teevan, Susan Dumais, Dan Liebling, Microsoft Research. SIGIR 2008

SLIDE 29

“grand copthorne waterfront”

SLIDE 30

“singapore”

SLIDE 31

How Do the Two Queries Differ?

  • grand copthorne waterfront v. singapore
  • Knowing query ambiguity allows us to:
    – Personalize or diversify when appropriate
    – Suggest more specific queries
    – Help people understand diverse result sets

SLIDE 32

Understanding Ambiguity

  • Look at measures of query ambiguity
    – Explicit
    – Implicit
  • Explore challenges with the measures
    – Do implicit measures predict explicit ones?
    – What other factors impact observed variation?
  • Build a model to predict ambiguity
    – Using just the query string, or also the result set
    – Using query history, or not

SLIDE 33

Which Queries to Personalize?

[Teevan et al., 2008]

  • Personalization benefits ambiguous queries
  • Inter-rater reliability (Fleiss' kappa)
    – Observed agreement (Pa) exceeds expected (Pe)
    – κ = (Pa − Pe) / (1 − Pe)
  • Relevance entropy
    – Variability in the probability a result is relevant (Pr)
    – S = −Σ Pr log Pr
  • Potential for personalization
    – Ideal group ranking differs from ideal personal ranking
    – P4P = 1 − nDCG_group

Teevan, J., S. T. Dumais, and D. J. Liebling. To personalize or not to personalize: Modeling queries with variation in user intent. SIGIR 2008
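The first two measures are directly computable from a set of judgments. A sketch under the usual definitions (Fleiss' kappa over categorical labels; the slide does not fix the entropy's log base, so natural log here is my assumption):

```python
import math
from collections import Counter

def fleiss_kappa(judgments):
    """kappa = (Pa - Pe) / (1 - Pe); judgments is a list of per-result
    label lists, each result judged by the same number of raters."""
    n = len(judgments[0])                 # raters per result
    totals = Counter()
    pa = 0.0
    for labels in judgments:
        counts = Counter(labels)
        totals.update(labels)
        pa += sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    pa /= len(judgments)                  # observed agreement Pa
    grand = sum(totals.values())
    pe = sum((c / grand) ** 2 for c in totals.values())  # expected Pe
    return (pa - pe) / (1 - pe)

def relevance_entropy(p_rel):
    """S = -sum Pr log Pr over per-result relevance probabilities."""
    return -sum(p * math.log(p) for p in p_rel if p > 0)
```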

SLIDE 34

Predicting Ambiguity

[Teevan et al., 2008]

Features for predicting query ambiguity, grouped by source and whether they require query history:

  • Query features, no history: query length, contains URL, contains advanced operator, time of day issued, number of results (df), number of query suggests
  • Query features, with history: reformulation probability, # of times query issued, # of users who issued query, avg. time of day issued, avg. number of results, avg. number of query suggests
  • Result features, no history: query clarity, ODP category entropy, number of ODP categories, result entropy, portion of non-HTML results, portion of results from .com/.edu, number of distinct domains
  • Result features, with history: avg. click position, avg. seconds to click, avg. clicks per user, click entropy, potential for personalization

Teevan, J., S. T. Dumais, and D. J. Liebling. To personalize or not to personalize: Modeling queries with variation in user intent. SIGIR 2008
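The slides do not name the learner, but the tree on the result-summary slide later suggests something like a decision tree over the query features; a hypothetical sklearn sketch, where the feature encoding and training rows are invented purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical rows: [query_length, contains_url, contains_operator,
#                     num_results, num_query_suggests]
X = [[1, 1, 0, 9, 0],     # a www.usajobs.gov-style navigational query
     [3, 0, 0, 120, 4]]   # a "federal government jobs"-style broad query
y = ["low", "high"]       # ambiguity classes

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([[2, 0, 0, 80, 3]]))
```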

SLIDE 35

Collecting Implicit Relevance Data

  • Variation in clicks
    – Proxy (click = relevant, not clicked = irrelevant)
    – Other implicit measures possible
    – Disadvantage: can mean lots of things; biased
    – Advantage: real tasks, real situations, lots of data
  • 44k unique queries issued by 1.5M users
    – Minimum 10 users/query
  • 2.5 million result sets "evaluated"

SLIDE 36

How Good are Implicit Measures?

  • Explicit data is expensive
  • Is implicit a good substitute?
  • Compared queries with
    – Explicit judgments and
    – Implicit judgments
  • Significantly correlated:
    – Correlation coefficient = 0.77 (p < .01)

[Scatter plot: implicit ambiguity vs. explicit ambiguity, both 0.7–1.0]

SLIDE 37

Which Has Lower Click Entropy?

  • www.usajobs.gov v. federal government jobs
  • find phone number v. msn live search
  • singapore pools v. singaporepools.com
    – Click entropy = 1.5 v. 2.0; the results change (result entropy = 5.7 v. 10.7)
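Click entropy is computable straight from a query's click log; a sketch (log base 2 is my assumption, as the slide does not specify a base):

```python
import math
from collections import Counter

def click_entropy(clicked_urls):
    """Click entropy for one query: -sum_u p(u) log2 p(u), where p(u) is
    the fraction of the query's clicks that land on URL u."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A navigational query concentrates clicks on one URL -> low entropy.
print(click_entropy(["usajobs.gov"] * 9 + ["opm.gov"]))     # ~0.47
print(click_entropy(["a.com", "b.com", "c.com", "d.com"]))  # 2.0
```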

SLIDE 38

Challenges with Using Click Data

  • Results change at different rates
  • Result quality varies
  • Task affects the number of clicks
  • We don't know click data for unseen queries

Can we predict query ambiguity?

SLIDE 39

Result Summary

[Teevan et al., 2008]

  • All features → good prediction: 81% accuracy (↑ 220%)
  • Just query features promising: 40% accuracy (↑ 57%)
  • No boost from adding result or history features

[Decision tree over query features (contains URL? ads? length?) predicting ambiguity: Very Low / Low / Medium / High]

Teevan, J., S. T. Dumais, and D. J. Liebling. To personalize or not to personalize: Modeling queries with variation in user intent. SIGIR 2008

SLIDE 40

Lecture 4 Outline

  • 1. Approaches to Search Personalization
  • 2. Dimensions of Personalization
    – 1. Which queries to personalize?
    – 2. What input to use for personalization?
    – 3. Granularity: personalization vs. groupization
    – 4. Context: Geographical, search session

SLIDE 41

Connection: Collaborative Filtering and Recommender Systems

  • Identify related groups
    – Browsed pages [Almeida & Almeida 2004; Sugiyama et al. 2005]
    – Queries [Freyne & Smyth 2006; Lee 2005]
    – Location [Mei & Church 2008], company [Smyth 2007], etc.
  • Use group data to fill in missing personal data
    – Typically data based on user behavior

SLIDE 42

Discovering and Using Groups to Improve Personalized Search

Jaime Teevan, Merrie Morris, Steve Bush, Microsoft Research. WSDM 2009

SLIDE 43

[ Slides from Teevan et al., WSDM 2009 ]

Diego Velázquez, Las Lanzas (The Surrender of Breda)

SLIDE 44

People Express Things Differently

[ Slides from Teevan et al., WSDM 2009 ]

  • Differences can be a challenge for Web search
    – "Picture of a man handing over a key."
    – "Oil painting of the surrender of Breda."
  • Personalization
    – Closes the gap using more about the person
  • Groupization
    – Closes the gap using more about the group

SLIDE 45

How to Take Advantage of Groups?

[ Slides from Teevan et al., WSDM 2009 ]

  • Who do we share interests with?
  • Do we talk about things similarly?
  • What algorithms should we use?

SLIDE 46

Approach

[ Slides from Teevan et al., WSDM 2009 ]

  • Who do we share interests with?
    – Similarity in query selection
    – Similarity in what is considered relevant
  • Do we talk about things similarly?
    – Similarity in user profile
  • What algorithms should we use?
    – Groupize results using groups of user profiles
    – Evaluate using groups' relevance judgments

SLIDE 47

Interested in Many Group Types

[ Slides from Teevan et al., WSDM 2009 ]

  • Group longevity
    – Task-based
    – Trait-based
  • Group identification
    – Explicit: task, age, gender, job team, job role, location, interest
    – Implicit: relevance judgments, query selection, desktop content

SLIDE 48

Queries Studied

[ Slides from Teevan et al., WSDM 2009 ]

Trait-based dataset
  • Challenge
    – Overlapping queries
  • Queries picked from 12 groups
    – Work: c# delegates, live meeting
    – Interests: bread recipes, toilet train dog

Task-based dataset
  • Common task
    – Telecommuting v. office
    – Natural motivation
  • Queries: pros and cons of working in an office, social comparison, telecommuting versus office, telecommuting, working at home cost benefit

SLIDE 49

Data Collected

[ Slides from Teevan et al., WSDM 2009 ]

  • Queries evaluated
  • Explicit relevance judgments
    – 20–40 results per query
    – Personal relevance: highly relevant, relevant, not relevant
  • User profile: desktop index

SLIDE 50

Answering the Questions

[ Slides from Teevan et al., WSDM 2009 ]

  • Who do we share interests with?
  • Do we talk about things similarly?
  • What algorithms should we use?

SLIDE 51

Who do we share interests with?

[ Slides from Teevan et al., WSDM 2009 ]

  • Variation in query selection
    – Work groups selected similar work queries
    – Social groups selected similar social queries
  • Variation in relevance judgments
    – Judgments varied greatly (κ = 0.08)
    – Task-based groups most similar
    – Similar for one query ≠ similar for another

SLIDE 52

Do we talk about things similarly?

[ Slides from Teevan et al., WSDM 2009 ]

  • Group profile similarity
    – Members more similar to each other than to non-members
    – Most similar for aspects related to the group

                  In task group   Not in group   Difference
  All queries     0.42            0.31           34%
  Group queries   0.77            0.35           120%

  • Clustering profiles recreates groups
  • Index similarity ≠ judgment similarity
    – Correlation coefficient of 0.09

SLIDE 53

What algorithms should we use?

[ Slides from Teevan et al., WSDM 2009 ]

  • Calculate a personalized score for each member
    – Content: user profile as relevance feedback
      Score = Σ_i tf_i · log [ (r_i + 0.5)(N − n_i − R + r_i + 0.5) / ((n_i − r_i + 0.5)(R − r_i + 0.5)) ]
    – Behavior: previously visited URLs and domains
  • Sum personalized scores across the group
  • Produces the same ranking for all members
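A sketch of the groupization step, reusing bm25_rf_score() from the SeeSaw sketch earlier: each member contributes a personalized content score, and the group ranking sorts by the sum, so every member sees the same list. Function and variable names are illustrative.

```python
def groupized_score(doc_tfs, member_stats):
    """Sum the per-member personalized scores for one result.

    member_stats: one {term: (N, n_i, R, r_i)} dict per group member,
    with R and r_i drawn from that member's profile (desktop index).
    """
    return sum(bm25_rf_score(doc_tfs, stats) for stats in member_stats)

def groupized_ranking(results, member_stats):
    """Re-rank candidate results by the summed group score, highest first.
    results: {doc_id: {term: tf}}; the same ranking serves every member."""
    return sorted(results,
                  key=lambda d: groupized_score(results[d], member_stats),
                  reverse=True)
```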

SLIDE 54

Performance: Task-Based Groups

[ Slides from Teevan et al., WSDM 2009 ]

  • Personalization improves on Web ranking
  • Groupization gains +5%

[Bar chart: normalized DCG for Web, Personalized, and Groupized rankings]

SLIDE 55

Performance: Task-Based Groups

[ Slides from Teevan et al., WSDM 2009 ]

  • Personalization improves on Web ranking
  • Groupization gains +5%
  • Split by query type: on-task v. off-task
    – Groupization the same as personalization for off-task queries
    – 11% improvement for on-task queries

[Bar chart: normalized DCG for Web, Personalized, and Groupized rankings, split by on-task and off-task queries]

SLIDE 56

Performance: Trait-Based Groups

[ Slides from Teevan et al., WSDM 2009 ]

[Bar chart: normalized DCG (0.45–0.75) for groupization v. personalization, for Interests and Work groups]

SLIDE 57

Performance: Trait-Based Groups

[ Slides from Teevan et al., WSDM 2009 ]

[Bar chart: normalized DCG for groupization v. personalization, split into work queries and interest queries]
slide-58
SLIDE 58

Performance: Trait-Based Groups

[ Slides from Teevan et al., WSDM 2009 ]

p

0.75

Interests Work

0.65 0.7

Work queries

0.6 Normalized DCG

q

0.5 0.55

Groupization Interest queries

0.45

Groupization Personalization queries

SLIDE 59

Lecture 4 Outline

  • 1. Approaches to Search Personalization
  • 2. Dimensions of Personalization
    – 1. Which queries to personalize?
    – 2. What input to use for personalization?
    – 3. Granularity: personalization vs. groupization
    – 4. Context: Geographical, search session

SLIDE 60

Local Search (Geographical Personalization)

  • Location is context
  • Local search uses geographic information to modify the ranking of search results
    – Location derived from the query text
    – Location of the device where the query originated
  • E.g.,
    – "underworld 3 cape cod"
    – "underworld 3" from a mobile device in Hyannis

SLIDE 61

Geography and Query Intent

[Baeza-Yates and Jones, 2008]

[Diagram: Location 1 = query location of query 1, "Pizza Amherst, MA"; Location 2 = home address (from IP address / profile zip); Location 3 = query location of the reformulation, query 2, "Pizza Northampton". Distance 1 = home to query intent; Distance 2 = reformulation distance.]
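Distances 1 and 2 are great-circle distances between resolved locations; a sketch using the haversine formula (the coordinates below are approximate and purely illustrative):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points; one way
    to compute the home-to-query-intent and reformulation distances."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # Earth radius ~6371 km

# Distance 2 (reformulation distance): Amherst, MA -> Northampton, MA
print(haversine_km(42.3732, -72.5199, 42.3251, -72.6412))  # ~11 km
```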

SLIDE 62

Topic-Distance Profiles

[Baeza-Yates and Jones, 2008]

  • 20 bins
    – 0 distance
    – Equal fractions of the rest of the data
  • Does the distribution into distance bins vary by topic?

[Histogram: distance profiles from near-by to distant places for topics such as movie theater, maps, and restaurant]
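One possible construction of the 20 bins, as I read the slide: a dedicated zero-distance bin plus equal-fraction (quantile) bins over the nonzero distances. Names and details are illustrative.

```python
import numpy as np

def topic_distance_profile(distances, n_bins=20):
    """Profile a topic's home-to-query-intent distances over 20 bins:
    bin 0 holds exact-zero distances; the remaining 19 bins use quantile
    edges over the nonzero distances (equal fractions of the data)."""
    d = np.asarray(distances, dtype=float)
    zero_frac = np.mean(d == 0)
    nonzero = d[d > 0]
    edges = np.quantile(nonzero, np.linspace(0, 1, n_bins))  # 19 bins
    counts, _ = np.histogram(nonzero, bins=edges)
    return np.concatenate(([zero_frac], counts / len(d)))

# Comparing profiles across topics (e.g. "movie theater" vs. "maps")
# shows whether a topic's query intent skews near-by or distant.
```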

SLIDE 63

Lecture 4 Outline

  • 1. Approaches to Search Personalization
  • 2. Dimensions of Personalization
    – 1. Which queries to personalize?
    – 2. What input to use for personalization?
    – 3. Granularity: personalization vs. groupization
    – 4. Context: Geographical, search session

SLIDE 64

Key References and Further Reading

  • Marti Hearst. Search User Interfaces. Cambridge University Press, 2009. Chapter 9: "Personalization in Search". http://searchuserinterfaces.com/
  • Pitkow, J., Schütze, H., Cass, T., Cooley, R., Turnbull, D., Edmonds, A., Adar, E., and Breuel, T. Personalized search. Communications of the ACM, 2002.
  • Teevan, J., Dumais, S. T., and Horvitz, E. Personalizing search via automated analysis of interests and activities. In Proc. of SIGIR 2005.
  • Dou, Z., Song, R., and Wen, J. A large-scale evaluation and analysis of personalized search strategies. In Proc. of WWW 2007.
  • Das, A. S., Datar, M., Garg, A., and Rajaram, S. Google news personalization: Scalable online collaborative filtering. In Proc. of WWW 2007.
  • Qiu, F. and Cho, J. Automatic identification of user interest for personalized search. In Proc. of WWW 2006.
  • Teevan, J., Dumais, S. T., and Liebling, D. J. To personalize or not to personalize: Modeling queries with variation in user intent. In Proc. of SIGIR 2008.
  • Teevan, J., Morris, M., and Bush, S. Discovering and using groups to improve personalized search. In Proc. of WSDM 2009.