Dynamic Information Environments


Dynamic Information Environments Susan Dumais Microsoft Research http://research.microsoft.com/~sdumais In collaboration with: Jaime Teevan, Eytan Adar, Jon Elsas, Dan Liebling, Richard


SLIDE 1

UW, CSE 454, Dec 8 2009

Dynamic Information Environments

Susan Dumais
Microsoft Research
http://research.microsoft.com/~sdumais

In collaboration with: Jaime Teevan, Eytan Adar, Jon Elsas, Dan Liebling, Richard Hughes

SLIDE 2

Outline

  • Web search and context
  • Temporal dynamics of information
      – Characterizing change
          – Content changes over time
          – People re-visit and re-find
          – Relationships between content change and re-access
      – Improving retrieval and understanding
          – Building support for understanding change (e.g., DiffIE)
          – Leveraging dynamics for improved retrieval

SLIDE 3

Web Search at 15

  • Number of pages indexed
      – 7/94: Lycos, 54,000 pages
      – 95: 10^6 (millions)
      – 97: 10^7
      – 98: 10^8
      – 01: 10^9 (billions)
      – 05: 10^10 …
  • Types of content
      – Web pages, newsgroups
      – Images, videos, maps
      – News, blogs, spaces
      – Shopping, local, desktop
      – Books, papers, many formats
      – Health, finance, travel …

What’s available
How it’s accessed

SLIDE 4

Support for Searchers

  • The search box
  • Spelling suggestions
  • Query suggestions
  • Auto complete
  • Inline answers
  • Richer snippets
  • But, we can do better

… by understanding context

SLIDE 5

Search Today

User Context Task/Use Context

Query Words

Ranked List

Search and Context

Document Context

Query Words

Ranked List

SLIDE 6

Modeling Users

  • Short vs. long term
  • Individual vs. group
  • Implicit vs. explicit

Using User Models

  • Stuff I’ve Seen (re-finding)
  • Personalized Search
  • News Junkie (novelty)
  • User Behavior in Ranking
  • Domain Expertise at Web-scale

Evaluation

  • Many methods, scales
  • Individual components and their combinations

Systems/Prototypes

  • New capabilities and experiences
  • Algorithms and prototypes
  • Deploy, evaluate and iterate

  • Redundancy
  • Temporal Dynamics
  • Inter-Relationships among Documents

Categorization and Metadata

  • Reuters, spam, landmarks, web categories …
  • Domain-specific features, time

Interfaces and Interaction

  • Stuff I’ve Seen, Phlat, Timelines, SWISH
  • Tight coupling of browsing and search

SLIDE 7

Information Dynamics

[Timeline, 1996 to 2009: Content Changes]

Today’s Browse and Search Experiences
But, ignores …

[Timeline, 1996 to 2009: User Visitation/ReVisitation]

SLIDE 8

Digital Dynamics Easy to Capture

  • Easy to capture
  • Few tools support dynamics

SLIDE 9

Information Dynamics

  • Characterizing change
      – Content changes over time
      – People re-visit and re-find
      – Relationships between content change and re-access
  • Improving retrieval and understanding
      – Building support for understanding change (e.g., DiffIE)
      – Leveraging dynamics for improved retrieval

SLIDE 10

Characterizing Change

[Timelines, 1996 to 2009: Content Changes; User Visitation/ReVisitation]

  • Large-scale Web crawls, over time
      – Revisited pages
          – 55,000 pages crawled hourly for 18+ months
          – Unique users, visits/user, time between visits
      – Judged pages (Relevance to a query)
          – 6 million pages crawled every two days for 6 months

SLIDE 11

Measuring Web Page Change

  • Summary metrics
      – Number of changes
      – Time between changes
      – Amount of change
  • Change curves
      – Fixed starting point
      – Measure similarity over different time intervals

[Figure: change curve. Y axis: Dice similarity (0.2 to 1); X axis: time from starting point; knot point marked]
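
A change curve as described on this slide can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each snapshot is reduced to a set of lowercase whitespace-separated terms, and the sample snapshots are invented.

```python
def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty snapshots are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def change_curve(snapshots: list[str]) -> list[float]:
    """Similarity of each later snapshot to the first one (the fixed starting point)."""
    base = set(snapshots[0].lower().split())
    return [dice(base, set(s.lower().split())) for s in snapshots[1:]]


# Hypothetical page text at three crawl times.
snapshots = [
    "fresh salads and bbq recipes",
    "fresh salads and cheese recipes",
    "holiday cookbooks and cheese plates",
]
curve = change_curve(snapshots)
print(curve)
```

Similarity that falls quickly and then flattens would correspond to the "knot point" labelled in the figure.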

SLIDE 12

Measuring Within-Page Change

  • DOM structure changes
  • Term use changes
      – Divergence from norm
          – cookbooks
          – salads
          – cheese
          – ingredient
          – bbq
      – “Staying power” in page

[Figure: term presence over time. Time axis: Sep., Oct., Nov., Dec.]
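
One simple reading of "staying power" is the fraction of snapshots in which a term appears. The sketch below uses that definition as an assumption (the slide does not give a formula), with invented monthly snapshots built from the slide's example terms.

```python
from collections import Counter


def staying_power(snapshots: list[str]) -> dict[str, float]:
    """Fraction of snapshots in which each term appears at least once."""
    counts: Counter = Counter()
    for text in snapshots:
        counts.update(set(text.lower().split()))  # count each term once per snapshot
    return {term: n / len(snapshots) for term, n in counts.items()}


# Hypothetical page content, one snapshot per month.
monthly = [
    "cookbooks salads ingredient",  # Sep.
    "cookbooks cheese ingredient",  # Oct.
    "cookbooks cheese bbq",         # Nov.
    "cookbooks salads ingredient",  # Dec.
]
power = staying_power(monthly)
print(sorted(power.items(), key=lambda kv: -kv[1]))
```

Terms with a value near 1 are always on the page; terms with low values are transient.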
SLIDE 13

Example Term Longevity Graphs

SLIDE 14

Revisitation on the Web

[Timelines, 1996 to 2009: Content Changes; Users Revisit]

What’s the last Web page you visited?

  • Revisitation patterns
      – Log analyses
          – Toolbar logs for revisitation
          – Query logs for re-finding
      – User survey to understand intent in revisitations

SLIDE 15

Measuring Revisitation

  • Summary metrics
      – Unique visitors
      – Visits/user
      – Time between visits
  • Revisitation curves
      – Histogram of revisit intervals
      – Normalized

[Figure: revisitation curve. Y axis: normalized count (0.2 to 1); X axis: time interval]
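
A revisitation curve of this kind can be sketched as a histogram of gaps between consecutive visits. The bin edges, the sample timestamps, and the choice to normalize by the tallest bin are all assumptions for illustration; the slide only says the histogram is normalized.

```python
from bisect import bisect_right


def revisitation_curve(visit_times: list[float], bin_edges: list[float]) -> list[float]:
    """Histogram of intervals between consecutive visits, scaled so the tallest bin is 1."""
    times = sorted(visit_times)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    counts = [0] * (len(bin_edges) + 1)
    for gap in intervals:
        counts[bisect_right(bin_edges, gap)] += 1  # bin i holds edges[i-1] <= gap < edges[i]
    peak = max(counts) or 1  # avoid dividing by zero when there are no revisits
    return [c / peak for c in counts]


# Hypothetical visit times for one page, in minutes.
visits = [0, 1, 2, 3, 63, 1503]
edges = [1, 60, 1440]  # under a minute, under an hour, under a day, a day or more
curve = revisitation_curve(visits, edges)
print(curve)
```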

SLIDE 16

Four Revisitation Patterns

  • Fast
      – Hub-and-spoke
      – Navigation within site
  • Hybrid
      – High quality fast pages
  • Medium
      – Popular homepages
      – Mail and Web applications
  • Slow
      – Entry pages, bank pages
      – Accessed via search engine

SLIDE 17

Search and Revisitation

                  Repeat Click   New Click   Total
  Repeat Query        29%           4%        33%
  New Query           10%          57%        67%
  Total               39%          61%

  • Repeat query (33%)
      – microsoft research
  • Repeat click (39%)
      – http://research.microsoft.com
      – Q: microsoft research, msr …
  • Big opportunity (43%)
      – 24% “navigational revisits”
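
The repeat/new breakdown can be sketched as a pass over a click log. The log format (user, query, clicked URL), the per-user notion of "repeat", and the sample rows are assumptions for illustration, not the method used in the underlying studies.

```python
from collections import Counter


def classify(log: list[tuple[str, str, str]]) -> Counter:
    """Count events by whether the user has issued the query or clicked the URL before."""
    seen_queries: set[tuple[str, str]] = set()
    seen_clicks: set[tuple[str, str]] = set()
    cells: Counter = Counter()
    for user, query, url in log:
        q = "repeat query" if (user, query) in seen_queries else "new query"
        c = "repeat click" if (user, url) in seen_clicks else "new click"
        cells[(q, c)] += 1
        seen_queries.add((user, query))
        seen_clicks.add((user, url))
    return cells


# Hypothetical log rows; the last URL is a placeholder.
log = [
    ("u1", "microsoft research", "http://research.microsoft.com"),
    ("u1", "microsoft research", "http://research.microsoft.com"),
    ("u1", "msr", "http://research.microsoft.com"),
    ("u1", "uw cse 454", "http://example.org/cse454"),
]
cells = classify(log)
print(cells)
```

The third row is the interesting case: a new query ("msr") that leads to a repeat click, which a query-only analysis would miss.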

SLIDE 18

Repeat Clicks for Repeat Queries

  • Within session: Repeat query -> New click
  • Across sessions: Repeat query -> Repeat click

SLIDE 19

Relationships Between Revisitation and Change

[Timelines, 1996 to 2009: Content Changes]

Why did you revisit the last Web page you revisited?

SLIDE 20

Possible Relationships

  • Interested in change
      – Monitor
  • Effect change
      – Transact
  • Change unimportant
      – Find new
  • Change can interfere
      – Re-find

SLIDE 21

Understanding the Relationship

  • Compare summary metrics
      – Revisits: Unique visitors, visits/user, interval
      – Change: Number, interval, Dice

  Visits/user   Number of changes   Time between changes   Dice coefficient
  2                  172.91               133.26                 0.82
  3                  200.51               119.24                 0.82
  4                  234.32               109.59                 0.81
  5 or 6             269.63                94.54                 0.82
  7+                 341.43                81.80                 0.81

SLIDE 22

Comparing Change and Revisit Curves

  • Three pages
      – New York Times
      – Woot.com
      – Costco
  • Similar change patterns
  • Different revisitation
      – NYT: Fast
      – Woot: Medium
      – Costco: Slow

(news, forums) (retail)

[Figure: change and revisit curves over time for NYT, Woot and Costco; Y axis 0.2 to 1.2]

SLIDE 23

Within-Page Relationship

  • Page elements change at different rates
  • Pages revisited at different rates
  • “Resonance” can serve as a filter for interesting content
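
One way to read "resonance" as a filter is to keep the page elements whose change interval is close to the interval at which people most often revisit the page. The sketch below is only that reading. The function name, the `tolerance` factor, and the sample numbers are hypothetical, and the slide does not specify how resonance is computed.

```python
def resonant_elements(change_interval_hours: dict[str, float],
                      peak_revisit_hours: float,
                      tolerance: float = 2.0) -> list[str]:
    """Elements whose change interval is within a factor `tolerance` of the peak revisit interval."""
    low = peak_revisit_hours / tolerance
    high = peak_revisit_hours * tolerance
    return sorted(name for name, hours in change_interval_hours.items()
                  if low <= hours <= high)


# Hypothetical per-element change intervals for one page, in hours.
elements = {"headline": 20.0, "ad banner": 0.1, "site footer": 2000.0}
picked = resonant_elements(elements, peak_revisit_hours=24.0)
print(picked)
```

Under these made-up numbers the ad banner changes far faster than people return and the footer far slower, so only the headline is kept.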

SLIDE 24

SLIDE 25

SLIDE 26

SLIDE 27

Dynamics of Information

  • Characterizing change
      – Content changes over time
      – People re-visit and re-find
      – Relationships between content change and re-access
  • Improving retrieval and understanding
      – Building support for understanding change (e.g., DiffIE)
      – Leveraging dynamics for improved retrieval

SLIDE 28

Building Support for Web Dynamics

[Timelines, 1996 to 2009: Content Changes]

  • DiffIE
  • Temporal IR

SLIDE 29

DiffIE

[Screenshot labels: Changes to page since your last visit; DiffIE toolbar]

SLIDE 30

Interesting Features of DiffIE

  • Always on
  • In-situ
  • New to you
  • Non-intrusive
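
The "new to you" idea, showing what changed on a page since your last visit, can be sketched with a line-level diff against a cached copy. This is an illustration only and not DiffIE's implementation, which the slides do not describe; the cached and current page text are invented.

```python
import difflib


def new_since_last_visit(cached: str, current: str) -> list[str]:
    """Lines in the current page text that were inserted or rewritten since the cached copy."""
    old, new = cached.splitlines(), current.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added: list[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])  # these lines would be highlighted in place
    return added


# Hypothetical page text at the last visit and now.
cached = "Top story: budget vote\nWeather: rain"
current = "Top story: budget passes\nWeather: rain\nSports: finals tonight"
added = new_since_last_visit(cached, current)
print(added)
```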

SLIDE 31

Examples of DiffIE in Action

SLIDE 32

Expected New Content

SLIDE 33

Monitor

SLIDE 34

Unexpected Important Content

SLIDE 35

Serendipitous Encounters

SLIDE 36

Unexpected Unimportant Content

SLIDE 37

Understand Page Dynamics

SLIDE 38

Attend to Activity

SLIDE 39

Edit

SLIDE 40

[Figure: uses of DiffIE arranged along an axis from Expected to Unexpected]

  • Unexpected Unimportant Content
  • Attend to Activity
  • Edit
  • Understand Page Dynamics
  • Serendipitous Encounter
  • Unexpected Important Content
  • Expected New Content
  • Monitor

SLIDE 41

Methods for Studying DiffIE

  • Feedback buttons
  • Survey
      – Prior to installation
      – After a month of use
  • Logging
      – URLs visited
      – Amount of change when revisited
  • Experience interview

In situ, Representative, Experience, Longitudinal

SLIDE 42

People Revisit More

  • Perception of revisitation remains constant
      – How often do you revisit?
      – How often are revisits to view new content?
  • Actual revisitation increases
      – First week: 39.4% of visits are revisits
      – Last week: 45.0% of visits are revisits
  • Why are people revisiting more with DiffIE?

14%
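
The 14% callout appears to be the relative increase in the share of visits that are revisits, from the first week to the last. That reading is an inference from the two percentages on the slide:

```latex
\frac{45.0 - 39.4}{39.4} = \frac{5.6}{39.4} \approx 0.142 \approx 14\%
```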

SLIDE 43

Revisited Pages Change More

  • Perception of change increases
      – What proportion of pages change regularly?
      – How often do you notice unexpected change?
  • Amount of change seen increases
      – First week: 21.5% revisits changed by 6.2%
      – Last week: 32.4% revisits changed by 9.5%
  • DiffIE is driving visits to changed pages

51+% 17% 8%

SLIDE 44

Change by Page Type

  • Perceptions of change reinforced
  • Pages that change a lot -> change more
  • Pages that change a little -> change less

[Figure: page types rated on a scale from "Change a little" to "Change a lot"]

  • News pages
  • Message boards, forums, news groups
  • Search engine results
  • Blogs you read
  • Pages with product information
  • Wikipedia pages
  • Company homepages
  • Personal home pages of people you know
  • Reference pages (dictionaries, yellow pages, maps)

SLIDE 45

Leveraging Dynamics for Retrieval

[Timelines, 1996 to 2009: Content Changes]

SLIDE 46

Temporal Retrieval Models

  • Current IR algorithms look only at a single snapshot of a page
  • But, Web pages change over time
  • Can we leverage this for improved retrieval?
  • Pages have different rates of change
      – Different priors (using change vs. link structure)
  • Terms have different longevity (staying power)
      – Some are always on the page; some transient

SLIDE 47

Temporal Retrieval Models

  • Page change is related to relevance judgments
      – Human relevance judgments
      – 5 point scale: Bad/Fair/Good/Excellent/Perfect
      – Rate of Change: 30% Bad pages; 60% Perfect pages
  • Use change rate as a document prior (vs. priors based on links, Page Rank)
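
Using change rate as a document prior can be sketched as adding log P(D) to a query-likelihood score, so that a page that changes gets a boost over an otherwise similar static page. The scores and prior values below are made up for illustration; the slides do not give the actual prior.

```python
import math


def rank_with_prior(query_loglik: dict[str, float], prior: dict[str, float]) -> list[str]:
    """Rank documents by log P(Q|D) + log P(D), best first."""
    score = {doc: ll + math.log(prior[doc]) for doc, ll in query_loglik.items()}
    return sorted(score, key=score.get, reverse=True)


# Hypothetical log-likelihoods of one query under two documents.
query_loglik = {"static_page": -4.0, "changing_page": -4.2}

uniform = {"static_page": 0.5, "changing_page": 0.5}       # no prior preference
change_prior = {"static_page": 0.3, "changing_page": 0.7}  # assumed: favor pages that change

print(rank_with_prior(query_loglik, uniform))
print(rank_with_prior(query_loglik, change_prior))
```

With the uniform prior the static page ranks first; with the assumed change-based prior the order flips.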

SLIDE 48

Temporal Retrieval Models

  • Term patterns vary over time
  • Represent a document as a mixture of terms with different “staying power”
      – Long, Medium, Short
  • Improves retrieval accuracy

$$P(Q \mid D) = \lambda_L P(Q \mid D_L) + \lambda_M P(Q \mid D_M) + \lambda_S P(Q \mid D_S)$$
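
The long/medium/short mixture can be sketched as a weighted sum of query likelihoods, one per staying-power component of the document. The weights, the probability floor for unseen terms, and the sample document are assumptions for illustration; the slides do not say how the components are estimated or smoothed.

```python
import math


def query_likelihood(query: list[str], terms: list[str], floor: float = 1e-3) -> float:
    """P(Q | D_i) as a product of term frequencies, floored for unseen terms (an assumption)."""
    if not terms:
        return floor ** len(query)
    return math.prod(max(terms.count(q) / len(terms), floor) for q in query)


def mixture_score(query: list[str], doc: dict[str, list[str]], weights: dict[str, float]) -> float:
    """P(Q | D) as the weighted sum of P(Q | D_i) over the L, M and S components."""
    return sum(weights[k] * query_likelihood(query, doc[k]) for k in ("L", "M", "S"))


# Hypothetical document split by how long its terms stay on the page.
doc = {"L": ["recipes", "cookbooks"], "M": ["salads", "cheese"], "S": ["bbq", "sale"]}
query = ["cookbooks"]

long_heavy = mixture_score(query, doc, {"L": 0.6, "M": 0.3, "S": 0.1})
short_heavy = mixture_score(query, doc, {"L": 0.1, "M": 0.3, "S": 0.6})
print(long_heavy, short_heavy)
```

Because "cookbooks" is a long-lived term in this toy document, weighting the long component more heavily gives the higher score.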

SLIDE 49

Summary

[Timelines, 1996 to 2009: Content Changes]

  • Relating revisitation and change allows us to
      – Identify pages for which change is important
      – Identify interesting components within a page
  • People revisit and re-find Web content
  • DiffIE: Support influences interaction and understanding
  • Temporal IR: Leverage change for improved IR
  • Web content changes: page-level, term-level

SLIDE 50

Think Outside the (Search) Box

[Diagram labels: User Context; Task/Use Context; Document Context; Query Words; Ranked List]

UMAP, June 24 2009

SLIDE 51

Thank You!

  • Questions/Comments …
  • More info, http://research.microsoft.com/~sdumais

SLIDE 52

References

Change and Revisitation:

Adar, Teevan, Dumais & Elsas. The Web changes everything: Understanding the dynamics of Web content. WSDM ’09.

Adar, Teevan & Dumais. Large scale analysis of Web revisitation patterns. CHI ’08.

Teevan, Adar, Jones & Potts. Information re-retrieval: Repeat queries in Yahoo’s logs. SIGIR ’07.

Tyler & Teevan. Large scale query log analysis of re-finding. WSDM ’10.

Adar, Teevan & Dumais. Resonance on the Web: Web dynamics and revisitation patterns. CHI ’09.

DiffIE and Temporal IR:

Teevan, Dumais, Liebling & Hughes. Changing how people view changes on the Web. UIST ’09.

Teevan, Dumais & Liebling. A longitudinal study of how highlighting Web content change affects people’s Web interactions. CHI ’10.

Elsas & Dumais. Leveraging temporal dynamics of document content in relevance ranking. WSDM ’10.