Dynamic Information Environments

  1. Dynamic Information Environments
     Susan Dumais, Microsoft Research, http://research.microsoft.com/~sdumais
     In collaboration with: Jaime Teevan, Eytan Adar, Jon Elsas, Dan Liebling, Richard Hughes
     UW, CSE 454, Dec 8 2009

  2. Outline
     - Web search and context
     - Temporal dynamics of information
       - Characterizing change
         - Content changes over time
         - People re-visit and re-find
         - Relationships between content change and re-access
       - Improving retrieval and understanding
         - Building support for understanding change (e.g., DiffIE)
         - Leveraging dynamics for improved retrieval

  3. Web Search at 15
     - How it's accessed
     - What's available
       - Number of pages indexed
         - 7/94 Lycos – 54,000 pages
         - 95 – 10^6 (millions)
         - 97 – 10^7
         - 98 – 10^8
         - 01 – 10^9 (billions)
         - 05 – 10^10 …
       - Types of content
         - Web pages, newsgroups
         - Images, videos, maps
         - News, blogs, spaces
         - Shopping, local, desktop
         - Books, papers, many formats
         - Health, finance, travel …

  4. Support for Searchers
     - The search box
       - Spelling suggestions
       - Query suggestions
       - Auto complete
       - Inline answers
       - Richer snippets
     - But, we can do better … by understanding context

  5. Search and Context
     [Diagram: Search Today (Query Words → Ranked List), surrounded by User Context, Document Context, and Task/Use Context]

  6. [Diagram: overview of research areas]
     - Inter-Relationships among Documents
     - Categorization and Metadata: Reuters, spam, landmarks, web categories …
     - Domain-specific features, time
     - Systems/Prototypes: new capabilities and experiences; algorithms and prototypes; deploy, evaluate and iterate
     - Interfaces and Interaction: Stuff I've Seen, Phlat, Timelines, SWISH; tight coupling of browsing and search
     - Redundancy
     - Temporal Dynamics
     - Modeling Users: short vs. long term; individual vs. group; implicit vs. explicit
     - Using User Models: Stuff I've Seen (re-finding), Personalized Search, News Junkie (novelty), User Behavior in Ranking, Domain Expertise at Web-scale
     - Evaluation: many methods, scales; individual components and their combinations

  7. Information Dynamics
     [Figure: timelines, 1996–2009, of Content Changes and User Visitation/ReVisitation]
     - Today's browse and search experiences
     - But, ignores …

  8. Digital Dynamics
     - Easy to capture
     - Few tools support dynamics

  9. Information Dynamics
     - Characterizing change
       - Content changes over time
       - People re-visit and re-find
       - Relationships between content change and re-access
     - Improving retrieval and understanding
       - Building support for understanding change (e.g., DiffIE)
       - Leveraging dynamics for improved retrieval

  10. Characterizing Change
      [Figure: timelines, 1996–2009, of Content Changes and User Visitation/ReVisitation]
      - Content changes
        - Large-scale Web crawls, over time
        - Revisited pages: 55,000 pages crawled hourly for 18+ months
        - Judged pages (relevance to a query): 6 million pages crawled every two days for 6 months
      - User visitation/revisitation
        - Unique users, visits/user, time between visits

  11. Measuring Web Page Change
      - Summary metrics
        - Number of changes
        - Time between changes
        - Amount of change
      - Change curves
        - Fixed starting point
        - Measure similarity over different time intervals
      [Figure: Dice similarity (0–1) vs. time from starting point, with the knot point marked]
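The summary metrics and change curve on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the method used in the talk: it assumes each crawled snapshot is plain text, treats a page as a set of lowercase whitespace-separated terms, assumes a fixed crawl interval, and uses hypothetical helper names (`terms`, `dice`, `change_curve`, `change_summary`). Dice similarity is 2|A ∩ B| / (|A| + |B|), so 1.0 means identical term sets.

```python
from typing import List, Optional, Set, Tuple


def terms(text: str) -> Set[str]:
    """Rough tokenizer (an assumption): lowercase, split on whitespace."""
    return set(text.lower().split())


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); 1.0 means identical term sets."""
    if not a and not b:
        return 1.0  # treat two empty snapshots as identical
    return 2 * len(a & b) / (len(a) + len(b))


def change_curve(snapshots: List[str]) -> List[float]:
    """Dice similarity of every snapshot to the first one (the fixed starting point)."""
    start = terms(snapshots[0])
    return [dice(start, terms(s)) for s in snapshots]


def change_summary(
    snapshots: List[str], crawl_interval: float = 1.0
) -> Tuple[int, Optional[float], Optional[float]]:
    """Number of changes, mean time between changes, and mean Dice across each change.

    A "change" is any crawl whose text differs from the previous crawl;
    crawl_interval is the (assumed fixed) time between crawls, e.g. 1 hour.
    """
    change_idx = [i for i in range(1, len(snapshots)) if snapshots[i] != snapshots[i - 1]]
    gaps = [b - a for a, b in zip(change_idx, change_idx[1:])]
    mean_gap = crawl_interval * sum(gaps) / len(gaps) if gaps else None
    sims = [dice(terms(snapshots[i - 1]), terms(snapshots[i])) for i in change_idx]
    mean_dice = sum(sims) / len(sims) if sims else None
    return len(change_idx), mean_gap, mean_dice
```

For example, `change_curve(["a b c d", "a b c d", "a b c e", "a b c e", "a b x e"])` returns `[1.0, 1.0, 0.75, 0.75, 0.5]`: similarity to the starting point falls as more of the original terms are replaced.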

  12. Measuring Within-Page Change
      - DOM structure changes
      - Term use changes
      - Divergence from norm
      - "Staying power" in page
      [Figure: term staying power over time (Sep. to Dec.) for cookbooks, salads, cheese, ingredient, bbq]
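One simple way to approximate a term's "staying power" is the fraction of crawled snapshots of a page in which the term appears. That definition is an assumption for illustration (the slide does not define the measure), and `staying_power` and `most_persistent` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List


def staying_power(snapshots: List[str]) -> Dict[str, float]:
    """Fraction of snapshots in which each term appears (assumed definition)."""
    counts: Counter = Counter()
    for text in snapshots:
        # Count each term at most once per snapshot.
        counts.update(set(text.lower().split()))
    n = len(snapshots)
    return {term: c / n for term, c in counts.items()}


def most_persistent(snapshots: List[str], k: int = 3) -> List[str]:
    """The k terms with the highest staying power (ties broken alphabetically)."""
    scores = staying_power(snapshots)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

On the hypothetical snapshots `["cookbooks salads bbq", "cookbooks cheese", "cookbooks ingredient cheese", "cookbooks salads"]`, `cookbooks` scores 1.0, `salads` and `cheese` 0.5, and `bbq` and `ingredient` 0.25, so a term that persists across crawls separates from terms that come and go.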

  13. Example Term Longevity Graphs

  14. Revisitation on the Web
      [Figure: timelines, 1996–2009, of Content Changes and User Revisits]
      - Revisitation patterns
        - Log analyses
          - Toolbar logs for revisitation
          - Query logs for re-finding
        - User survey to understand intent in revisitations
      "What's the last Web page you visited?"

  15. Measuring Revisitation
      - Summary metrics
        - Unique visitors
        - Visits/user
        - Time between visits
      - Revisitation curves
        - Histogram of revisit intervals
        - Normalized
      [Figure: normalized count (0–1) vs. time interval]
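The revisitation metrics above can be sketched from a simple visit log. This is a minimal sketch under stated assumptions: the log is a list of hypothetical `(user_id, timestamp_in_hours)` pairs for a single page, the bin edges are chosen by the caller, and "normalized" is taken to mean scaling so the tallest bin equals 1.0 (the slide does not say which normalization was used).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Visit = Tuple[str, float]  # (user_id, timestamp in hours): hypothetical log format


def revisit_metrics(visits: List[Visit]) -> Tuple[int, float, List[float]]:
    """Unique visitors, mean visits/user, and all within-user revisit intervals."""
    by_user: Dict[str, List[float]] = defaultdict(list)
    for user, t in visits:
        by_user[user].append(t)
    if not by_user:
        return 0, 0.0, []
    intervals: List[float] = []
    for times in by_user.values():
        times.sort()
        # Time between consecutive visits by the same user.
        intervals.extend(b - a for a, b in zip(times, times[1:]))
    unique = len(by_user)
    return unique, len(visits) / unique, intervals


def revisit_curve(intervals: List[float], bin_edges: List[float]) -> List[float]:
    """Histogram of revisit intervals, scaled so the tallest bin is 1.0 (assumed)."""
    counts = [0] * (len(bin_edges) - 1)
    for x in intervals:
        for i in range(len(counts)):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break  # intervals outside all bins are ignored
    peak = max(counts, default=0) or 1
    return [c / peak for c in counts]
```

With visits by `u1` at hours 0, 1, and 25 and by `u2` at hours 5 and 6, there are 2 unique visitors, 2.5 visits/user, and intervals of 1, 24, and 1 hours; with bins `[0, 2, 48, 168]` the curve is `[1.0, 0.5, 0.0]`.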

  16. Four Revisitation Patterns
      - Fast
        - Hub-and-spoke
        - Navigation within site
      - Hybrid
        - High quality fast pages
      - Medium
        - Popular homepages
        - Mail and Web applications
      - Slow
        - Entry pages, bank pages
        - Accessed via search engine

  17. Search and Revisitation
      - Repeat query (33%)
        - e.g., microsoft research
      - Repeat click (39%)
        - e.g., http://research.microsoft.com
        - Q: microsoft research, msr …
      - Big opportunity (43%)
        - 24% "navigational revisits"

                        Repeat click   New click   Total
        Repeat query        29%            4%        33%
        New query           10%           57%        67%
        Total               39%           61%
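The four cells of the table above can be computed from a query log. This is a minimal sketch, assuming a hypothetical log of `(user_id, query, clicked_url)` events in time order, where a query (or click) counts as a repeat if the same user issued that query (or clicked that URL) earlier, regardless of which query led to the click. Shares are taken over all events, which is also an assumption.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str, str]  # (user_id, query, clicked_url): hypothetical log format


def repeat_breakdown(events: List[Event]) -> Dict[Tuple[str, str], float]:
    """Share of events in each (repeat/new query, repeat/new click) cell, per user."""
    seen_queries: Dict[str, Set[str]] = {}
    seen_clicks: Dict[str, Set[str]] = {}
    cells: Counter = Counter()
    for user, query, url in events:  # events assumed to be in time order
        q_seen = seen_queries.setdefault(user, set())
        c_seen = seen_clicks.setdefault(user, set())
        q = "repeat query" if query in q_seen else "new query"
        c = "repeat click" if url in c_seen else "new click"
        cells[(q, c)] += 1
        q_seen.add(query)
        c_seen.add(url)
    total = sum(cells.values())
    if total == 0:
        return {}
    return {cell: n / total for cell, n in cells.items()}
```

For instance, "microsoft research" followed by "msr", both clicking http://research.microsoft.com, makes the second event a new query with a repeat click, the kind of navigational revisit the slide points to.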

  18. Repeat Clicks for Repeat Queries
      - Within session: repeat query -> new click
      - Across sessions: repeat query -> repeat click
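The within-session vs. across-session distinction above can be sketched by splitting each user's log into sessions and labeling every repeat query. This is a minimal sketch: the `(user_id, minutes, query, clicked_url)` log format, the 30-minute inactivity cutoff `SESSION_GAP_MINUTES`, and the rule that a click is a "repeat" if it matches any URL previously clicked for the same query are all assumptions, not details from the talk.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, float, str, str]  # (user_id, minutes, query, clicked_url): hypothetical

SESSION_GAP_MINUTES = 30.0  # assumed session cutoff; not taken from the talk


def label_repeat_queries(events: List[Event]) -> List[Tuple[str, str, str]]:
    """Label each repeat query as (query, within/across session, repeat/new click)."""
    last_time: Dict[str, float] = {}
    session: Dict[str, int] = {}
    # (user, query) -> (session of last issue, URLs clicked for this query so far)
    history: Dict[Tuple[str, str], Tuple[int, Set[str]]] = {}
    labels: List[Tuple[str, str, str]] = []
    for user, t, query, url in sorted(events, key=lambda e: (e[0], e[1])):
        # Start a new session on a user's first event or after a long gap.
        if user not in last_time or t - last_time[user] > SESSION_GAP_MINUTES:
            session[user] = session.get(user, -1) + 1
        last_time[user] = t
        key = (user, query)
        if key in history:
            prev_session, prev_urls = history[key]
            where = "within" if prev_session == session[user] else "across"
            click = "repeat" if url in prev_urls else "new"
            labels.append((query, where, click))
            prev_urls.add(url)
            history[key] = (session[user], prev_urls)
        else:
            history[key] = (session[user], {url})
    return labels
```

A user who searches "msr" three times, at minutes 0, 5, and 500, clicking a different result the second time and the original result the third time, yields `[("msr", "within", "new"), ("msr", "across", "repeat")]`.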

  19. Relationships Between Revisitation and Change
      [Figure: timelines, 1996–2009, of Content Changes and User Visitation/ReVisitation]
      "Why did you revisit the last Web page you revisited?"

  20. Possible Relationships
      - Interested in change: Monitor
      - Effect change: Transact
      - Change unimportant: Find new
      - Change can interfere: Re-find

  21. Understanding the Relationship
      - Compare summary metrics
        - Revisits: unique visitors, visits/user, interval
        - Change: number, interval, Dice

        Visits/user   Number of changes   Time between changes   Dice coefficient
        2                  172.91               133.26                0.82
        3                  200.51               119.24                0.82
        4                  234.32               109.59                0.81
        5 or 6             269.63                94.54                0.82
        7+                 341.43                81.80                0.81
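A table like the one above groups pages by how often users revisit them and then averages each page's change metrics within each group. This is a minimal sketch of that aggregation, assuming a hypothetical `PageStats` record per page and bucketing on the floor of mean visits/user (pages below 2 visits/user are left out); it does not reproduce the talk's data.

```python
from statistics import mean
from typing import Dict, List, NamedTuple, Optional, Tuple


class PageStats(NamedTuple):
    """Per-page revisit and change statistics (hypothetical record layout)."""
    visits_per_user: float
    num_changes: int
    time_between_changes: float
    dice: float


def bucket(visits_per_user: float) -> Optional[str]:
    """Map mean visits/user to the table's rows; pages below 2 are left out."""
    n = int(visits_per_user)  # assumption: bucket on the floor of the mean
    if n < 2:
        return None
    if n >= 7:
        return "7+"
    if n >= 5:
        return "5 or 6"
    return str(n)


def compare(pages: List[PageStats]) -> Dict[str, Tuple[float, float, float]]:
    """Mean number of changes, time between changes, and Dice per visits/user bucket."""
    groups: Dict[str, List[PageStats]] = {}
    for p in pages:
        b = bucket(p.visits_per_user)
        if b is not None:
            groups.setdefault(b, []).append(p)
    return {
        b: (
            mean(p.num_changes for p in ps),
            mean(p.time_between_changes for p in ps),
            mean(p.dice for p in ps),
        )
        for b, ps in groups.items()
    }
```

Reading the real table this way, pages revisited more often change more often and at shorter intervals, while the mean Dice coefficient stays near 0.81 to 0.82 across all rows.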

  22. Comparing Change and Revisit Curves
      - Three pages
        - New York Times
        - Woot.com
        - Costco
      - Similar change patterns
      - Different revisitation
        - NYT: Fast (news, forums)
        - Woot: Medium
        - Costco: Slow (retail)
      [Figure: four panels of curves for NYT, Woot, and Costco (y-axis 0–1.2) over time]

  23. Within-Page Relationship
      - Page elements change at different rates
      - Pages revisited at different rates
      - "Resonance" can serve as a filter for interesting content

  27. Dynamics of Information
      - Characterizing change
        - Content changes over time
        - People re-visit and re-find
        - Relationships between content change and re-access
      - Improving retrieval and understanding
        - Building support for understanding change (e.g., DiffIE)
        - Leveraging dynamics for improved retrieval
