
Thinking Outside the (Search) Box: Susan Dumais, Microsoft Research



  1. Thinking Outside the (Search) Box
     Susan Dumais, Microsoft Research
     http://research.microsoft.com/~sdumais
     HCIR: Oct 23, 2008

  2. Web Info through the Years
     What’s available / How it’s accessed
     • Number of pages indexed
       - 7/94 Lycos: 54,000 pages
       - 1995: 10^6 (millions)
       - 1997: 10^7
       - 1998: 10^8
       - 2001: 10^9 (billions)
       - 2005: 10^10 …
     • Types of content
       - Web pages, newsgroups
       - Images, videos, maps
       - News, blogs, spaces
       - Shopping, local, desktop
       - Books, papers, many formats
       - Health, finance, travel …
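
The index sizes on this slide grow by four orders of magnitude in ten years (10^6 in 1995 to 10^10 in 2005), which works out to roughly 2.5x per year. A minimal sketch of that arithmetic, using only the power-of-ten values from the slide (the 1994 Lycos figure is left out because it is not one of them):

```python
import math

# Order-of-magnitude index sizes read off the slide (year -> pages indexed).
pages_indexed = {1995: 10**6, 1997: 10**7, 1998: 10**8, 2001: 10**9, 2005: 10**10}

def annual_growth_factor(start_year: int, end_year: int) -> float:
    """Compound per-year growth factor between two slide data points."""
    ratio = pages_indexed[end_year] / pages_indexed[start_year]
    return ratio ** (1 / (end_year - start_year))

factor = annual_growth_factor(1995, 2005)        # equals 10 ** (4 / 10)
doubling_years = math.log(2) / math.log(factor)  # time for the index to double
print(f"{factor:.2f}x per year, doubling every {doubling_years:.1f} years")
```

Because the slide gives only powers of ten, this is a rough average rate, not a measurement of any particular year.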

  3. Supporting Searchers
     • The search box
     • Spelling suggestions
     • Query suggestions
     • Advanced search operators and options (e.g., “”, +/-, site:, filetype:, intitle:)
     • Inline answers
     • Richer snippets
     But we can do better … by understanding context
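
The advanced operators on this slide turn a flat query string into a structured request. A minimal sketch of a tokenizer for exactly the operators listed (quoted phrases, +required, -excluded, site:, filetype:, intitle:); the grammar and the `ParsedQuery` structure are assumptions for illustration, not any engine's actual parser:

```python
import re
from dataclasses import dataclass, field

# Illustrative grammar (an assumption): optional +/- sign, optional field
# operator, then either a "quoted phrase" or a run of non-space characters.
TOKEN = re.compile(r'([+-]?)(?:(site|filetype|intitle):)?("[^"]*"|\S+)')

@dataclass
class ParsedQuery:
    terms: list = field(default_factory=list)     # plain words and phrases
    required: list = field(default_factory=list)  # +term
    excluded: list = field(default_factory=list)  # -term
    fields: dict = field(default_factory=dict)    # operator -> list of values

def parse_query(text: str) -> ParsedQuery:
    q = ParsedQuery()
    for sign, op, value in TOKEN.findall(text):
        value = value.strip('"')  # a quoted phrase is kept as one unit
        if op:
            q.fields.setdefault(op, []).append(value)
        elif sign == "+":
            q.required.append(value)
        elif sign == "-":
            q.excluded.append(value)
        else:
            q.terms.append(value)
    return q

print(parse_query('"desktop search" +vista -xp site:microsoft.com filetype:ppt'))
```

Real engines differ in edge cases (e.g. a sign in front of a field operator); this sketch simply lets the field operator win.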

  4. Search Today / Search and Context
     [Diagram: Query Words → Ranked List, surrounded by User Context, Document Context, and Task/Use Context]

  5. Search and Context
     Research prototypes: extend search’s algorithmic capabilities and user experiences
     • User contexts:
       - Finding and re-finding (Stuff I’ve Seen)
       - Novelty in news (NewsJunkie)
       - Personalized search (PSearch)
     • Document/domain contexts:
       - Metadata and search (SIS, Phlat)
       - Visualizing patterns in results (MemoryLandmarks, GridViz)
       - Dynamic information environments (DiffIE)
     • Task/use contexts:
       - Pages as context (Community Bar, IQ)
       - Richer collections as context (NewsJunkie, PSearch)
       - Understanding, sharing (SearchTogether, InkSeine)

  6. Stuff I’ve Seen (SIS) (Dumais et al., SIGIR 2003)
     • Unified index of stuff you’ve seen
       - Many types of info (e.g., files, email, calendar, contacts, web pages, rss, im)
     • Index of content and metadata (e.g., time, author, title, size, usage)
     • Rich UI possibilities
     • Supports re-finding vs. finding
     Windows DS: Vista Desktop Search (and XP, Live Toolbar)
     Also, Spotlight, GDS, X1, …
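
A unified index of the kind described here can be sketched as one inverted index over every item type, with each item’s metadata stored alongside so that results can be filtered and sorted on it. The `Item` fields and the `UnifiedIndex` class below are hypothetical, chosen to mirror the slide’s examples; they are not SIS’s actual schema or code:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical item record; field names are illustrative, not SIS's schema.
@dataclass
class Item:
    item_id: int
    kind: str      # e.g. "email", "file", "web"
    title: str
    author: str
    seen: date
    text: str

class UnifiedIndex:
    """One inverted index over every item type, with metadata kept alongside."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of items containing it
        self.items = {}                   # id -> Item (the metadata store)

    def add(self, item: Item) -> None:
        self.items[item.item_id] = item
        for term in f"{item.title} {item.text}".lower().split():
            self.postings[term].add(item.item_id)

    def search(self, query: str, **metadata) -> list:
        """AND-match the query words, filter on exact metadata values, newest first."""
        words = query.lower().split()
        if words:
            ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        else:
            ids = set(self.items)  # no words: browse by metadata alone
        hits = [self.items[i] for i in ids
                if all(getattr(self.items[i], k) == v for k, v in metadata.items())]
        return sorted(hits, key=lambda it: it.seen, reverse=True)

idx = UnifiedIndex()
idx.add(Item(1, "email", "HCIR talk", "Susan", date(2008, 10, 20), "slides for the search box talk"))
idx.add(Item(2, "file", "talk draft", "Susan", date(2008, 10, 1), "search box outline"))
idx.add(Item(3, "web", "workshop page", "Ryen", date(2008, 9, 1), "hcir workshop program"))
print([it.item_id for it in idx.search("search box")])
print([it.item_id for it in idx.search("search", kind="file")])
```

Sorting newest first is a design choice in this sketch; a ranked (best-match) order would need a scoring function on top of the postings.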

  7. SIS Demo

  8. SIS Usage Experiences
     Internal deployment: ~3000 internal Microsoft users
     • Analyzed: free-form feedback, questionnaires, structured interviews, log analysis (characteristics of interaction), UI experiments, lab experiments
     Personal store characteristics: 5k to 500k items
     Susan’s (laptop) world:
       Type    N          Size
       Web     3k         0.2 GB
       Files   28k        23.0 GB
       Mail    60k        2.2 GB
       Total   91k items  25.4 GB
       Index: 190 MB, +1.5 MB/week
     Query characteristics
     • Short queries (1.6 words)
     • Few advanced operators or fielded search in query box (~7%)
     • Many advanced operators and query iteration in UI (48%)
       - Filters (type, date, people); modify query; re-sort results

  9. SIS Usage Data, cont’d
     Characteristics of items opened
     • File types opened: 76% email, 14% web pages, 10% files
     • Age of items opened:
       - 5% today
       - 21% within the last week
       - 47% within the last month
       - 50% of the cases: within 36 days (web: 11 days; mail: 36 days; files: 55 days)
     • Log(Freq) = -0.68 * log(DaysSinceSeen) + 2.02
       [Chart: Frequency vs. Days Since Item First Seen, 0 to 2500 days]
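
The fitted line on this slide is a power law. If the logs are base 10 (the slide does not say; base 10 is an assumption here, chosen because it puts Freq at 1 day near 10^2.02 ≈ 105, in line with the chart’s 0 to 120 axis), then Freq = 10^2.02 × DaysSinceSeen^(-0.68). Whatever the base, each doubling of an item’s age multiplies the expected frequency by 2^(-0.68) ≈ 0.62. A minimal sketch that evaluates the model and shows how such a line is fit by least squares in log-log space:

```python
import math

# Assumption: base-10 logs (the slide does not state the base).
SLOPE, INTERCEPT = -0.68, 2.02

def predicted_freq(days_since_seen: float) -> float:
    """Power law implied by log10(Freq) = -0.68 * log10(days) + 2.02."""
    return 10 ** (SLOPE * math.log10(days_since_seen) + INTERCEPT)

def fit_loglog(days, freqs):
    """Ordinary least squares on (log10 days, log10 freq); returns (slope, intercept)."""
    xs = [math.log10(d) for d in days]
    ys = [math.log10(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic check only (not the SIS log data): points generated from the
# model should give back the slide's coefficients.
days = [1, 7, 30, 365, 2000]
slope, intercept = fit_loglog(days, [predicted_freq(d) for d in days])
print(round(slope, 2), round(intercept, 2))  # -0.68 2.02
```

With real logs the points scatter around the line, and the fit would be run on binned counts of opens per age rather than on model output.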

  10. SIS Usage Data, cont’d
      UI usage
      • Small effects of: Top/Side, Previews/NoPreviews
      • Large effect of sort order:
        - Date by far the most common sort field, even for people who had best-match Rank as default
        - Importance of time
        - Few searches for “best” match; many other criteria …
      [Chart: Number of Queries Issued (0 to 30,000) by sort field (Date, Rank, Other), grouped by Starting Default Sort Order (Date, Rank)]

  11. SIS Usage Data, cont’d
      Observations about unified access
      • Metadata quality is variable
        - Email: rich, pretty clean
        - Web: little (available to application)
        - Files: some, but often wrong
      • Memory depends on abstractions
        - “Useful date” is dependent on the object! Appointment: when it happens; file: when it is changed; email and web: when it is seen
        - “People” attribute vs. contains: To, From, Cc, Author, Artist
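
The two abstractions on this slide can be made concrete: pick the date that matters for each object type, and treat To, From, Cc, Author and Artist as one “people” attribute instead of matching a name anywhere in the text. A minimal sketch; the `Record` fields and function names are hypothetical, not SIS’s schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Hypothetical record; attribute names are illustrative only.
@dataclass
class Record:
    kind: str                            # "appointment", "file", "email", "web"
    start: Optional[datetime] = None     # when an appointment happens
    modified: Optional[datetime] = None  # when a file last changed
    seen: Optional[datetime] = None      # when an email or web page was seen
    people: dict = field(default_factory=dict)  # role -> names, e.g. {"From": [...]}

def useful_date(r: Record) -> Optional[datetime]:
    """Return the date a person is likely to remember, per the slide's rules."""
    if r.kind == "appointment":
        return r.start
    if r.kind == "file":
        return r.modified
    if r.kind in ("email", "web"):
        return r.seen
    return None

PEOPLE_ROLES = ("To", "From", "Cc", "Author", "Artist")

def involves(r: Record, name: str) -> bool:
    """Unified 'people' attribute: the name fills some person-valued role,
    as opposed to merely appearing somewhere in the content."""
    return any(name in r.people.get(role, []) for role in PEOPLE_ROLES)

mail = Record("email", seen=datetime(2008, 10, 20, 9, 0), people={"From": ["Susan"], "To": ["Ed"]})
song = Record("file", modified=datetime(2008, 1, 5), people={"Artist": ["Susan"]})
print(useful_date(mail), involves(song, "Susan"))
```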

  12. Ranked List vs. Metadata (for personal content)
      Why rich metadata?
      • People remember many attributes in re-finding
        - Often: time, people, file type, etc.
        - Seldom: only general overall topic
      • Rich client-side interface
        - Support fast iteration/refinement
        - Fast filter-sort-scroll vs. next-next-next
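
“Filter-sort-scroll” means the whole result set lives on the client, so each refinement is a cheap in-memory re-run rather than a request for the next page of ranked results. A minimal sketch with toy results; the field names and the `filter_sort` helper are hypothetical:

```python
from datetime import date

# Toy personal-store results; in a real client these would come from the index.
results = [
    {"title": "budget.xls", "type": "file", "people": {"Ann"}, "date": date(2008, 9, 2), "rank": 0.91},
    {"title": "Re: budget", "type": "email", "people": {"Ann", "Bo"}, "date": date(2008, 10, 1), "rank": 0.75},
    {"title": "Budget FAQ", "type": "web", "people": set(), "date": date(2007, 3, 14), "rank": 0.95},
]

def filter_sort(items, type_=None, person=None, sort_by="date", descending=True):
    """Filter on remembered attributes (type, people), then sort the full list.

    Everything happens in memory, so changing a filter or the sort field is
    instant and the user scrolls one list instead of paging next-next-next.
    """
    hits = [r for r in items
            if (type_ is None or r["type"] == type_)
            and (person is None or person in r["people"])]
    return sorted(hits, key=lambda r: r[sort_by], reverse=descending)

print([r["title"] for r in filter_sort(results)])                  # by date
print([r["title"] for r in filter_sort(results, person="Ann")])    # filter, then date
print([r["title"] for r in filter_sort(results, sort_by="rank")])  # best match
```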

  13. Re-finding on the Web (Teevan et al., SIGIR 2007)
      • 50-80% of page visits are re-visits
      • 30-40% of queries are re-finding queries
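
A re-visit rate like the one on this slide is computed from a browsing log. A minimal sketch under one simple definition (an assumption; the cited study may define and filter visits differently): a user’s first visit to a URL is new, and every later visit by that user to the same URL is a re-visit.

```python
def revisit_fraction(visits):
    """Fraction of page visits that return to a URL the same user visited before.

    `visits` is a chronological list of (user, url) pairs.
    """
    seen = set()
    revisits = 0
    for user, url in visits:
        key = (user, url)
        if key in seen:
            revisits += 1
        else:
            seen.add(key)
    return revisits / len(visits) if visits else 0.0

# Toy log (not study data): 2 of 5 visits are re-visits.
log = [("u1", "a"), ("u1", "b"), ("u1", "a"), ("u2", "a"), ("u1", "a")]
print(revisit_fraction(log))
```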

  14. Phlat: Search and Metadata (Cutrell et al., CHI 2006); demo
      • Phlat (Prototype for Helpful Lookup And Tagging)
        - Shell for WDS; publicly available
        - Tightly couples search and metadata
      • Features:
        - Search/browse (metadata)
        - Unified tagging
        - In-context search

  15. Phlat: Faceted metadata (for filtering, sorting, querying, tagging)
      • Tight coupling of search and browsing
        - Q
        - Results & associated metadata w/ query previews
      • 5 default properties to filter on (extensible)
        - Includes tags
      • Property filters integrated with query
        - Query = words and/or properties
        - No stuck filters
        - Search == Browse
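
The key idea here is that a query is words and/or property constraints evaluated together, with per-facet counts (“query previews”) showing what each further filter would keep. A minimal sketch; the property names and items are hypothetical stand-ins, since the slide does not list Phlat’s five default properties:

```python
from collections import Counter

# Toy items; "type", "person" and "tag" are hypothetical stand-ins for
# Phlat's default filter properties, which the slide does not name.
ITEMS = [
    {"text": "hcir talk slides", "type": "file", "person": "Susan", "tag": "talks"},
    {"text": "hcir schedule", "type": "email", "person": "Ryen", "tag": "hcir"},
    {"text": "talk travel", "type": "email", "person": "Susan", "tag": "travel"},
]

def run_query(words=(), **props):
    """Evaluate a query made of words and/or property constraints.

    The filters are part of the query itself rather than separate UI state, so
    clearing the query also clears them, and a query with no words is simply
    browsing by properties.
    """
    def matches(item):
        return (all(w in item["text"].split() for w in words)
                and all(item[k] == v for k, v in props.items()))
    return [item for item in ITEMS if matches(item)]

def previews(results, facets=("type", "person", "tag")):
    """Per-facet value counts: how many results each possible filter would keep."""
    return {f: Counter(item[f] for item in results) for f in facets}

hits = run_query(["hcir"])
print(len(hits), previews(hits)["type"])
print(len(run_query(["hcir"], type="email")))  # words + property
print(len(run_query(type="email")))            # property only: browsing
```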

  16. Phlat: Tagging
      • Apply a single set of user-generated tags to all content (e.g., files, email, web, rss, etc.)
      • Tagging interaction
        - Tag widget or drag-to-tag
      • Tag structure
        - Allow but do not require hierarchy
      • Tag implementation
        - Tags directly associated with files as NTFS or MAPI properties
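
“Allow but do not require hierarchy” can be illustrated with path-like tags: `projects/hcir` sits under `projects`, while a flat tag such as `todo` is equally valid. A minimal sketch; the `/` separator, the item-id format and the in-memory `TagStore` are assumptions for illustration, standing in for the NTFS/MAPI property storage the slide describes:

```python
from collections import defaultdict

class TagStore:
    """One tag vocabulary shared by every item type (files, email, web, rss, ...).

    Hierarchy is optional: "projects/hcir" is a child of "projects", while a
    flat tag such as "todo" is equally valid. The "/" separator is assumed.
    """

    def __init__(self):
        self.items_by_tag = defaultdict(set)

    def tag(self, item_id: str, tag: str) -> None:
        self.items_by_tag[tag.strip("/").lower()].add(item_id)

    def items(self, tag: str, include_children: bool = True) -> set:
        tag = tag.strip("/").lower()
        found = set(self.items_by_tag.get(tag, set()))
        if include_children:
            for t, ids in self.items_by_tag.items():
                if t.startswith(tag + "/"):
                    found |= ids
        return found

store = TagStore()
store.tag("mail:123", "Projects/HCIR")
store.tag("file:talk.ppt", "projects/hcir")
store.tag("web:hcir2008", "projects")
store.tag("file:notes.txt", "todo")
print(sorted(store.items("projects")))
```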

  17. Phlat: In-Context Search
      • Selecting a result …
        - Linked view to show associated tags
        - Rich actions: open, drag-drop, etc.
      • “Sideways search”
        - Pivot on metadata
        - Refine or replace query
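
A “sideways search” takes one metadata value from the selected result and either adds it to the current query (refine) or starts a new query from it (replace). A minimal sketch; the `Query` structure and `sideways` function are hypothetical, not Phlat’s API:

```python
from dataclasses import dataclass, field

@dataclass
class Query:
    words: list = field(default_factory=list)
    props: dict = field(default_factory=dict)  # property -> required value

def sideways(current: Query, selected: dict, prop: str, mode: str = "refine") -> Query:
    """Pivot on one metadata value of the selected result.

    "refine" keeps the current words and filters and adds the new constraint;
    "replace" starts a fresh query from that value alone. The current query
    object is never modified.
    """
    value = selected[prop]
    if mode == "refine":
        return Query(list(current.words), {**current.props, prop: value})
    if mode == "replace":
        return Query([], {prop: value})
    raise ValueError(f"unknown mode: {mode}")

q = Query(["budget"], {"type": "email"})
hit = {"title": "Re: budget", "from": "Ann", "type": "email"}
print(sideways(q, hit, "from"))
print(sideways(q, hit, "from", mode="replace"))
```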

  18. Phlat
      Phlat shell for Windows Desktop Search
      • Tight coupling of searching/browsing
      • Rich faceted metadata support, including unified tagging across data types
      • In-context search and actions
      Download: http://research.microsoft.com/adapt/phlat

  19. Metadata and the Web
      • Many queries contain implicit metadata
        - thomas edison image portrait
        - latest lasik techniques, canada
        - good nursing programs in baltimore
        - cheap digital camera
        - overview of active directory domains
        - …
      • Limited support for users to articulate this
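
To make “implicit metadata” concrete, a toy detector can map words in the slide’s example queries to the kind of constraint they imply (media type, recency, price, quality, genre, location). The cue lists below are hypothetical and hand-picked to cover these examples; a real system would need far broader resources, such as a gazetteer for places:

```python
import re

# Hypothetical cue lists chosen to cover the slide's examples, not derived from logs.
CUES = {
    "media type": {"image", "portrait", "photo", "video"},
    "recency": {"latest", "new", "recent"},
    "price": {"cheap", "inexpensive"},
    "quality": {"good", "best", "top"},
    "genre": {"overview", "tutorial", "introduction"},
}
PLACES = {"canada", "baltimore"}  # stand-in for a real gazetteer

def implicit_metadata(query: str) -> dict:
    """Map each metadata kind to the query words that imply it."""
    words = re.findall(r"[a-z]+", query.lower())
    found = {kind: [w for w in words if w in cues] for kind, cues in CUES.items()}
    found["location"] = [w for w in words if w in PLACES]
    return {kind: ws for kind, ws in found.items() if ws}

for q in ["thomas edison image portrait", "latest lasik techniques, canada",
          "good nursing programs in baltimore", "cheap digital camera",
          "overview of active directory domains"]:
    print(q, "->", implicit_metadata(q))
```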

  20. Dynamic Info Environments (Adar et al., CHI 2008 & WSDM 2009)
      [Screenshots: MSR Homepage, 1996 and 2007]

  21. Dynamic Info Environments
      [Figure: timelines, 1996-2007, of Content Changes and of User Visitation/ReVisitation]
      Today’s Browse and Search Experiences. But, ignores …

  22. What We Did
      • Content:
        - Crawled 55k pages every hour for 1 year
        - Varying #users, #visits/user, inter-visit interval
      • Behavior:
        - Analyzed revisitation patterns for >600k users for these 55k pages
        - Surveyed 20 people for richer understanding of intent
      • Examined:
        - User revisitation patterns
        - Page change patterns
        - Relations between change and revisitation
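
From an hourly crawl, two basic change statistics follow: whether a page changed at all, and how often. A minimal sketch under stated assumptions: changes are detected by comparing content hashes of consecutive fetches (a real crawl would first strip timestamps, ads and similar per-fetch noise), and the change rate is estimated as hours observed divided by changes seen, which is one simple definition, not necessarily the study’s:

```python
import hashlib

def change_stats(snapshots):
    """Summarize one page's hourly crawl as (changed?, mean hours per change).

    `snapshots` holds the page content fetched once per hour, in order. A
    change is counted whenever two consecutive fetches differ by content hash.
    """
    digests = [hashlib.sha1(s.encode("utf-8")).hexdigest() for s in snapshots]
    changes = sum(1 for prev, cur in zip(digests, digests[1:]) if prev != cur)
    if changes == 0:
        return False, None
    # One simple estimate: hours observed divided by the number of changes seen.
    hours_observed = len(digests) - 1
    return True, hours_observed / changes

# Toy crawl of two pages over six hourly fetches (not data from the study).
crawl = {
    "example.com/news": ["v1", "v1", "v2", "v2", "v2", "v3"],
    "example.org/about": ["a", "a", "a", "a", "a", "a"],
}
stats = {url: change_stats(s) for url, s in crawl.items()}
fraction_changed = sum(changed for changed, _ in stats.values()) / len(stats)
print(stats, fraction_changed)
```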

  23. What We Found: Revisitation patterns
      • Revisitations to pages are very common
        - 50-80% of pages
      • What makes one page’s revisits different from another?
        - Examined four characteristics: Intent, Change, Content, Session

  24. What We Found: Change patterns
      • 66% of the pages change
        - Change every 123 hours (avg.)
        - Change by 0.21 (avg. Dice coeff.)
      • Which pages change?
        - Popular pages, .com pages change most
      • Which terms change?
        - Term longevity analyses
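
The Dice coefficient compares two versions of a page as 2|A ∩ B| / (|A| + |B|), where A and B are the sets of units in each version. A minimal sketch, assuming word sets as the units and reading “change by 0.21” as 1 minus the Dice similarity; both readings are assumptions, since the slide gives only the average value:

```python
def dice(a: str, b: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over the word sets of two versions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0  # two empty versions are identical
    return 2 * len(wa & wb) / (len(wa) + len(wb))

# Toy page versions (not crawl data).
old = "microsoft research home page news and events"
new = "microsoft research home page latest news and careers"
sim = dice(old, new)
print(f"dice = {sim:.2f}, change = {1 - sim:.2f}")
```

Here the versions share 6 of 7 and 8 words, so similarity is 12/15 = 0.8 and change is 0.2.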

  25. What We Found: Change patterns
      [Figure: 2007, 1998]

  26. What We Found: Change patterns, rate of change

  27. What We Found: Change patterns for your visits
      Diff-IE
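
Showing “change patterns for your visits” means comparing the page as it is now with the copy seen at the user’s last visit and highlighting what is new. A minimal sketch using Python’s standard `difflib` at word level; this is a generic diff, not Diff-IE’s actual algorithm, and the cached/current texts are toy examples:

```python
import difflib

def changes_since_last_visit(last_seen: str, current: str) -> list:
    """Word runs inserted or replaced in `current` relative to the cached copy.

    These are the spans a browser add-on might highlight on the next visit.
    """
    old, new = last_seen.split(), current.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [" ".join(new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag in ("insert", "replace")]

cached = "workshop program to be announced"
today = "workshop program posted see schedule below"
print(changes_since_last_visit(cached, today))
```

Deleted text is ignored here; a fuller view would also mark removals from the previous version.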
