SLIDE 1

20 years of Web search – where to next?

Mark Sanderson

SLIDE 2

Who am I?

  • Professor at RMIT University, Melbourne
  • Before

–Professor at University of Sheffield
–Researcher at UMass Amherst
–Researcher at University of Glasgow

  • Online

–@IR_oldie
–http://www.seg.rmit.edu.au/mark/

SLIDE 3

Overview of talk

  • A bit of history

SLIDE 4

A bit of history

Early IR

SLIDE 5

Before IR systems

  • There were libraries

–The search engine of the day

  • Organise information using a subject catalogue

–Sort cards by author
–Sort cards by title
–Sort cards by subject

–How to do this?

SLIDE 6

Not just public libraries

  • MIT master's thesis, Philip Bagley, 1951

SLIDE 7

At the same time…

  • While librarians were coping with the information explosion

–Could machines help?
–Could computers help?

  • Very brief history of machines and computers for search

SLIDE 8

Machines doing IR

CS&IT - ISAR

SLIDE 9

As We May Think – Vannevar Bush, 1945

–http://www.youtube.com/watch?v=c539cK58ees

SLIDE 10

Computers doing IR

  • Holmstrom 1948

SLIDE 11

Information Retrieval

  • Calvin Mooers, 1950

SLIDE 12

NRT

  • See demo shown in talk at

– http://www.seg.rmit.edu.au/mark/demos/NRT/NRT%20demo.htm

  • Paper at

– http://www.seg.rmit.edu.au/mark/cv/publications/papers/my_papers/EP-odd.pdf

SLIDE 13

The web arrived

  • 1993

–JumpStation

–Jonathon Fletcher, University of Stirling

  • Steinberg, Wired, 1996

–“Information retrieval is really only a problem for people in library science - if some computer scientists were to put their heads together, they'd probably have it solved before lunchtime.”

SLIDE 14

Where are we now?

Google/Bing

SLIDE 15

Where we are now

  • Google/Bing

–Text matching
  –Fields, anchor
  –PageRank
  –Query logs
  –…

–Massive machine learning
  –Evaluation
  –Continual tuning
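The slide lists PageRank among the signals that Google and Bing combine with text matching. To make that one signal concrete, here is a minimal sketch of the textbook PageRank power iteration. It is an illustration only, not either engine's production ranking. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the toy link graph are all illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank computed by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to. Pages with no
    outlinks ("dangling" pages) spread their rank evenly over all pages,
    so the scores always sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Total rank currently held by dangling pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump term plus the share redistributed from dangling pages.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its damped rank equally along its outlinks.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: a links to b and c, b to c, c back to a.
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy_graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

On this toy graph, page c receives links from both a and b and ends up with the highest score. Real engines treat such a link-based score as one feature among many inside their learned ranking models.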

SLIDE 16

Search is solved?

  • Common perception

SLIDE 17

Favourable conditions

  • Most content wants to be found
  • Most content is redundant
  • Huge income
  • Queries often repeated
  • Users can read & write

SLIDE 18

Where to next?

  • Immediate problems
  • Immediate opportunities
  • Medium term challenges
  • Longer term challenges

SLIDE 19

Immediate

Problems/opportunities

SLIDE 20

Problematic summaries

SLIDE 21

Less favourable?

  • People struggle to search
  • People miss retrieved documents

–Fine for redundant content; what if there is just one relevant document?

SLIDE 22

Problem searching

  • Limited redundancy

–Little money
  –Enterprise search
  –Refinding

–Content doesn’t want to be found
  –Patent search
  –Legal document search (e-Discovery)

SLIDE 23

Enterprise search

  • Many problems in this space
  • Each collection is different

–Each search engine needs to be different

  • No money
  • “Why doesn’t it work like Google?”

SLIDE 24

Significant problem

  • Think carefully before including search in your user interface

SLIDE 25

At RMIT

  • Trying to scope the problem

–If we find a search solution that works on one set of documents, does it work on others?
–Not as much as was thought
–A lot worse than was thought

SLIDE 26

Major immediate challenge

  • Do search as well as Google no matter what the collection, and do it without all their money

SLIDE 27

Favourable conditions

  • Most content wants to be found
  • Most content is redundant
  • Huge income
  • Queries often repeated
  • Users can read & write

SLIDE 28

Refinding

  • Interviewed 45 searchers about common retrieval tasks

–70% relate to refinding

  • Starting funded investigation in this area.

SLIDE 29

Ephemeral & archival content

  • Archival

–Traditional web search
–Web pages, news, documents
–Coarse grained

  • Ephemeral

–Social media
–Blogs, social networks, micro-blogs
–Fine grained

SLIDE 30

Interface of the two

  • Summarising ephemeral content

–Only just starting
–Lots of opportunities to specialise

  • How can ephemeral content aid search of archival content?

–RMIT is changing the representation of archival content based on ephemeral data.

–Early days, but promising

SLIDE 31

Medium term

SLIDE 32

Diffuse information

SLIDE 33

Harder information needs

  • Entertain me
  • Contextual search

  • SWIRL 2012

–http://www.cs.rmit.edu.au/swirl12/

SLIDE 34

Longer term

SLIDE 35

Longer term

  • Long queries
  • Spoken search
  • The internet for everyone

SLIDE 36

Users have complex needs

  • Poorly expressed in short queries

–Experts
  –issue multiple short queries
  –use search engine operators

  • Can we build search engines to handle complex queries?

SLIDE 37

New application area?

  • Speech search

–Hands-free
–Eyes-free

  • Seen in the movies, but really?

SLIDE 38

Users?

  • Visually impaired

–Together they could form a country

  • Other potential uses

–In-car searching
–Walking in a city

SLIDE 39

Internet for everyone

– http://www.onbile.com/info/how-many-people-use-smartphones-in-the-world/

SLIDE 40

Internet users?

  • 2013

–2 billion now

  • 2015

–4 billion mostly on mobiles (Baird Equity Research)

SLIDE 41

Implications?

  • More languages
  • More users who struggle with literacy

–Search engines assume you can read and write

SLIDE 42

Search engines

There is a lot still to do