SLIDE 1

Exploiting social networks for Internet search

Alan Mislove†‡ Krishna Gummadi† Peter Druschel†

†Max Planck Institute for Software Systems

‡Rice University

HotNets 2006

SLIDE 2

30.11.2006 HotNets’06 Alan Mislove

Search in the Internet

  • The Web has transformed information exchange
  • Social networking is now a popular way to share content
    • Photos, videos, blogs, music and profiles
    • MySpace (100M users), Orkut (30M users), ...
  • Many studies have examined the Web; Web search is well understood
  • Few have looked at social networks

SLIDE 3

This talk

  • Compares content sharing on the Web and in social networks
  • Shows that the underlying mechanisms for publishing and locating content differ
  • Examines the implications for locating various types of content
  • Investigates the benefit of using social network search alongside Web search

SLIDE 4

Web vs. social networks: Publishing

  • On the Web, links exist between pieces of content
    • A hyperlink is an endorsement of relevance
  • In social networks, there are no links between pieces of content
    • Links exist between users and the content they create or endorse
    • Links exist between users who share interests or trust each other
  • Different link structures affect how content is located

SLIDE 8

Web vs. social networks: Locating

  • Web search exploits the hyperlink structure
    • More incoming links imply greater importance
  • Social networks use user feedback
    • Implicit (e.g., # of views)
    • Explicit (e.g., ratings, # of comments, favorites)
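To make the contrast concrete, here is a minimal Python sketch of the two kinds of ranking signal. The function names, the feedback field names (`views`, `rating`, `comments`, `favorites`), and the weights are illustrative assumptions, not how any particular search engine or social network actually scores content.

```python
from collections import Counter


def rank_by_inlinks(links):
    """Web-style signal: pages with more incoming hyperlinks rank higher.

    links: list of (source_page, target_page) pairs.
    """
    indegree = Counter(target for _, target in links)
    # Sort by descending in-degree; break ties alphabetically for determinism.
    return sorted(indegree, key=lambda page: (-indegree[page], page))


def rank_by_feedback(items, w_views=0.01, w_rating=1.0,
                     w_comments=0.5, w_favorites=2.0):
    """Social-network-style signal: weighted mix of user feedback.

    items: dict mapping item -> dict of feedback counts.
    The weights are arbitrary placeholders chosen for illustration.
    """
    def score(fb):
        implicit = w_views * fb.get("views", 0)
        explicit = (w_rating * fb.get("rating", 0)
                    + w_comments * fb.get("comments", 0)
                    + w_favorites * fb.get("favorites", 0))
        return implicit + explicit

    return sorted(items, key=lambda item: (-score(items[item]), item))


links = [("a.html", "c.html"), ("b.html", "c.html"), ("a.html", "b.html")]
print(rank_by_inlinks(links))  # ['c.html', 'b.html']

videos = {"v1": {"views": 1000}, "v2": {"rating": 4, "favorites": 5}, "v3": {"comments": 2}}
print(rank_by_feedback(videos))  # ['v2', 'v1', 'v3']
```

Note that `a.html` does not appear in the in-link ranking at all: a page needs other pages to link to it before this signal can rank it, whereas feedback counts accumulate as soon as users interact with an item. That timing difference is what makes social networks quicker at surfacing recently added content.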

SLIDE 10

What content do social nets locate better?

  • Recently added content
    • Creating Web links takes time; social nets rapidly rate content
  • Information of interest to a specific community
    • Web ratings reflect the interests of the community at large
    • Web search misses deep web content
  • Multimedia content
    • Hard to link content instances
    • Social networks use tags and comments
  • Can this Web content be better located with social networks?

SLIDE 12

Applying social network search to Web

  • PeerSpective experiment uses social nets to search the Web
  • High-level idea: users can query the pages their friends have viewed
  • Results from friends appear alongside Google results

SLIDE 14

PeerSpective implementation

  • The prototype is a lightweight HTTP proxy
    • Runs on each user’s desktop and indexes all browsed content
  • When a Google search is performed, the proxy
    • Queries other PeerSpective proxies in parallel with Google
    • Presents both sets of results side by side
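The proxy flow above can be illustrated with a minimal, in-process Python sketch. Everything here is an assumption for illustration: `LocalIndex`, `search_web`, and `combined_search` are hypothetical names, peers are plain objects rather than remote proxies, `search_web` is a stub rather than a real Google call, and the term-count ranking is a placeholder, not PeerSpective's actual ranking.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class LocalIndex:
    """Toy inverted index over the pages one user's proxy has seen."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of URLs containing it

    def add_page(self, url, text):
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query, k=10):
        # Placeholder ranking: number of query terms a page contains.
        scores = defaultdict(int)
        for term in tokenize(query):
            for url in self.postings.get(term, ()):
                scores[url] += 1
        return sorted(scores, key=lambda u: (-scores[u], u))[:k]


def search_web(query):
    """Hypothetical stand-in for the forwarded Google request."""
    return ["http://web.example/result-for-" + "-".join(tokenize(query))]


def combined_search(query, friend_indexes, k=10):
    """Query friends' indexes in parallel with the Web engine.

    Returns (web_results, friend_results) so they can be shown side by side;
    friend results are ordered by how many friends returned each URL.
    """
    with ThreadPoolExecutor() as pool:
        web_future = pool.submit(search_web, query)
        futures = [pool.submit(ix.search, query, k) for ix in friend_indexes]
        hits = defaultdict(int)
        for future in futures:
            for url in future.result():
                hits[url] += 1
        web_results = web_future.result()
    friend_results = sorted(hits, key=lambda u: (-hits[u], u))[:k]
    return web_results, friend_results


alice, bob = LocalIndex(), LocalIndex()
alice.add_page("http://a.example/mpi-sws.html", "Max Planck Institute for Software Systems")
bob.add_page("http://a.example/mpi-sws.html", "MPI SWS, Max Planck")
bob.add_page("http://b.example/mpi-forum.html", "Message Passing Interface")
print(combined_search("max planck", [alice, bob]))
# (['http://web.example/result-for-max-planck'], ['http://a.example/mpi-sws.html'])
```

Submitting the Web request and the peer queries concurrently means response latency is bounded by the slowest source rather than the sum of all sources.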

SLIDE 17

Questions to answer

  • Does PeerSpective improve coverage?
    • What is the coverage of Google’s index for viewed pages?
    • What fraction of URLs had already been viewed by a friend?
  • How good is PeerSpective at ranking results?
    • Do users click on PeerSpective or Google results?

SLIDE 18

High-level results

  • Ran PeerSpective with 10 users for one month
    • All users were researchers at MPI
    • 51,410 distinct URLs viewed
    • 1,730 Google searches
  • Caveat: small data set from a group of computer scientists
    • User group includes the authors
  • Results indicate potential, at least for special-interest groups

SLIDE 19

What fraction of viewed URLs does Google index?

  • Limited to static pages (text/html, with URLs ending in .html or .htm)
  • Queried Google’s index for each URL
    • Using an about:URL search request
  • Google contained only 62.5% of the URLs!
    • These URLs account for 68.1% of HTTP requests
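The two coverage figures can be computed as in this minimal Python sketch, assuming a request log of (URL, content type) pairs. `is_indexed` is a hypothetical callable standing in for the per-URL Google query, not a real API. The URL-level figure counts each distinct URL once, while the request-level figure weights each URL by how often it was fetched; 68.1% exceeding 62.5% therefore means indexed URLs were, on average, requested more often than unindexed ones.

```python
from collections import Counter
from urllib.parse import urlsplit


def is_static_html(url, content_type):
    """Static-page filter: text/html whose path ends in .html or .htm."""
    path = urlsplit(url).path.lower()
    return content_type == "text/html" and path.endswith((".html", ".htm"))


def index_coverage(requests, is_indexed):
    """Return (fraction of distinct URLs indexed, fraction of requests to them).

    requests: list of (url, content_type), one entry per HTTP request.
    is_indexed: callable url -> bool, standing in for the index lookup.
    """
    counts = Counter(url for url, ctype in requests if is_static_html(url, ctype))
    if not counts:
        return 0.0, 0.0
    indexed = {url for url in counts if is_indexed(url)}
    url_fraction = len(indexed) / len(counts)
    request_fraction = sum(counts[u] for u in indexed) / sum(counts.values())
    return url_fraction, request_fraction


log = [("http://a.example/popular.html", "text/html")] * 3 + [
    ("http://a.example/obscure.htm", "text/html"),
    ("http://a.example/search.php", "text/html"),  # not a static page
    ("http://a.example/logo.html", "image/png"),   # not text/html
]
print(index_coverage(log, lambda url: url.endswith("popular.html")))  # (0.5, 0.75)
```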

SLIDE 21

Why are so many URLs not in Google?

  • Examined the URL list and found three reasons
    • Too new: Google has not yet had time to crawl the URL
    • Deep web: the URL is not well-connected enough to be crawled
    • Dark web: the URL is not connected at all, or is not publicly visible

Examples, in the order of the reasons above:
  • http://edition.cnn.com/2006/ ... /italy.nesta/index.html
  • http://www.mpi-sws.mpg.de/~pkouznet/ ... /pres0031.ht/pres0031.html
  • http://www.mpi-sws.org/intranet/index.htm

SLIDE 22

What fraction of URLs were already viewed by a friend?

  • Only static, text/html pages
  • Same methodology as the Google coverage check
  • 30.4% of URLs had previously been viewed by someone in the network
    • Many of them had been viewed locally
  • 13.3% of URLs had previously been viewed but were not in Google!
    • Suggests social networks can extend index coverage
    • With a comparatively small index
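The slide does not spell out the exact denominator, so the following Python sketch assumes the fractions are taken over view events replayed in time order: an event counts as "previously viewed" if anyone in the group had viewed that URL before, and as "viewed locally" if the same user had. The log format and the `is_indexed` callable are hypothetical.

```python
def prior_view_stats(log, is_indexed):
    """Fractions of view events whose URL had already been seen earlier.

    log: iterable of (timestamp, user, url) view events (hypothetical format).
    is_indexed: callable url -> bool, standing in for the Web index lookup.
    Returns (viewed before by anyone, viewed before by the same user,
             viewed before by anyone but not indexed).
    """
    earlier_viewers = {}  # url -> set of users who viewed it so far
    total = by_anyone = by_self = by_anyone_unindexed = 0
    for _, user, url in sorted(log):  # replay events in time order
        viewers = earlier_viewers.setdefault(url, set())
        total += 1
        if viewers:
            by_anyone += 1
            if not is_indexed(url):
                by_anyone_unindexed += 1
        if user in viewers:
            by_self += 1
        viewers.add(user)
    if total == 0:
        return 0.0, 0.0, 0.0
    return by_anyone / total, by_self / total, by_anyone_unindexed / total


views = [(1, "alice", "u1"), (2, "bob", "u1"), (3, "alice", "u1"), (4, "bob", "u2")]
print(prior_view_stats(views, is_indexed=lambda url: False))  # (0.5, 0.25, 0.5)
```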

SLIDE 23

Did users click on PeerSpective results?

  • For each result click, we ask whether the clicked result was:
    • Only in Google’s top 10
    • Only in PeerSpective’s top 10
    • In the top 10 of both
  • 7.7% of result clicks were on PeerSpective-only results!
    • Shows the potential of social network search
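The per-click classification above can be sketched in Python as follows. The tuple format and category labels are assumptions for illustration, and results are identified by URL.

```python
from collections import Counter


def classify_click(url, google_results, peer_results, k=10):
    """Label a clicked result by which top-k list(s) contained it."""
    in_google = url in google_results[:k]
    in_peer = url in peer_results[:k]
    if in_google and in_peer:
        return "both"
    if in_google:
        return "google-only"
    if in_peer:
        return "peerspective-only"
    return "neither"


def click_breakdown(clicks):
    """clicks: list of (clicked_url, google_results, peer_results) tuples.

    Returns the percentage of clicks falling into each category.
    """
    counts = Counter(classify_click(*click) for click in clicks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


clicks = [
    ("g1", ["g1", "g2"], ["p1"]),
    ("p1", ["g1"], ["p1"]),
    ("b", ["b"], ["b"]),
    ("g2", ["g1", "g2"], []),
]
print(click_breakdown(clicks))
# {'google-only': 50.0, 'peerspective-only': 25.0, 'both': 25.0}
```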

SLIDE 27

Why are PeerSpective-only URLs clicked on?

  • Disambiguation: determining the appropriate meaning of a term
    • Search engines currently pick the most popular meaning
    • PeerSpective can leverage the meaning relevant to friends

Example: “MPI” can mean Message Passing Interface, Max Planck Institute, Meeting Professionals International, or Manitoba Public Insurance; for this user group, the relevant meaning is Max Planck Institute.
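One simple way to let friends' history pick the meaning is to promote results that friends have viewed. The Python sketch below is an illustrative assumption, not PeerSpective's actual ranking; the URLs, their engine order, and the view counts are hypothetical.

```python
def rerank_by_friends(results, friend_views):
    """Move results that friends have viewed ahead of the rest.

    results: URLs in the search engine's original (popularity-based) order.
    friend_views: dict URL -> number of friends who viewed it.
    Python's sort is stable, so ties keep the engine's original order.
    """
    return sorted(results, key=lambda url: -friend_views.get(url, 0))


# Hypothetical URLs, one per meaning of "MPI", in an assumed engine order.
results = [
    "http://mpi-forum.example/",      # Message Passing Interface
    "http://mpg.example/",            # Max Planck Institute
    "http://mpiweb.example/",         # Meeting Professionals International
    "http://mpi-insurance.example/",  # Manitoba Public Insurance
]
print(rerank_by_friends(results, {"http://mpg.example/": 3}))
```

With no friend views at all, the function returns the engine's order unchanged, so it only changes the ranking when the social signal actually says something.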

SLIDE 29

Why are PeerSpective-only URLs clicked on?

  • Relevance: picking the best among matching documents
    • Example: a search for ‘coolstreaming’ leads to the paper
    • PeerSpective can use the shared interests of friends

SLIDE 30

Why are PeerSpective-only URLs clicked on?

  • Serendipity: finding interesting and unexpected content
    • Integral to the Web search experience
    • News sites are current examples of serendipitous sites
    • Example: ‘Munich’ leads to a co-worker’s homepage
  • Serendipitous discoveries occur frequently in PeerSpective
    • Users often find pages viewed by friends interesting

SLIDE 31

Results summary

  • PeerSpective explored the potential of integrating Web and social network search
  • Found that PeerSpective aided Web search
    • Provided additional coverage for viewed sites
    • Improved ranking of results
    • Aided finding serendipitous content
    • Changed the usage pattern of our users
  • However, it was just an experiment
    • Many challenges and opportunities remain for an actual system

SLIDE 32

Opportunities and challenges

  • Privacy
    • Users disclose that someone in their group has viewed a URL
    • Subject to k-anonymity
  • In PeerSpective, currently
    • No HTTPS pages are indexed
    • Users can turn off indexing and purge pages
    • Search queries are not recorded
  • Need ways to ensure anonymity and privacy
    • While providing incentives to contribute
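The slide does not say how k-anonymity is enforced. One possible reading, sketched below in Python as an assumption, is that a group answers only "someone in this group viewed the URL" and refuses to answer at all unless it has at least k members, so a positive answer cannot be narrowed below k possible viewers. `GroupIndex` and its methods are hypothetical. A size check alone does not stop colluding members from narrowing the set further.

```python
class GroupIndex:
    """Answers "has anyone in this group viewed this URL?" without saying who.

    Hypothetical sketch: answers are withheld unless the group has at least
    k members, so a positive answer points to one of at least k users.
    """

    def __init__(self, k):
        self.k = k
        self.members = set()
        self.viewed_urls = set()  # URLs only; viewer identities are not stored

    def add_member(self, user):
        self.members.add(user)

    def record_view(self, user, url):
        if user not in self.members:
            raise ValueError(f"{user!r} is not a member of this group")
        self.viewed_urls.add(url)

    def someone_viewed(self, url):
        """Return True/False, or None if the group is too small to answer."""
        if len(self.members) < self.k:
            return None
        return url in self.viewed_urls


group = GroupIndex(k=3)
for name in ("alice", "bob"):
    group.add_member(name)
group.record_view("alice", "http://a.example/page.html")
print(group.someone_viewed("http://a.example/page.html"))  # None (only 2 members)
group.add_member("carol")
print(group.someone_viewed("http://a.example/page.html"))  # True
```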

SLIDE 33

Opportunities and challenges

  • Clustering
    • Users are often members of multiple social groups (e.g., friends, family, work)
    • Is it necessary to route each query to the most useful users?
  • Architecture
    • Centralized vs. decentralized?
    • Would users rather share their URL history with a centralized organization or with friends?
  • Other challenges are discussed in the paper

SLIDE 34

Conclusion

  • Content sharing mechanisms on the Web and in social nets differ widely
  • Social nets are naturally better suited to certain content
  • Early experiments suggest social nets can improve Web search
    • Found noticeable improvements in coverage and ranking
  • Will soon release PeerSpective to the PlanetLab community

SLIDE 35

Questions?

SLIDE 36

What is the coverage of Google/PeerSpective?

Share of viewed static URLs, by whether each URL was in Google’s index and whether it had previously been viewed in PeerSpective:

                          In Google: Yes   In Google: No
  In PeerSpective: Yes        16.7%            13.3%
  In PeerSpective: No         45.8%            24.2%

  In Google: 62.5%; in PeerSpective: 30.4%

SLIDE 40

What results do users click on?

Share of result clicks, by whether the clicked result was in Google’s and/or PeerSpective’s top 10:

                              Google result: Yes   Google result: No
  PeerSpective result: Yes           5.8%                 7.7%
  PeerSpective result: No           86.5%                  -

  Clicks on Google results: 92.3%; clicks on PeerSpective results: 13.5%
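The marginal totals in the click table follow directly from its cells, given that the three non-empty cells already sum to 100% (no clicks fell outside both top-10 lists):

```latex
\begin{aligned}
\text{clicks on Google results} &= 86.5\% + 5.8\% = 92.3\% \\
\text{clicks on PeerSpective results} &= 5.8\% + 7.7\% = 13.5\% \\
\text{all clicks} &= 86.5\% + 5.8\% + 7.7\% = 100\%
\end{aligned}
```

Because the 5.8% of clicks on results that appeared in both lists are counted in both totals, the two totals add up to 105.8% rather than 100%.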