Hermes: Clustering Users in Large-Scale E-mail Services



SLIDE 1

Hermes

Clustering Users in Large-Scale E-mail Services

Thomas Karagiannis, Christos Gkantsidis, Dushyanth Narayanan, Antony Rowstron

Microsoft Research Cambridge, UK

slide-2
SLIDE 2

The email social graph

SLIDE 3

The email social graph

SLIDE 4

System under study

SLIDE 5

Current allocation of users to servers

SLIDE 6

Current allocation of users to servers

SLIDE 7

Better allocation of users to servers

SLIDE 8

Architecture of email service

SLIDE 9

Architecture of email service

SLIDE 10

Architecture of email service

SLIDE 11

Partitioning

Goal:

  • Identify groups of users
  • … efficiently

Partitioning

Assign users to partitions such that:

  • the total cost of edges whose endpoints (i.e. users) are in different partitions is minimized
  • the number of users per partition is “roughly” balanced
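
The two partitioning conditions above can be illustrated with a small sketch. It assumes a weighted edge list in which each weight stands for some cost of communication between two users; the names `cut_weight`, `imbalance`, the example users, and the weights are all hypothetical and not taken from the talk.

```python
from collections import Counter


def cut_weight(edges, assignment):
    """Sum the weights of edges whose two users sit in different partitions."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def imbalance(assignment, num_partitions):
    """Largest partition size divided by the perfectly even size (1.0 = balanced)."""
    sizes = Counter(assignment.values())
    ideal = len(assignment) / num_partitions
    return max(sizes.values()) / ideal


# Hypothetical email graph: (user, user, weight); weights are made up.
edges = [
    ("alice", "bob", 10),
    ("bob", "carol", 1),
    ("carol", "dave", 8),
    ("alice", "dave", 2),
]

# Placement that ignores who talks to whom.
scattered = {"alice": 0, "bob": 1, "carol": 0, "dave": 1}
# Placement that keeps heavy communicators together.
clustered = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}

print(cut_weight(edges, scattered), imbalance(scattered, 2))
print(cut_weight(edges, clustered), imbalance(clustered, 2))
```

Both placements are equally balanced, but the clustered one cuts far less edge weight, which is the quantity the partitioning step tries to minimize.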

SLIDE 12

Evaluation

  • Base performance
  • Scalability:

Can it scale to hundreds of millions of users?

  • Capturing changing patterns:

How often should we re-partition?

  • Sensitivity to (# users) / (# servers)

When should we partition?

SLIDE 13

Benefits of partitioning

SLIDE 14

Scalability of partitioning

SLIDE 15

Scalability of partitioning

SLIDE 16

How often to re-partition?

  • Communication patterns change
  • Computing partitions is an efficient background process
  • However, moving users (i.e. mailboxes) around is expensive

– Each re-partition requires migrating 40-70% of users
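
The migration cost of a re-partition can be estimated by comparing the old and new placements. The sketch below is a minimal illustration, assuming partition labels refer to the same physical servers before and after; `migration_fraction` and the example placements are hypothetical.

```python
def migration_fraction(old, new):
    """Fraction of users whose partition differs between two placements.

    Assumes label k denotes the same server in both placements.
    """
    moved = sum(1 for user in old if new.get(user) != old[user])
    return moved / len(old)


# Hypothetical placements before and after a re-partition.
old_placement = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}
new_placement = {"alice": 0, "bob": 1, "carol": 1, "dave": 0}

print(migration_fraction(old_placement, new_placement))
```

Because partition labels produced by a partitioner are arbitrary, a real system would likely first match new partitions to existing servers before counting moves; this sketch skips that step.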

SLIDE 17

Sensitivity to # users / server

SLIDE 18

Some other observations

  • Geography

– Easy to incorporate geographical constraints
– … very similar results

  • Flexibility in setting the optimization goal

– This work: minimize storage and network
– Can also use I/O load

  • Sampling of messages

– This work: collected & used all messages
– Also, similar results when ignoring emails with a large number of recipients
– Clever sampling techniques?
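
As a rough illustration of the message-filtering idea, the sketch below builds an undirected, weighted email graph from a message log and optionally skips messages with many recipients. The log format, the use of message counts as edge weights, and the name `build_edge_weights` are assumptions made for this example, not details from the talk.

```python
from collections import Counter


def build_edge_weights(messages, max_recipients=None):
    """Count messages exchanged per user pair.

    messages: iterable of (sender, [recipients]).
    max_recipients: if set, messages with more recipients are ignored.
    """
    weights = Counter()
    for sender, recipients in messages:
        if max_recipients is not None and len(recipients) > max_recipients:
            continue  # skip mass mailings
        for recipient in recipients:
            if recipient != sender:
                weights[frozenset((sender, recipient))] += 1
    return weights


# Hypothetical message log.
messages = [
    ("alice", ["bob"]),
    ("bob", ["alice", "carol"]),
    ("dave", ["alice", "bob", "carol"]),
]

full_graph = build_edge_weights(messages)
capped_graph = build_edge_weights(messages, max_recipients=2)
print(len(full_graph), len(capped_graph))
```

Dropping large-recipient messages removes many weak edges at once, which shrinks the graph handed to the partitioner.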

SLIDE 19

Related Work

  • SPAR [Pujol et al., SIGCOMM 2010]

– Partitioning for online social networks
– Evaluation: Twitter, Facebook, and Orkut traces
– Algorithm: Modularity Optimization (MO+)

  • Volley [Agarwal et al., NSDI 2010]

– Data placement for geo-distributed cloud services
– Evaluation: Live Mesh and Live Messenger traces
– Algorithm: use geo-information to place users & data, iteratively improve placement

SLIDE 20

Summary

  • Goal: Explore social (graph) patterns to improve online services

– Hermes: Optimize user placement based on email exchanges
– 35-50% storage and network savings

  • Partitioning has low overhead:

– No need to do frequent repartitions
