SLIDE 1

Poor Man's Social Network

Consistently Trade Freshness For Scalability

Zhiwu Xie, Jinyang Liu, Herbert Van de Sompel, Johann van Reenen and Ramiro Jordan

SLIDE 2


Outline

  • Scaling feed following
  • Algorithm
  • Experiment and results
  • Conclusions
SLIDE 3

Feed Following

[Figure: a graph of producers and consumers (nodes A-K), illustrating feed following.]

SLIDE 4

Feed Following Scalability

Give me the 20 most recent tweets sent by all the people I follow
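In a conventional pull model, each such request is its own query over the follow graph and the tweet store, roughly as sketched below (not the authors' code; the psycopg2-style connection and the follows/tweets tables are assumptions for illustration):

```python
# Baseline pull model: every feed request runs its own query against
# fast-changing global state. Hypothetical schema:
#   follows(follower_id, followee_id), tweets(user_id, body, created_at)
FEED_SQL = """
    SELECT t.user_id, t.body, t.created_at
    FROM tweets t
    JOIN follows f ON f.followee_id = t.user_id
    WHERE f.follower_id = %s
    ORDER BY t.created_at DESC
    LIMIT 20;
"""

def fetch_feed(conn, follower_id):
    """Return the 20 most recent tweets from everyone follower_id follows."""
    with conn.cursor() as cur:
        cur.execute(FEED_SQL, (follower_id,))
        return cur.fetchall()
```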

  • Individualized queries
  • Fast changing global state
  • Partitioning, replication, and caching
  • NoSQL: trade consistency for scalability


SLIDE 5

Consistency

  • Atomicity, Linearizability, or One-copy Serializability (1SR)

[Figure: repeated feed-following queries along a timeline.]

SLIDE 6

Retweet Anomaly

[Figure: users A, B, and C; one feed-following result contains a tweet and its retweet, while another contains only the retweet, illustrating the anomaly.]

SLIDE 7

New Approach: TimeMap Query

Who has created new tweets during the past scheduled release periods?

  • Global time across partitions
  • Scheduled releasing
  • Client-side processing and caching
  • Consistently trade freshness for scalability (sketched below)
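A minimal sketch of the data structure this implies, with illustrative names only (the slides do not prescribe this code): for each already-closed release period, the server answers with the set of user ids that tweeted during that period, and the client intersects that set with the accounts it follows.

```python
PERIOD = 5  # length of one scheduled release period, in seconds (example value)

def period_start(ts: int) -> int:
    """Floor a Unix timestamp to the boundary of its release period."""
    return ts - ts % PERIOD

# Illustrative TimeMap: release-period boundary -> ids of users who tweeted
# during that period. Only periods that have fully elapsed are ever exposed.
timemap = {
    1600000000: {17, 42, 99},
    1600000005: {42, 256},
}

def relevant_producers(timemap: dict, start: int, followees: set) -> set:
    """The producers I follow who tweeted in the period beginning at `start`."""
    return timemap.get(start, set()) & followees
```

Because the answer for a closed period never changes, the same response can be handed to every client and cached anywhere along the way.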


SLIDE 8

CAP Theorem

  • Preconditioned on the asynchronous network model: the only way to coordinate the distributed nodes is to pass messages
  • In the partially synchronous model, where global time is assumed to be available, CAP may indeed be simultaneously achievable most of the time


SLIDE 9

Global Time

  • “One of the mysteries of the universe is that it is possible to construct a system of physical clocks which, running quite independently of one another, will satisfy the Strong Clock Condition.” – Time, Clocks, and the Ordering of Events in a Distributed System, by Leslie Lamport


SLIDE 10

Scheduled Release Algorithm


Who has created new tweets during the past scheduled release periods?
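The deck says the servers implement this in PL/pgSQL; the sketch below only restates the idea in Python under assumed names. Writes are buffered by release period, and a period becomes visible in the TimeMap only after it has fully elapsed, so every query about that period sees one immutable answer.

```python
from collections import defaultdict

PERIOD = 5  # seconds per scheduled release period (example value)


class ScheduledRelease:
    """Buffer tweet authors per release period; publish only closed periods."""

    def __init__(self) -> None:
        self._pending = defaultdict(set)   # period start -> author ids (still open)
        self._released = {}                # period start -> frozenset (immutable)

    def record_tweet(self, user_id: int, ts: float) -> None:
        """Note that user_id tweeted at Unix time ts."""
        self._pending[int(ts) - int(ts) % PERIOD].add(user_id)

    def release_due_periods(self, now: float) -> None:
        """Publish every period that has fully elapsed by time `now`."""
        current = int(now) - int(now) % PERIOD      # the still-open period
        for start in [s for s in self._pending if s < current]:
            self._released[start] = frozenset(self._pending.pop(start))

    def who_tweeted(self, start: int) -> frozenset:
        """Answer the TimeMap question for an already-released period."""
        return self._released.get(start, frozenset())
```

Staleness is then bounded by the length of the release period, e.g., 5 seconds.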

SLIDE 11

Partitioning: Send A New Tweet


[Figure: five partitions; a new tweet by user_id u is stored on partition u mod 5, so one partition holds user_ids 0, 5, 10, 15, …, the next 1, 6, 11, 16, …, then 2, 7, 12, 17, …, 3, 8, 13, 18, …, and 4, 9, 14, 19, ….]
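The routing rule shown in the figure is a plain modulo over user_id; a minimal sketch (partition count and function name are illustrative):

```python
N_PARTITIONS = 5  # the figure shows five partitions

def partition_for(user_id: int) -> int:
    """A new tweet is stored on the partition that owns its author."""
    return user_id % N_PARTITIONS

# Example: user 13 posts a tweet; it lands on the partition owning 3, 8, 13, 18, ...
assert partition_for(13) == 3
```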

SLIDE 12

Partitioning: TimeMap


[Figure: the TimeMap query is answered across all partitions 1, 2, 3, …, N-1.]
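Since authors are spread over all partitions, answering the TimeMap question means asking every partition about the same already-released period and merging the results; a sketch under assumed names (fetch_authors stands in for whatever per-partition query the servers actually expose):

```python
def timemap_across_partitions(partitions, period_start: int) -> set:
    """Union of 'who tweeted in this period' over all partitions.

    `partitions` is any iterable of objects exposing a hypothetical
    fetch_authors(period_start) call. Because the period has already been
    released, each partition's answer is fixed, so the union is the same
    no matter when, or in what order, the partitions are queried.
    """
    authors: set = set()
    for p in partitions:
        authors |= p.fetch_authors(period_start)
    return authors
```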

SLIDE 13

Client Side Processing


A

If the current time is 1:05:37 PM, please tell me who (whether or not I follow them) has sent new tweets from 1:05:30 PM to 1:05:35 PM. I’ll figure out by myself whether any of these new tweets are relevant to me, and if so, I’ll retrieve those tweets separately by myself.

B

If the current time is 1:05:39 PM, please tell me who (whether or not I follow them) has sent new tweets from 1:05:30 PM to 1:05:35 PM. I’ll figure out by myself whether any of these new tweets are relevant to me, and if so, I’ll retrieve those tweets separately by myself.

Cache!
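A and B end up issuing identical requests because the request is keyed by the release-period boundary, not by the moment the request is made; that is what lets a shared cache (memcached in the experiment) absorb B's request. A hedged client-side sketch with illustrative names:

```python
PERIOD = 5  # seconds per scheduled release period (example value)

def latest_released_period(now: float) -> int:
    """Start of the most recent fully-closed period, e.g. 1:05:30 for
    requests made at 1:05:37 and at 1:05:39 alike."""
    return int(now) - int(now) % PERIOD - PERIOD

def feed_update(now, followees, cache, fetch_timemap, fetch_tweets):
    """One client-side refresh step; fetch_timemap and fetch_tweets are
    stand-ins for the real server calls."""
    start = latest_released_period(now)
    key = f"timemap:{start}"
    authors = cache.get(key)           # same key for every client in this period
    if authors is None:                # first client (A) misses ...
        authors = fetch_timemap(start)
        cache.set(key, authors)        # ... later clients (B) hit the cache
    relevant = set(authors) & set(followees)
    return fetch_tweets(relevant, start) if relevant else []
```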

SLIDE 14

Staleness vs. Latency


[Figure: two timelines. Top: “How are you?” asked at 1:00 and answered “I’m fine (as of 2:00)” at 2:00: fresh, but 1 hour of latency. Bottom: “How were you at 12:55?” asked at 1:00 and answered “I was fine (as of 12:55)” at 1:05: 10 minutes stale, but only 5 minutes of latency.]

SLIDE 15

Trade Freshness For Scalability

  • Mass transit system vs. private car
  • Lose flexibility, but gain overall efficiency by sharing resources
  • Stale up to the length of the scheduled release period, e.g., 5 seconds.


SLIDE 16

Experiment

  • Implemented on AWS
  • A Twitter-like feed following application
  • Server side: Python/Django, PostgreSQL, PL/pgSQL
  • Client side: emulated browser, implemented in Python/Django and PostgreSQL


SLIDE 17

Experiment: Configurations

  • Used ~100 cloud instances from Amazon
  • Most were used for emulated browsers
  • 3 to 6 c1.medium instances as servers
  • Used memcached to simulate caches


SLIDE 18

Experiment: Workload

  • Workload similar to the Yahoo! PNUTS experiment
  • A following network of ~200,000 users
  • Synthetic workload generated by the Yahoo! Cloud Serving Benchmark


SLIDE 19

Experiment Result: Query Rate


SLIDE 20

Experiment Result: Latency


SLIDE 21

Experiment Results: Caching


SLIDE 22

Experiment Results: CPU Load

[Chart: CPU load on the server vs. the client.]

SLIDE 23

Conclusions

  • Consistently scale feed following
  • Linear scalability
  • Practical low cost solution


SLIDE 24

Thank You

  • Questions?
