Scalability and Replication
Marco Serafini
COMPSCI 532 Lecture 13
Scalability

Ideal world: linear scalability, i.e., speedup grows linearly with the degree of parallelism.
Reality: speedup falls short of the ideal because of bottlenecks, for example a central coordinator. When do we stop adding resources?

[Figure: speedup vs. parallelism, ideal (linear) vs. reality (sub-linear)]
Scalability! But at what COST?
Frank McSherry (Unaffiliated), Michael Isard (Microsoft Research), Derek G. Murray (Unaffiliated*)
Abstract
We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread.
The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. COST weighs a system’s scalability against the overheads introduced by the system, and indicates the actual performance gains of the system, without rewarding systems that bring substantial but parallelizable overheads. We survey measurements of data-parallel systems recently reported in SOSP and OSDI, and find that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.
Figure 1: Scaling and performance measurements for a data-parallel algorithm, before (system A) and after (system B) a simple performance optimization. The unoptimized implementation “scales” far better, despite (or rather, because of) its poor performance.

We argue that many published big data systems more closely resemble system A than they resemble system B.
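The COST definition above is simple arithmetic: find the smallest hardware configuration whose running time beats the single-threaded baseline. A minimal sketch, using made-up numbers (not measurements from the paper):

```python
# Hypothetical measurements: cores -> running time in seconds.
# These numbers are illustrative only, not from the COST paper.
single_thread_seconds = 100
system_b_seconds = {1: 300, 2: 160, 4: 90, 8: 50, 16: 30}

def cost(measurements, baseline_seconds):
    """Return the smallest core count whose running time beats the
    single-threaded baseline, or None if the system never does."""
    for cores in sorted(measurements):
        if measurements[cores] < baseline_seconds:
            return cores
    return None

print(cost(system_b_seconds, single_thread_seconds))  # -> 4
```

A system that "underperforms one thread for all of its reported configurations" is one where this function returns None.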
[Figure: running time vs. cores for GraphLab, Naiad, and GraphX, compared against single-threaded baselines (vertex order on SSD, Hilbert order in RAM), for a single iteration and for 10 iterations]
[Figure: replicated system architecture. Clients send requests to a replication agent; the replication agents at the replicas run a replication protocol among themselves]
Wait for n-f replies, not all n (otherwise we would hang forever waiting for crashed replicas)
[Figure: quorum-based register. The Writer sends w(v) to all replicas and waits for n-f acks; the Reader sends r to all replicas and waits for n-f replies carrying v]
Writer:
(1) send w(v, t) to all replicas; a replica sets vi = v only if t > ti
(2) wait for n-f acks

Reader:
(1) send r to all replicas
(2) wait for n-f replies (vi, ti)
(3) write back w(vi, ti) with the maximum ti
(4) wait for n-f acks

Reference: Attiya, Bar-Noy, Dolev. “Sharing memory robustly in message-passing systems”
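A minimal in-memory sketch of this read/write quorum logic (class and method names are my own; a real implementation sends messages asynchronously and waits for the first n-f responses, and quorums are arbitrary intersecting sets, not a fixed prefix of replicas):

```python
# Sketch of the ABD register with n simulated replicas, f of which may crash.
# Here "wait for n-f acks" is modeled by simply contacting n-f replicas.

class Replica:
    def __init__(self):
        self.value, self.ts = None, 0

    def write(self, v, t):
        # Accept only if the timestamp is newer than the stored one.
        if t > self.ts:
            self.value, self.ts = v, t
        return "ack"

    def read(self):
        return self.value, self.ts

class ABDRegister:
    def __init__(self, n, f):
        self.replicas, self.f = [Replica() for _ in range(n)], f
        self.clock = 0  # single-writer timestamp

    def quorum(self):
        # Any n-f replicas form a quorum; for simplicity, take the first ones.
        return self.replicas[: len(self.replicas) - self.f]

    def write(self, v):
        self.clock += 1
        for r in self.quorum():          # (1) send w(v, t); (2) n-f acks
            r.write(v, self.clock)

    def read(self):
        replies = [r.read() for r in self.quorum()]   # (1)-(2) n-f replies
        v, t = max(replies, key=lambda vt: vt[1])     # pick the max timestamp
        for r in self.quorum():                       # (3)-(4) write-back
            r.write(v, t)
        return v
```

The write-back in steps (3)-(4) is what prevents a later reader from returning an older value than an earlier reader.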
[Figure: the Writer performs write(v = 5); Reader 1's read(v) returns 5, but a later read(v) by Reader 2 returns the old value 4]
[Figure: state-machine replication. Concurrent client requests R1, R2, R3 go through consensus, which produces a consistent decision on the execution order (e.g., R2, R1, R3); each state machine (SM) replica executes the requests in that order]
A newly elected leader picks a unique ballot number b; it has its own proposed value v.
(1) The leader sends read(b) to all replicas and waits for n-f replies.
(2) If a replica has previously accepted a proposal (vi, bi) and b > bi, it replies with (vi, bi) and promises not to accept proposals with ballot < b (accepting one would break this promise). If it has no prior accepted proposal, it replies with an ack.
(3) If some reply is (vi, bi), the leader sets v to the vi with the highest bi.
(4) The leader sends proposal (v, b), waits for n-f acks, then decides on v and broadcasts the decision.
Reference: L. Lamport. “Paxos made simple”
If progress gets stuck (not enough replies), the leader picks a larger ballot number and restarts the protocol. Eventually, there will be a single leader with a large enough ballot number that completes all the steps.
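The steps above can be sketched as a single-decree protocol run synchronously in memory (names are my own; real Paxos handles message loss, timeouts, and concurrently competing leaders):

```python
# Single-decree Paxos sketch: one leader runs phase 1 (read/prepare) and
# phase 2 (propose/accept) against n acceptors, f of which may fail.

class Acceptor:
    def __init__(self):
        self.promised = 0      # highest ballot promised so far
        self.accepted = None   # (value, ballot) of the last accepted proposal

    def prepare(self, b):
        # Phase 1: promise not to accept proposals with ballot < b.
        if b > self.promised:
            self.promised = b
            return self.accepted   # None means a plain ack
        return "nack"

    def accept(self, v, b):
        # Phase 2: accept unless that would break an earlier promise.
        if b >= self.promised:
            self.promised = b
            self.accepted = (v, b)
            return "ack"
        return "nack"

def run_leader(acceptors, f, ballot, my_value):
    n = len(acceptors)
    # Phase 1: send read(ballot), wait for n-f replies.
    replies = [a.prepare(ballot) for a in acceptors[: n - f]]
    if "nack" in replies:
        return None                      # restart with a larger ballot
    prior = [r for r in replies if r not in (None, "nack")]
    # If some reply carries (vi, bi), adopt the vi with the highest bi.
    value = max(prior, key=lambda vb: vb[1])[0] if prior else my_value
    # Phase 2: send proposal (value, ballot), wait for n-f acks, then decide.
    acks = [a.accept(value, ballot) for a in acceptors[: n - f]]
    return value if acks.count("ack") >= n - f else None
```

Note how step (3) shows up in the `prior` handling: a later leader with a larger ballot that contacts a quorum overlapping the deciding quorum is forced to adopt the already-decided value rather than its own.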
begin txn
  write z = 2
  read x
  read y
  if x > y
    write y = x
    commit
  else
    abort
end txn
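The transaction above can be mimicked with a toy store that buffers writes and applies them only on commit; a sketch of atomicity only (no isolation or durability), with names of my own choosing:

```python
# Toy transaction: writes go to a private buffer and reach the shared
# store only on commit, so an abort leaves the store untouched.

class Txn:
    def __init__(self, store):
        self.store, self.buffer = store, {}

    def read(self, k):
        # Read your own buffered writes first, then the store.
        return self.buffer.get(k, self.store.get(k))

    def write(self, k, v):
        self.buffer[k] = v

    def commit(self):
        self.store.update(self.buffer)   # all writes become visible at once

    def abort(self):
        self.buffer.clear()              # discard everything, including z

def run(store):
    t = Txn(store)              # begin txn
    t.write("z", 2)             # write z = 2
    x, y = t.read("x"), t.read("y")
    if x > y:
        t.write("y", x)         # write y = x
        t.commit()
    else:
        t.abort()
```

On abort, the write to z is discarded along with everything else: that is exactly the all-or-nothing behavior the slide's commit/abort branches express.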
If all participants vote to commit, send commit request; else send abort request
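This commit-or-abort rule is the coordinator's decision in two-phase commit. A minimal sketch, assuming simulated participants (a real system sends prepare messages over the network, logs decisions, and handles timeouts):

```python
# Two-phase commit, coordinator side: collect votes, then send the decision.

class Participant:
    def __init__(self, can_commit):
        self.can_commit, self.state = can_commit, "active"

    def prepare(self):
        # Vote "yes" only if this participant is able to commit.
        return "yes" if self.can_commit else "no"

    def finish(self, decision):
        self.state = "committed" if decision == "commit" else "aborted"

def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare and vote.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    # Phase 2: if all voted yes, send commit request; else send abort request.
    for p in participants:
        p.finish(decision)
    return decision
```

A single "no" vote (or, in practice, a timeout) forces the coordinator to abort everywhere, which keeps all participants in agreement.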