1
Scalable Consistency in Scatter: A Distributed Key-Value Storage System
Lisa Glendenning, Ivan Beschastnikh, Arvind Krishnamurthy, Thomas Anderson
University of Washington
October 2011. Supported by NSF CNS-0963754.
2
Internet services are built on distributed key-value stores, e.g., Dynamo.
5
A distributed hash table partitions the key-space and assigns keys to nodes.
Links between nodes form an overlay.
Knowledge of system state is distributed among all nodes.
Nodes coordinate locally to respond to churn, e.g., node joins and failures.
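Below is a minimal Python sketch of the key-assignment scheme this slide describes: node identifiers and keys are hashed onto a ring, and each key is owned by its clockwise successor node. The hash choice and class names are illustrative assumptions, not Scatter's code.

import bisect
import hashlib

def ring_hash(s):
    # Map a string onto a 2^32-point ring (illustrative choice of hash).
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:4], "big")

class Ring:
    # Each key is owned by its clockwise successor node on the ring.
    def __init__(self, nodes):
        self.points = sorted(ring_hash(n) for n in nodes)
        self.node_at = {ring_hash(n): n for n in nodes}

    def lookup(self, key):
        i = bisect.bisect_right(self.points, ring_hash(key)) % len(self.points)
        return self.node_at[self.points[i]]

ring = Ring(["a", "b", "c"])
print(ring.lookup("some-key"))  # the node whose ring point succeeds hash(key)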
6
JOIN: node c joins the ring between a and b at key kc.
Pointer updates: c.pred = a, c.succ = b, a.succ = c, b.pred = c.
Key reassignment: c.keys = (ka,kc], b.keys = (kc,kb].
If the join is interrupted, the outcome depends on the fault:
FAULT: communication fault between b and c -> OUTCOME: both b and c claim (ka,kc]
FAULT: c fails during the join -> OUTCOME: no node claims (ka,kc]
FAULT: communication fault between a and c -> OUTCOME: routes through a skip over c
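To make these failure cases concrete, here is a minimal Python sketch of the unsynchronized join: each pointer update is an independent message, and a fault between any two steps leaves the ring in one of the inconsistent states above. The Node fields mirror the slide; the join steps are an illustrative reading, not Scatter's code.

class Node:
    # pred/succ are overlay links; keys is the owned range (lo, hi].
    def __init__(self, key):
        self.key, self.pred, self.succ, self.keys = key, None, None, None

def join(a, b, c):
    # Step 1: c sets its own links and claims (ka, kc].
    c.pred, c.succ, c.keys = a, b, (a.key, c.key)
    # Step 2: b shrinks to (kc, kb]. If the b<->c message is lost,
    # both b and c still claim (ka, kc]: overlapping ownership.
    b.pred, b.keys = c, (c.key, b.key)
    # Step 3: a re-points its successor. If the a<->c message is lost,
    # routes through a skip over c; and if c fails after b has already
    # shrunk its range, no node claims (ka, kc].
    a.succ = c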
8
How is Scatter different? It uses groups, rather than individual nodes, as building blocks.
What is a group? A set of nodes that cooperatively manage a key-range.
What does this give us? The fault-tolerant group, not the individual node, is the entity that owns keys, and it supports consistent operations involving multiple groups.
9
Group state, e.g., nodes = {a,b,c}, keys = (kz,kc], values = {...}, is replicated among members with Paxos.
Changes to group state are Paxos reconfigurations.
Keys are partitioned among the nodes of the group for performance: a.keys = (kz,ka], b.keys = (ka,kb], c.keys = (kb,kc].
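A minimal Python sketch of the group state on this slide; the field and variable names are illustrative assumptions, not Scatter's actual data structures.

from dataclasses import dataclass, field

@dataclass
class GroupState:
    nodes: set                                   # e.g., {"a", "b", "c"}
    lo: str                                      # group owns the range (lo, hi]
    hi: str
    values: dict = field(default_factory=dict)   # the stored key-value pairs

# Every mutation of GroupState is agreed on by the members via Paxos;
# in particular, changing `nodes` is a Paxos reconfiguration. For
# performance, (lo, hi] is sub-partitioned so each member serves one
# sub-range, e.g., a: (kz,ka], b: (ka,kb], c: (kb,kc].
state = GroupState(nodes={"a", "b", "c"}, lo="kz", hi="kc")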
10
Some problems can't be handled within a single group.
Multi-group operations, e.g., SPLIT one group into two (b into b1 and b2) and MERGE two groups into one, repartition the key-space between groups.
Multi-group operations are distributed transactions, coordinated locally by the participating groups.
11
Example: SPLIT as a distributed transaction. A node of group b proposes: split? Groups a, b, and c each vote on "split b?" and, once all agree, the transaction commits: split! Group b then executes the committed reconfiguration (RECONFIGURE!), replacing itself with two new groups, b1 and b2.
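The commit structure this animation walks through can be read as a two-phase commit whose participants are groups, each deciding its vote internally by consensus. The following minimal Python sketch follows that reading; class and method names are illustrative, and the intra-group Paxos step is stubbed out.

class Group:
    def __init__(self, name):
        self.name = name

    def paxos_agree(self, value):
        # Stand-in for intra-group consensus: a real group runs Paxos
        # among its member nodes before answering.
        return True

    def prepare(self, txn):
        return self.paxos_agree(("prepare", txn))   # phase 1: vote

    def commit(self, txn):
        self.paxos_agree(("commit", txn))           # phase 2: apply

def run_transaction(participants, txn):
    # Phase 1: every affected group votes on the proposal.
    if all(g.prepare(txn) for g in participants):
        # Phase 2: commit everywhere; for "split b", group b applies
        # the reconfiguration that replaces it with b1 and b2.
        for g in participants:
            g.commit(txn)
        return "committed"
    return "aborted"

print(run_transaction([Group("a"), Group("b"), Group("c")], "split b"))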
14
Evaluation: layered OpenDHT's recursive routing on top of Scatter groups; implemented Chirp, a Twitter-like application.
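As a rough illustration of the layering, here is a minimal Python sketch in which each group acts as one virtual DHT node and a lookup is forwarded recursively until the key falls in the current group's range. The linear successor topology and names are illustrative assumptions, not OpenDHT's actual routing tables.

class Group:
    def __init__(self, lo, hi, store=None, successor=None):
        self.lo, self.hi = lo, hi           # this group's key-range (lo, hi]
        self.store = store or {}
        self.successor = successor          # next group along the key-space

    def owns(self, key):
        return self.lo < key <= self.hi

    def lookup(self, key):
        if self.owns(key):
            return self.store.get(key)      # recursion bottoms out at the owner
        return self.successor.lookup(key)   # recursive forwarding

g3 = Group(20, 30, {25: "v"})
g2 = Group(10, 20, successor=g3)
g1 = Group(0, 10, successor=g2)
print(g1.lookup(25))  # forwarded g1 -> g2 -> g3, returns "v"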
15
[Plots: consistent fetches (%) and completed fetches (%) vs. node lifetime (seconds), Scatter vs. OpenDHT]
16
[Plot: fetch latency (ms) vs. node lifetime (seconds), Scatter vs. OpenDHT; the difference is 10-12%]
17
Comparison baseline: ZooKeeper, a small-scale, centralized coordination service. We statically partitioned the global key-space across multiple, isolated ZooKeeper instantiations (Z1, Z2, Z3).
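A minimal Python sketch of this baseline's static partitioning: each key maps permanently to one isolated ZooKeeper instantiation, so no cross-instance coordination is ever needed (or possible). The hash-based routing rule is an illustrative assumption.

import zlib

INSTANCES = ["Z1", "Z2", "Z3"]   # isolated ZooKeeper instantiations

def instance_for(key):
    # Static partition: a key's home instance never changes, unlike
    # Scatter's groups, which can split and merge at runtime.
    return INSTANCES[zlib.crc32(key.encode()) % len(INSTANCES)]

print(instance_for("user/42"))   # always the same instance for this key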
18
[Plot: throughput (1000 ops/sec) vs. total number of nodes, Scatter vs. ZooKeeper]