Dynamo concepts in depth. Pavlo Baron, codecentric AG (PowerPoint presentation)

SLIDE 1

Dynamo concepts in depth. Pavlo Baron, codecentric AG

Friday, August 31, 12

SLIDE 2

Pavlo Baron

pavlo.baron@codecentric.de @pavlobaron

SLIDE 3

The shopping cart case

SLIDE 4

The 2 AM alarm call case

SLIDE 5

The Tower of Babel case

SLIDE 6

The Neo vs. Smiths case

SLIDE 7

The Pavlo case

SLIDE 8

SLIDE 9

So Dynamo isn’t about speed. It’s about immediate, reliable writes. It’s about operation relaxation.

It’s about distribution and fault tolerance. It’s about almost linear scalability.

SLIDE 10

Time and timestamps

SLIDE 11

Clocks
V(i), V(j): competing
Conflict resolution:
1: siblings, resolved by the client
2: merge, done by the system
3: voting, done by the system
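
A minimal Python sketch of what “competing” means, assuming clocks are equal-length tuples with one counter per node: two clocks compete when neither is component-wise greater than or equal to the other. The helper names `descends`, `compare` and `siblings` are hypothetical, the values "v1" to "v3" are made up, and only strategy 1 (siblings handed to the client) is illustrated; merging and voting are not shown. The two competing clocks are taken from the four-node vector clock slide.

```python
from typing import List, Tuple

Clock = Tuple[int, ...]  # assumption: one counter per node, same length everywhere


def descends(a: Clock, b: Clock) -> bool:
    """True if clock a has seen everything clock b has (a >= b component-wise)."""
    return all(x >= y for x, y in zip(a, b))


def compare(a: Clock, b: Clock) -> str:
    """Classify the relation between two vector clocks."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"  # competing clocks: neither descends from the other


def siblings(versions: List[Tuple[Clock, str]]) -> List[Tuple[Clock, str]]:
    """Drop every version that some other version strictly descends from.

    The survivors are mutually concurrent siblings; under strategy 1 they
    are all returned to the client, which has to resolve the conflict.
    """
    survivors = []
    for clock, value in versions:
        dominated = any(
            other != clock and descends(other, clock) for other, _ in versions
        )
        if not dominated:
            survivors.append((clock, value))
    return survivors


print(compare((1, 2, 0, 0), (1, 0, 2, 0)))  # concurrent
print(siblings([((1, 0, 0, 0), "v1"), ((1, 2, 0, 0), "v2"), ((1, 0, 2, 0), "v3")]))
# [((1, 2, 0, 0), 'v2'), ((1, 0, 2, 0), 'v3')]
```

The ancestor 1,0,0,0 disappears because both later clocks descend from it; the two remaining versions compete, so the client receives both.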

SLIDE 12

Vector clocks
Node 1: 1,0,0 → 2,2,0 → 3,2,0 → 4,3,3
Node 2: 1,1,0 → 1,2,0 → 1,3,3 → 4,4,3
Node 3: 1,0,1 → 1,2,2 → 1,2,3 → 4,3,4
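
The three-node clock values on this slide can be reproduced with the textbook vector clock rules: every event increments only the node's own counter, and a receive first takes the component-wise maximum with the incoming clock. The slide shows the values but not the messages, so the send/receive sequence below is an assumed reconstruction that happens to yield exactly those values. The class name `VectorClock` is hypothetical, and index 0, 1, 2 stands for Node 1, 2, 3.

```python
from typing import List


class VectorClock:
    """Classic vector clock for a fixed set of n nodes."""

    def __init__(self, n: int, owner: int) -> None:
        self.clock = [0] * n
        self.owner = owner  # index of the node that owns this clock

    def tick(self) -> List[int]:
        """Local event or send: increment only the owner's component."""
        self.clock[self.owner] += 1
        return list(self.clock)  # copy that travels with a message

    def receive(self, remote: List[int]) -> List[int]:
        """Merge a received clock (component-wise max), then tick."""
        self.clock = [max(a, b) for a, b in zip(self.clock, remote)]
        return self.tick()


n1, n2, n3 = (VectorClock(3, i) for i in range(3))
m = n1.tick()          # node 1: [1, 0, 0], sent to nodes 2 and 3
n2.receive(m)          # node 2: [1, 1, 0]
n3.receive(m)          # node 3: [1, 0, 1]
m = n2.tick()          # node 2: [1, 2, 0], sent to nodes 1 and 3
n1.receive(m)          # node 1: [2, 2, 0]
n3.receive(m)          # node 3: [1, 2, 2]
n1.tick()              # node 1: [3, 2, 0]
m = n3.tick()          # node 3: [1, 2, 3], sent to node 2
m = n2.receive(m)      # node 2: [1, 3, 3], sent to node 1
m = n1.receive(m)      # node 1: [4, 3, 3], sent to nodes 2 and 3
n2.receive(m)          # node 2: [4, 4, 3]
n3.receive(m)          # node 3: [4, 3, 4]
print(n1.clock, n2.clock, n3.clock)  # [4, 3, 3] [4, 4, 3] [4, 3, 4]
```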

SLIDE 13

Vector clocks (four nodes)
Node 1: 1,0,0,0; 1,2,0,0; 1,0,2,0
Node 2: 1,1,0,0 / Node 3: 1,0,1,0 / Node 4: 1,0,0,1
Further clocks in the diagram: 1,3,0,3; 1,2,0,2; 1,2,0,3

SLIDE 14

Hashing (#): O(1) for data lookups / delta tracking

SLIDE 15

Merkle trees
N, M: nodes
HT(N), HT(M): hash trees
M needs update:
obtain HT(N)
calc delta(HT(M), HT(N))
pull keys(delta)
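
A minimal sketch of the three steps, assuming each node splits its keys into a fixed number of hash buckets (the hypothetical `LEAVES = 8`) and builds a binary hash tree over them with SHA-256. `delta` descends only into subtrees whose hashes differ, so identical ranges cost one comparison. The function names are illustrative, and M only pulls: keys deleted on N are not removed from M.

```python
import hashlib
from typing import Dict, List

LEAVES = 8  # hypothetical number of key-range buckets (must be a power of two)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket(key: str) -> int:
    """Assign a key to one of LEAVES buckets by hashing it."""
    return int.from_bytes(h(key.encode())[:4], "big") % LEAVES


def build_tree(store: Dict[str, str]) -> List[List[bytes]]:
    """Hash tree HT as levels: levels[0] holds the leaves, levels[-1] the root."""
    buckets: List[List[str]] = [[] for _ in range(LEAVES)]
    for key in sorted(store):  # sorted, so insertion order does not matter
        buckets[bucket(key)].append(f"{key}={store[key]}")
    level = [h("\n".join(items).encode()) for items in buckets]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def delta(ht_m: List[List[bytes]], ht_n: List[List[bytes]]) -> List[int]:
    """Walk both trees from the root, descending only where hashes differ."""
    diff = [0]
    for depth in range(len(ht_m) - 1, -1, -1):
        diff = [i for i in diff if ht_m[depth][i] != ht_n[depth][i]]
        if depth > 0:
            diff = [c for i in diff for c in (2 * i, 2 * i + 1)]
    return diff  # leaf buckets that must be synchronized


def pull_keys(store_m: Dict[str, str], store_n: Dict[str, str], buckets: List[int]) -> None:
    """M pulls from N every key that falls into a differing bucket."""
    for key, value in store_n.items():
        if bucket(key) in buckets:
            store_m[key] = value


n = {"a": "1", "b": "2", "c": "3"}
m = {"a": "1", "b": "old"}
pull_keys(m, n, delta(build_tree(m), build_tree(n)))
print(m == n)  # True
```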

SLIDE 16

Merkle Trees
Node a.1: a → ab (abc, abd), ac (acb, acc)
Node a.2: a → ab (abe, abd), ad (ada, adb)

SLIDE 17

Merkle Trees
Node a.1: a → ab (abc, abd)
Node a.2: a → ab (abd), ad (ada, adb)

SLIDE 18

Decentralized distribution based on “equal” nodes

SLIDE 19

Consensus, agreement, voting, quorum

SLIDE 20

Consistent hashing: the ring
X-bit integer space: 0 <= N < 2 ^ X
or, as an angle: 0 <= A < 2 * Pi, with A = 2 * Pi * N / 2 ^ X
x(N) = cos(A), y(N) = sin(A)
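
A minimal sketch of the ring, assuming a hypothetical `X = 32` and MD5 purely as a way to spread names over the integer space. A key belongs to the first node found walking clockwise from the key's position (the first node with a larger position, wrapping at 2 ^ X), and a preference list is the next n nodes. `Ring`, `ring_position` and `to_circle` are illustrative names; virtual nodes are left out, so every physical node appears exactly once.

```python
import bisect
import hashlib
import math
from typing import List, Tuple

X = 32  # hypothetical ring size: X-bit integer space, 0 <= N < 2 ** X


def ring_position(name: str) -> int:
    """Hash a key or node name onto the ring as an X-bit integer N."""
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** X)


def to_circle(n: int) -> Tuple[float, float]:
    """Map N to the angle A = 2 * Pi * N / 2 ** X, then to (cos A, sin A)."""
    a = 2 * math.pi * n / 2 ** X
    return math.cos(a), math.sin(a)


class Ring:
    def __init__(self, nodes: List[str]) -> None:
        self.points = sorted((ring_position(node), node) for node in nodes)
        self.positions = [p for p, _ in self.points]

    def preference_list(self, key: str, n: int) -> List[str]:
        """First n nodes walking clockwise from the key's position."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        count = min(n, len(self.points))
        return [self.points[(start + i) % len(self.points)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("foo", 3))
```

Without virtual nodes the load per node depends heavily on where the few node hashes land; placing several vnodes per physical node smooths that out, at the cost of having to skip duplicates when building a preference list.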

SLIDE 21

Quorum
V: vnodes holding a key
W: write quorum
R: read quorum
DW: durable write quorum
W > 0.5 * V
R + W > V
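
Why these two inequalities matter can be checked by brute force: W > 0.5 * V makes any two write sets overlap, and R + W > V makes every read set share at least one replica with every write set. The sketch below, with the hypothetical helper names `quorum_ok` and `reads_see_latest_write`, compares the formula against an exhaustive check for V = 3.

```python
from itertools import combinations


def quorum_ok(v: int, r: int, w: int) -> bool:
    """The two conditions from the slide: W > 0.5 * V and R + W > V."""
    return w > 0.5 * v and r + w > v


def reads_see_latest_write(v: int, r: int, w: int) -> bool:
    """Exhaustive check: does every read set intersect every write set?"""
    replicas = range(v)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for r, w in [(2, 2), (1, 2), (1, 3)]:
    print(r, w, quorum_ok(3, r, w), reads_see_latest_write(3, r, w))
# 2 2 True True
# 1 2 False False
# 1 3 True True
```

The overlap argument assumes the R and W nodes are drawn from the same V replicas. With a sloppy quorum, stand-in nodes may answer instead, so the guarantee holds only as long as no stand-ins are involved.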

SLIDE 22

Insert key (sloppy quorum)
Key = “foo”, # = N, W = 2
Diagram: the key goes to N, is replicated, and the client gets “ok”.
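
A minimal sketch of a sloppy-quorum write, assuming the ring is a plain Python list and `home` is the index of the node the key hashes to (N on the slide). The write goes to the first n healthy nodes walking clockwise, so a healthy node further along stands in for an unavailable one and keeps a hint naming the intended node. `Node`, `sloppy_put` and the parameters `n`/`w` are illustrative, and network calls are replaced by direct assignments.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}
        self.hints: List[Tuple[str, str, str]] = []  # (intended node, key, value)


def sloppy_put(ring: List[Node], home: int, key: str, value: str,
               n: int, w: int) -> bool:
    """Write to the first n healthy nodes clockwise from ring[home].

    Returns True ("ok") if at least w nodes stored the value.
    """
    clockwise = ring[home:] + ring[:home]
    intended = clockwise[:n]                       # the "real" replicas
    healthy = [node for node in clockwise if node.up][:n]
    stand_ins = [node for node in healthy if node not in intended]
    missing = [node for node in intended if not node.up]
    for node in healthy:
        node.data[key] = value                     # each healthy target acknowledges
    for stand_in, target in zip(stand_ins, missing):
        stand_in.hints.append((target.name, key, value))
    return len(healthy) >= w


ring = [Node(c) for c in "ABCDE"]
ring[1].up = False                                 # B is down
print(sloppy_put(ring, 0, "foo", "bar", n=3, w=2))  # True
print([node.name for node in ring if "foo" in node.data])  # ['A', 'C', 'D']
```

D accepts the write on B's behalf and remembers that it belongs to B; the write still counts as “ok” because W = 2 nodes acknowledged it.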

SLIDE 23

Add node
Diagram labels: leave, copy, copy, leave, leave, copy
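
One plausible reading of the leave/copy labels, sketched in Python: when a node joins the ring, only the keys that fall into its new range change owner (they are copied to it); all other keys stay where they are. The tiny ring of 100 positions, the node positions and the names `owner` and `moved_keys` are hypothetical, and replication is ignored. With a replication factor above one, the new node would also take over replicas of neighbouring ranges.

```python
import bisect
from typing import Dict, List

RING = 100  # hypothetical tiny ring with positions 0..99


def owner(nodes: Dict[int, str], key_pos: int) -> str:
    """First node clockwise whose position is greater than the key's (wrapping)."""
    positions = sorted(nodes)
    i = bisect.bisect_right(positions, key_pos)
    return nodes[positions[i % len(positions)]]


def moved_keys(before: Dict[int, str], after: Dict[int, str]) -> List[int]:
    """Key positions whose owner differs between two ring layouts."""
    return [k for k in range(RING) if owner(before, k) != owner(after, k)]


before = {10: "A", 40: "B", 70: "C"}
after = {**before, 55: "D"}               # add node D between B and C
moved = moved_keys(before, after)
print(moved[0], moved[-1], len(moved))    # 40 54 15
print({owner(after, k) for k in moved})   # {'D'}
```

Only 15 of 100 positions change hands, all of them from C to D; that locality is the point of consistent hashing compared with rehashing every key modulo the node count.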

SLIDE 24

Lookup key (sloppy quorum)
Key = “foo”, # = N, R = 2
Diagram: the key is read starting at N; Value = “bar”
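
A matching sketch of the read side, assuming replicas are listed in preference-list order and, as a simplification, that each stored value carries an integer version instead of a vector clock. The coordinator collects answers from healthy replicas until R have replied and returns the newest value. `sloppy_get` and the data layout are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, str]  # simplification: integer version instead of a vector clock


def sloppy_get(replicas: List[Dict[str, Versioned]], up: List[bool],
               key: str, r: int) -> Optional[str]:
    """Ask healthy replicas, in preference-list order, until r have answered.

    Returns the newest value among those r answers (None if none of them
    holds the key) and raises if fewer than r replicas are available.
    """
    answers: List[Optional[Versioned]] = []
    for store, alive in zip(replicas, up):
        if not alive:
            continue                        # an unavailable replica cannot answer
        answers.append(store.get(key))      # "not found" still counts as an answer
        if len(answers) == r:
            found = [a for a in answers if a is not None]
            return max(found)[1] if found else None
    raise RuntimeError("read quorum not met")


replicas = [{"foo": (2, "bar")}, {"foo": (1, "old")}, {"foo": (2, "bar")}]
print(sloppy_get(replicas, [False, True, True], "foo", r=2))  # bar
```

The second replica still answers with the stale "old", but with R = 2 the newer "bar" from the third replica wins. A real store would also push "bar" back to the stale replica (read repair), which is not shown.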

SLIDE 25

Remove node
Diagram labels: leave, copy

SLIDE 26

Gossip: node down/up
Nodes 1, 2, 3, 4
Messages in the diagram: update; update; update, “4 down”; read; read, “4 up”; update
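
A minimal sketch of how “4 down” and later “4 up” could spread by gossip, assuming each node keeps a membership view with a version counter per member and that the higher version wins on merge. In every round each node exchanges views with one random peer (push-pull). The entry format, `GossipNode` and `gossip_until_agreed` are hypothetical; the slide does not specify the protocol details.

```python
import random
from typing import Dict, List, Tuple

Entry = Tuple[int, str]  # hypothetical format: (version, "up" or "down")


class GossipNode:
    def __init__(self, name: str, members: List[str]) -> None:
        self.name = name
        self.view: Dict[str, Entry] = {m: (0, "up") for m in members}

    def mark(self, member: str, status: str) -> None:
        """Record a locally observed change under a higher version number."""
        version, _ = self.view[member]
        self.view[member] = (version + 1, status)

    def merge(self, other: Dict[str, Entry]) -> None:
        """Per member, keep whichever entry carries the higher version."""
        for member, entry in other.items():
            if entry[0] > self.view[member][0]:
                self.view[member] = entry


def gossip_until_agreed(nodes: List[GossipNode], rng: random.Random) -> int:
    """Push-pull rounds with one random peer per node until all views match."""
    rounds = 0
    while len({tuple(sorted(n.view.items())) for n in nodes}) > 1:
        for node in nodes:
            peer = rng.choice([p for p in nodes if p is not node])
            node.merge(peer.view)
            peer.merge(node.view)
        rounds += 1
    return rounds


rng = random.Random(7)
names = ["1", "2", "3", "4"]
nodes = {name: GossipNode(name, names) for name in names}

nodes["1"].mark("4", "down")                         # node 1 notices node 4 is down
gossip_until_agreed([nodes[n] for n in "123"], rng)  # node 4 takes no part
print(nodes["3"].view["4"])                          # (1, 'down')

nodes["4"].merge(nodes["2"].view)                    # node 4 is back and catches up...
nodes["4"].mark("4", "up")                           # ...then announces itself as up
gossip_until_agreed(list(nodes.values()), rng)
print(nodes["2"].view["4"])                          # (2, 'up')
```

Node 4 first catches up before announcing itself. Otherwise its “up” would carry the same version as the “down” already circulating and would lose the merge.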

SLIDE 27

Eventual consistency

SLIDE 28

BASE: Basically Available, Soft state, Eventually consistent
The opposite of ACID

SLIDE 29

Read-your-writes consistency
FE1: write v2, read v2
FE2: write v1, read v1
Data store: v1, v2, v3

SLIDE 30

Session consistency
One FE, two sessions (Session 1, Session 2): one session writes v2 and reads v2, the other writes v1 and reads v1
Data store: v1, v2, v3

SLIDE 31

Monotonic read consistency

FE1: read v2, read v2
FE2: read v3, read v4
Further read in the diagram: read v3
Data store: v1, v2, v3, v4
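
Read-your-writes and monotonic reads can both be enforced on the client side by remembering the highest version a session has written or read and rejecting older replicas. The sketch below assumes a single item, a single writer, and a version counter per replica; `Replica`, `Session` and the `floor` attribute are hypothetical names, not part of Dynamo.

```python
from typing import List, Optional


class Replica:
    """One copy of a single item; `version` grows with every write it applies."""

    def __init__(self) -> None:
        self.version = 0
        self.value: Optional[str] = None


class Session:
    """Client-side guard for read-your-writes and monotonic reads.

    The session remembers the highest version it has written or read and
    never accepts an answer from a replica older than that floor.
    """

    def __init__(self) -> None:
        self.floor = 0

    def write(self, replica: Replica, value: str) -> None:
        replica.version += 1   # assumption: single writer, no concurrent versions
        replica.value = value
        self.floor = max(self.floor, replica.version)

    def read(self, replicas: List[Replica]) -> Optional[str]:
        for replica in replicas:   # try replicas in the given order
            if replica.version >= self.floor:
                self.floor = replica.version
                return replica.value
        raise RuntimeError("no replica is fresh enough for this session")


fresh, stale = Replica(), Replica()
s = Session()
s.write(fresh, "v1")              # the stale replica has not received the write yet
print(s.read([stale, fresh]))     # v1 (the stale replica is skipped)
```

A second session that has once read from `fresh` will refuse `stale` afterwards; that is the monotonic-read guarantee. Raising an error is just one option; a client could also wait and retry.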

SLIDE 32

Monotonic write consistency
FE1: write v1, write v2
FE2: read v3, read v3
Data store: v1, v2, v3, v4

SLIDE 33

Eventual consistency
FE1, FE2; data store: v1, v2, v3
Operations in the diagram: read v1, read v2, write v3, read v3, read v2

SLIDE 34

Hinted handoff
N: node, G: group including N
node(N) is unavailable:
replicate to G or store data(N) locally
hint handoff for later
node(N) is alive:
hand off data to node(N)
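
A minimal sketch of these steps, assuming the data for an unavailable node N is parked on the first live member of G together with a hint naming N, and that a later `hand_off` pass delivers everything whose target is alive again. `Node`, `put` and `hand_off` are hypothetical names; failure detection and retries are left out.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.data: Dict[str, str] = {}
        self.hinted: List[Tuple[str, str, str]] = []  # (target node, key, value)


def put(target: Node, group: List[Node], key: str, value: str) -> None:
    """Store data(N) on N, or on a live member of G together with a handoff hint."""
    if target.alive:
        target.data[key] = value
        return
    for stand_in in group:
        if stand_in.alive and stand_in is not target:
            stand_in.hinted.append((target.name, key, value))
            return
    raise RuntimeError("no live node in G can hold the hinted data")


def hand_off(stand_in: Node, nodes: Dict[str, Node]) -> None:
    """Deliver hinted data to every target that is alive again; keep the rest."""
    pending = []
    for target_name, key, value in stand_in.hinted:
        target = nodes[target_name]
        if target.alive:
            target.data[key] = value
        else:
            pending.append((target_name, key, value))
    stand_in.hinted = pending


n, g1 = Node("N"), Node("G1")
nodes = {"N": n, "G1": g1}
n.alive = False
put(n, [n, g1], "foo", "bar")   # N is down: G1 keeps the data with a hint
n.alive = True
hand_off(g1, nodes)             # N is back: G1 hands the data over
print(n.data, g1.hinted)        # {'foo': 'bar'} []
```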

SLIDE 35

Direct replica fails
Key = “foo”: N, replicate
Key = “foo”, # = N -> handoff hint = true

SLIDE 36

Replica recovers

handoff

SLIDE 37

All replicas fail
N
Key = “foo”, # = N -> handoff hint = true

SLIDE 38

All replicas recover

replicate, handoff

SLIDE 39

SLIDE 40

Latency is a tuning knob

SLIDE 41

Availability is a tuning knob

SLIDE 42

CAP: the variations
CA: irrelevant (a distributed system cannot rule out network partitions)
CP: eventually unavailable, offering maximum consistency
AP: eventually inconsistent, offering maximum availability

SLIDE 43

CAP: the tradeoff between A and C

SLIDE 44

CP

Replica 1, Replica 2: v1
Operations: read, write v2, read
Values: v1, v2, v2

SLIDE 45

CP (partition)

Replica 1, Replica 2: v1
Operations: read, write v2, read
Values: v1, v2

SLIDE 46

AP

Replica 1, Replica 2: v1
Operations: read, write v2, read
Values: v1, v2, v2
replicate

SLIDE 47

AP (partition)

Replica 1, Replica 2: v1
Operations: read, write v2, read
Values: v1, v2, v2
hint handoff

SLIDE 48

Frequent structure changes

SLIDE 49

Thank you

SLIDE 50

Many graphics I’ve created myself. Some images originate from istockphoto.com, except a few taken from Wikipedia and product pages.
