CRDTs in Practice: Marc Shapiro (Inria & UPMC), Nuno Preguiça (U. NOVA)


slide-1
SLIDE 1

CRDTs in Practice

Marc Shapiro – Inria & UPMC
Nuno Preguiça – U. NOVA

slide-2
SLIDE 2

[CRDTs in practice — CodeMesh 2015]

Cloud to the edge

Social, web, e-commerce: shared mutable data
Scalability ⇒ replication ⇒ consistency issues

slide-4
SLIDE 4

Conflict-free replicated data types

Data type

  • Encapsulates issues

Replicated

  • At multiple nodes

Available

  • Update my replica without coordination
  • Convergence guaranteed (by mathematical properties)

  • Decentralised, peer-to-peer

slide-5
SLIDE 5

[All About Consistency — CodeMesh 2015]

Why use CRDTs

Availability is king (otherwise stay away)

  • ⟹ concurrent updates

Fine-grain mutable shared data

  • Registers not sufficient

Mobile computing
In DC
Geo-replication

slide-6
SLIDE 6

CRDT design concepts

Backward-compatible with sequential datatype
If operations commute, they can be concurrent

  • add(e); rm(f) ≣ rm(f); add(e) ≣ add(e) || rm(f)

Otherwise, deterministic semantics

  • Close to sequential: rm(e); add(e) or add(e); rm(e)
  • Don’t lose updates
  • Result doesn't depend on order received
  • Stable preconditions
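
A minimal sketch of the commutativity rule above, assuming a plain sequential set. The helpers `add` and `rm` mirror the slide's operation names; the sample elements are illustrative. Operations on different elements give the same result in either order, while add(e) and rm(e) on the same element do not:

```python
def add(s: frozenset, e) -> frozenset:
    """Sequential add: returns a new set containing e."""
    return s | {e}


def rm(s: frozenset, e) -> frozenset:
    """Sequential remove: returns a new set without e."""
    return s - {e}


start = frozenset({"f"})

# Different elements: add(e); rm(f) and rm(f); add(e) agree.
order1 = rm(add(start, "e"), "f")
order2 = add(rm(start, "f"), "e")
commute_distinct = order1 == order2

# Same element: add(e); rm(e) and rm(e); add(e) disagree,
# so a concurrent add(e) || rm(e) needs an explicit rule.
add_then_rm = rm(add(start, "e"), "e")
rm_then_add = add(rm(start, "e"), "e")
commute_same = add_then_rm == rm_then_add
```

This is why the non-commuting pair needs a deterministic concurrent semantics chosen by the data type designer.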

slide-8
SLIDE 8

bet365

Largest European on-line betting operator

  • Bursty load: 2.5 million simultaneous users
  • 1 Tb working set
  • 1000s of servers
  • Slow users: transient inconsistency OK
  • Availability, read my writes, monotonic reads
  • Transparency

Before: SQL Server, doesn't scale, hours to converge
Mid 2013: NoSQL Riak: available, siblings; ad-hoc merge (hard!)

slide-9
SLIDE 9

bet365 CRDT experience

≥ Jan. 2014; in anger ≥ Dec. 2014
ORSWOT add-remove set

  • Add, remove element; scan for similar
  • 100s Gb

Transformational: “CRDTs saved the day”

  • Correct by construction
  • Stable; partitions fixed quickly, correctly

Future wish list: “Extra guarantees … without impacting availability.”

slide-10
SLIDE 10

CRDT Set design space

Many Set operations commute: add(e) / add(f), add(e) / rm(f), etc.
Non-commuting pair: add(e) / rm(e)

  • sequential consistency
  • last writer wins? { add(e) < rmv(e) ⟹ e ∉ S ∧ rmv(e) < add(e) ⟹ e ∈ S }
  • error state? { ⊥e ∈ S }
  • add wins? { e ∈ S }
  • remove wins? { e ∉ S }

All deterministic, satisfy conditions
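
To make the "add wins" option concrete, here is a small sketch of one common way to obtain it: tag every add with a unique identifier and let a remove cancel only the tags it has observed. The class name `AddWinsSet`, its fields, and the tag format are assumptions for illustration; this is not the ORSWOT implementation mentioned earlier.

```python
class AddWinsSet:
    """Illustrative add-wins (observed-remove) set, state-based."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counter = 0
        self.adds = {}        # element -> set of unique tags, one per add
        self.removed = set()  # tags whose add was observed and then removed

    def add(self, e):
        self.counter += 1
        self.adds.setdefault(e, set()).add((self.replica_id, self.counter))

    def rm(self, e):
        # Cancel only the adds this replica has seen so far.
        self.removed |= self.adds.get(e, set())

    def contains(self, e):
        return bool(self.adds.get(e, set()) - self.removed)

    def merge(self, other):
        for e, tags in other.adds.items():
            self.adds.setdefault(e, set()).update(tags)
        self.removed |= other.removed


a, b = AddWinsSet("a"), AddWinsSet("b")
a.add("e")
b.merge(a)   # both replicas now contain e
a.rm("e")    # concurrent with ...
b.add("e")   # ... a fresh add at the other replica
a.merge(b)
b.merge(a)
```

After the exchange both replicas contain e, because the concurrent add carries a tag that the remove never observed. A remove-wins variant would instead have to let removes cancel adds they have not seen.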

slide-11
SLIDE 11

Wedding list

Replicated wedding list
Ordered list of “wishes” (strings)

  • lookup (wish) ⟶ rank
  • add (position, wish)
  • rm (position)

Position: “after item”

[Figure: replicas of the wedding list holding the wishes TV, Venice, Books and Ski trip]

slide-12
SLIDE 12

[Figure: linked list with ⊢ and ⊣ markers; items shown: TV, ski trip, books, World peace, laptop, Venice, iDrone]

Each item points to the next one

  • add (pos, item): link item after the one at pos
  • rm (item): mark as tombstone
  • add (pos, item1) || add (pos, item2): deterministic
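
The three rules above can be sketched as a small operation-based list. This is a simplified illustration, not the talk's implementation: it assumes every operation is delivered exactly once and after the operations it depends on, and the class `ListReplica`, the `(clock, name)` identifiers and the skip rule are hypothetical choices made for the example.

```python
class ListReplica:
    HEAD = (0, "")  # id of the begin marker; real ids are (clock, replica name)

    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.items = [[self.HEAD, None, False]]  # [id, wish, tombstone]

    def _index(self, item_id):
        return next(i for i, it in enumerate(self.items) if it[0] == item_id)

    def add(self, pos_id, wish):
        """Insert wish after the item pos_id; returns the operation to send."""
        self.clock += 1
        op = ("add", (self.clock, self.name), pos_id, wish)
        self.apply(op)
        return op

    def rm(self, item_id):
        op = ("rm", item_id)
        self.apply(op)
        return op

    def apply(self, op):
        if op[0] == "add":
            _, new_id, pos_id, wish = op
            self.clock = max(self.clock, new_id[0])
            i = self._index(pos_id) + 1
            # Skip items with a larger id, so that concurrent adds at the
            # same position end up in the same order at every replica.
            while i < len(self.items) and self.items[i][0] > new_id:
                i += 1
            self.items.insert(i, [new_id, wish, False])
        else:
            # The item stays in the list as a tombstone.
            self.items[self._index(op[1])][2] = True

    def read(self):
        return [wish for _id, wish, dead in self.items[1:] if not dead]


a, b = ListReplica("a"), ListReplica("b")
tv = a.add(ListReplica.HEAD, "TV")
b.apply(tv)
venice = a.add(tv[1], "Venice")  # concurrent with ...
books = b.add(tv[1], "Books")    # ... an add at the same position
a.apply(books)
b.apply(venice)
gone = a.rm(venice[1])
b.apply(gone)
```

Both replicas apply the two concurrent adds in different orders yet hold the same sequence, and the removed wish remains stored as a tombstone so that later operations can still refer to its position.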
slide-14
SLIDE 14

Lowering your expectations

  • lookup (wish) ⟶ rank
  • add (pos, wish)
  • rm (pos)
  • mv (wish, pos1, pos2) = add (…, pos2); rm (pos1)
  • offer (wish)

[Figure: replicas of the list iDrone, World Peace, TV, Ski trip, Books, Laptop, with World Peace moved to different positions]

slide-15
SLIDE 15

Lowering your expectations

  • lookup (wish) ⟶ rank
  • add (pos, wish)
  • rm (pos)
  • mv (wish, pos1, pos2) = add (…, pos2); rm (pos1)
  • offer (wish)

[Figure: list in which World Peace appears twice: World Peace, iDrone, TV, Ski trip, Books, Laptop, World Peace]

slide-16
SLIDE 16

Lowering your expectations

  • lookup (wish) ⟶ rank
  • add (pos, wish)
  • rm (pos)
  • mv (wish, pos1, pos2) = add (…, pos2); rm (pos1)
  • offer (wish)

[Figure: replicas of the list iDrone, World Peace, TV, Ski trip, Books, Laptop]

slide-17
SLIDE 17

The problem with invariants

Remove specification
  { true } rm(wish) { tombstone(wish) }
Move, offer: maintain uniqueness invariant
  { ¬offered(wish,_) } offer(wish) { offered(wish, red) }
Precondition stable under concurrent updates?

  • If so, invariant guaranteed
  • Otherwise, all bets are off
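
A small sketch of why the offer precondition is not stable, assuming each replica is a plain dictionary from wish to the set of guests who offered it. The function names, the merge rule (set union) and the guest "blue" are illustrative assumptions; "red" comes from the specification above.

```python
def offer(replica: dict, wish: str, guest: str) -> bool:
    """Checks { ¬offered(wish, _) } against the local replica only."""
    if replica.get(wish):
        return False
    replica.setdefault(wish, set()).add(guest)
    return True


def merge(a: dict, b: dict) -> dict:
    """Union of the offers recorded at two replicas."""
    merged = {}
    for replica in (a, b):
        for wish, guests in replica.items():
            merged.setdefault(wish, set()).update(guests)
    return merged


r1, r2 = {}, {}
ok1 = offer(r1, "TV", "red")
ok2 = offer(r2, "TV", "blue")  # concurrent: r2 has not seen r1's offer
merged = merge(r1, r2)
unique = all(len(guests) <= 1 for guests in merged.values())
```

Each offer passes its local check, yet the merged state records two offers for the same wish. The precondition of rm, by contrast, is simply true, so no concurrent update can invalidate it.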

slide-18
SLIDE 18

Lessons learned

Availability ⟹ concurrent updates

  • Mask their undesirable effects

Backwards compatible

  • Same sequential semantics
  • Commute ⟹ same concurrent semantics
  • otherwise, “close enough”

Maintaining invariants

  • Stable preconditions

slide-19
SLIDE 19

Valter Balegas et al.– NOVA LINCS, DI, FCT, Universidade NOVA de Lisboa @ RICON'15

Numeric Invariants

Many applications need to enforce conditions like: counter ≥ K
E.g.:

  • Number of impressions left ≥ 0
  • Virtual money in a game ≥ 0

slide-20
SLIDE 20

Numeric invariants

X ≥ 0
Given X = n, there are n rights to execute dec()
Distribute rights among replicas

  • Consume rights for dec()
  • Create rights on inc()

slide-21
SLIDE 21

CRDT-ish

Execute operations locally without coordination
Peer-to-peer synchronisation
Fail if not enough rights exist

slide-22
SLIDE 22

Bounded Counter: API

Create(type, value);
Increment(value);
Decrement(value);
Value();
Transfer(to, qty);

slide-23
SLIDE 23

Bounded Counter: increment

Increment(10); Increment(8); Increment(15);

[Diagram: each replica (R1, R2, R3) holds a table with rows r1, r2, r3 and columns r1, r2, r3 (matrix R) and U. After the increments, R1 holds 10 in row r1, R2 holds 15 in row r2, and R3 holds 8 in row r3.]

slide-24
SLIDE 24

Bounded Counter: increment

[Diagram: R1 holds 10 in row r1; R2 holds 15 in row r2; R3 holds 8 in row r3.]

slide-25
SLIDE 25

Bounded Counter: decrement

decrement(15); decrement(5);

[Diagram: R1 holds 10 in row r1; R2 holds 15 in row r2; R3 holds 8 in row r3. After the decrement, row r2 at R2 holds 15 in R and 5 in U.]

slide-26
SLIDE 26

Bounded Counter: transfer

transfer(r1, 4);

[Diagram: R1 holds 10 in row r1; R2 holds 15 and 5 in row r2; R3 holds 8 in row r3. After the transfer, row r3 at R3 holds 4 and 8.]

slide-27
SLIDE 27

Bounded Counter: transfer

[Diagram: R1 holds 10 in row r1; R2 holds 15 and 5 in row r2; R3 holds 4 and 8 in row r3.]

slide-28
SLIDE 28

Bounded Counter: merge

merge(r1,r2);

Each replica only touches its own row. Merge by taking the max of each cell.

[Diagram: R1 holds 10 in row r1; R2 holds 15 and 5 in row r2; R3 holds 4 and 8 in row r3.]

slide-29
SLIDE 29

Bounded Counter: merge

merge(r1,r2);

Each replica only touches its own row. Merge by taking the max of each cell.

[Diagram: R1 holds 10 in row r1; R2 holds 15 and 5 in row r2; R3 holds 4 and 8 in row r3. Merged result: 10 in row r1, 15 and 5 in row r2.]

slide-30
SLIDE 30

Bounded Counter: merge

merge(r3,r2);

Each replica only touches its own row. Merge by taking the max of each cell.

[Diagram: R1 holds 10 in row r1; R2 holds 15 and 5 in row r2; R3 holds 4 and 8 in row r3. Earlier merged result: 10 in row r1, 15 and 5 in row r2.]

slide-31
SLIDE 31

Bounded Counter: merge

merge(r3,r2);

Each replica only touches its own row. Merge by taking the max of each cell.

[Diagram: R1 holds 10 in row r1; R2 holds 15 and 5 in row r2; R3 holds 4 and 8 in row r3. Merged result: 10 in row r1, 15 and 5 in row r2, 4 and 8 in row r3.]

slide-32
SLIDE 32

Bounded Counter: decrement

decrement(12);

Check local rights ≥ 12

[Diagram: R1 and R2 each hold 10 in row r1, 15 and 5 in row r2, and 4 and 8 in row r3; R3 holds 4 and 8 in row r3.]

slide-33
SLIDE 33

Bounded Counter: decrement

decrement(12);

Check local rights ≥ 12

[Diagram: R1 and R2 each hold 10 in row r1, 15 and 5 in row r2, and 4 and 8 in row r3; R3 holds 4 and 8 in row r3.]

local = R[1][1] = 10

slide-34
SLIDE 34

Bounded Counter: decrement

decrement(12);

Check local rights ≥ 12

[Diagram: R1 and R2 each hold 10 in row r1, 15 and 5 in row r2, and 4 and 8 in row r3; R3 holds 4 and 8 in row r3.]

local = R[1][1] + ΣR[i][1] = 10 + 4 = 14

slide-35
SLIDE 35

Bounded Counter: decrement

decrement(12);

Check local rights ≥ 12

[Diagram: R1 and R2 each hold 10 in row r1, 15 and 5 in row r2, and 4 and 8 in row r3; R3 holds 4 and 8 in row r3.]

10 + 4 = 14

  • local = R[1][1] + ΣR[i][1] - ΣR[1][j] - U[1]
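
The walkthrough can be replayed with a short sketch. It assumes a lower bound of 0, an initial value of 0, zero-based replica indices (so r1 is index 0), and that the sums in the local-rights formula range over the other replicas only. The class and method names are illustrative and do not reproduce the Riak prototype's API; `Create` is not modelled.

```python
class BoundedCounter:
    """Sketch of a counter that never goes below 0."""

    def __init__(self, me, n):
        self.me = me                          # index of this replica
        self.R = [[0] * n for _ in range(n)]  # R[i][i]: increments at i; R[i][j]: rights i gave to j
        self.U = [0] * n                      # U[i]: rights consumed by decrements at i

    def local_rights(self):
        i, n = self.me, len(self.U)
        received = sum(self.R[j][i] for j in range(n) if j != i)
        given = sum(self.R[i][j] for j in range(n) if j != i)
        return self.R[i][i] + received - given - self.U[i]

    def value(self):
        return sum(self.R[i][i] for i in range(len(self.U))) - sum(self.U)

    def increment(self, k):
        self.R[self.me][self.me] += k

    def decrement(self, k):
        if self.local_rights() < k:
            return False  # not enough local rights: fail without coordination
        self.U[self.me] += k
        return True

    def transfer(self, to, k):
        if to == self.me or self.local_rights() < k:
            return False
        self.R[self.me][to] += k
        return True

    def merge(self, other):
        # Each replica only writes its own row, so a cell-wise max is safe.
        n = len(self.U)
        for i in range(n):
            self.U[i] = max(self.U[i], other.U[i])
            for j in range(n):
                self.R[i][j] = max(self.R[i][j], other.R[i][j])


r1, r2, r3 = (BoundedCounter(i, 3) for i in range(3))
r1.increment(10)
r2.increment(15)
r3.increment(8)
r2.decrement(5)
r3.transfer(0, 4)            # r3 gives 4 rights to r1
refused = r1.decrement(12)   # r1 only knows about its own 10 rights
r1.merge(r2)
r1.merge(r3)
rights = r1.local_rights()   # 10 + 4 - 0 - 0
accepted = r1.decrement(12)
```

Before the merge the decrement is refused; once r1 has learned of the transfer it holds 14 rights and the decrement succeeds. Summing the local rights of all replicas always gives the counter value, which is why the bound holds without coordination.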
slide-36
SLIDE 36

Using Bounded Counter

Operations execute locally; fail if no rights available
Redistribute rights

  • On-demand when needed
  • Proactive

Peer-to-peer synchronization
Prototype implemented on top of Riak

slide-37
SLIDE 37

Micro Benchmark

Deployment: 3 Regions on AWS (m1.large)
Configurations:

  • STRONG - Strong consistency (all writes on 1 DC);
  • WEAK - Eventual Consistency (Riak Counters);
  • BC - Bounded Counter.

slide-38
SLIDE 38

Latency for multiple keys

slide-39
SLIDE 39

SwiftCloud approach

[Diagram labels: app, partial database, full database, process request & store update, transmit, fail-over]

slide-40
SLIDE 40

SwiftCloud key features

Cache data at clients

  • Modify cached data => low latency, high availability

Highly available transactions

  • Atomic updates
  • Read snapshot
  • CRDT rules for merging concurrent updates

Causal consistency

  • Write fast, read in the past
  • Client-assisted failover

slide-41
SLIDE 41

SwiftSocial

High-level operations

  • Registering user, Login/Logout
  • Post status update; send message
  • View wall
  • Friendship management

Operations modeled as transactions
State

  • Set CRDT for messages and friends
  • Register CRDT for user data
  • Counter CRDT for polls
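
As an illustration of the kind of counter CRDT that could back a poll, here is a minimal state-based increment/decrement counter. It is a generic sketch under stated assumptions: the name `PNCounter` and its fields are illustrative and are not taken from SwiftSocial.

```python
class PNCounter:
    """Per-replica totals of increments (P) and decrements (N)."""

    def __init__(self, replica):
        self.replica = replica
        self.P = {}  # replica -> total increments issued there
        self.N = {}  # replica -> total decrements issued there

    def increment(self, k=1):
        self.P[self.replica] = self.P.get(self.replica, 0) + k

    def decrement(self, k=1):
        self.N[self.replica] = self.N.get(self.replica, 0) + k

    def value(self):
        return sum(self.P.values()) - sum(self.N.values())

    def merge(self, other):
        # Entries only grow, so the per-replica max keeps every update.
        for r, v in other.P.items():
            self.P[r] = max(self.P.get(r, 0), v)
        for r, v in other.N.items():
            self.N[r] = max(self.N.get(r, 0), v)


a, b = PNCounter("a"), PNCounter("b")
a.increment()
a.increment()
b.increment()
b.decrement()
a.merge(b)
b.merge(a)
b.merge(a)  # merging again changes nothing
```

Both replicas converge to the same total regardless of merge order or repetition, which is the property the poll counts rely on.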

slide-42
SLIDE 42

Swift FS

Directory: (name, type) → object

  • Shallow Map
  • create (n, t, v) ≈ add
    Concurrent: merge v recursively
  • remove (n,t): whole subtree
    Concurrent create, edit: re-create
  • Object-specific operations (e.g. graph)
  • No move => can lead to cycles
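
The recursive merge of concurrently created entries can be sketched as follows. This is a simplified illustration, not the Swift FS code: directories are modelled as dictionaries keyed by (name, type), files as grow-only sets of versions merged by union, and remove is left out. The function name `merge_dir` and the sample names are assumptions.

```python
def merge_dir(a: dict, b: dict) -> dict:
    """Merge two directory maps keyed by (name, type)."""
    merged = {}
    for key in a.keys() | b.keys():
        if key in a and key in b:
            if key[1] == "dir":
                # Same directory created on both sides: merge its content.
                merged[key] = merge_dir(a[key], b[key])
            else:
                # Same file on both sides: keep every version (set union).
                merged[key] = a[key] | b[key]
        else:
            merged[key] = a[key] if key in a else b[key]
    return merged


left = {("docs", "dir"): {("a.txt", "file"): {"v1"}}}
right = {
    ("docs", "dir"): {("a.txt", "file"): {"v3"}, ("b.txt", "file"): {"v2"}},
    ("docs", "file"): {"x"},
}
merged = merge_dir(left, right)
```

Because the type is part of the key, a file and a directory with the same name never collide; they are simply two distinct entries of the map.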

slide-43
SLIDE 43

Experiments

3 DCs in Amazon EC2
100 client nodes in PlanetLab
Cache size: 512 objects
SwiftSocial: 90% cache hits

slide-44
SLIDE 44

Update caching + Read-In-Past minimize latency

[Figure: latency of reads and writes. Series: reads/writes, remote, no FT; reads, remote + stable update; writes, remote + stable update. Annotations: client-side caching & updates; read-in-past + client-assisted fault tolerance (RTT); operations with > 1 cache miss (2 RTT).]

slide-45
SLIDE 45

Latency vs. throughput

[Figure: latency vs. throughput for SwiftCloud and classic synch., each with 1 DC, 2 DC and 3 DC; annotation: 60×]

slide-46
SLIDE 46

Latency vs. throughput

[Figure: latency vs. throughput for SwiftCloud and classic synch., each with 1 DC, 2 DC and 3 DC; annotations: 6×, 12×]

slide-47
SLIDE 47

Staleness for fault tolerance

slide-48
SLIDE 48

Summary

Applications require multiple CRDTs

  • Composition (e.g. Riak Map)

Need to lower expectations…
… but still possible to enforce some invariants

  • Multi-key updates: HATs
  • Causality
  • Numeric invariants
  • General invariants: red-blue, just-right consistency

slide-49
SLIDE 49

Acknowledgments

SyncFree

  • European FP7 project #609 551, 2013–2016

Masoud Saeida-Ardakani, Carlos Baquero, Valter Balegas, Annette Bieniusa, Russell Brown, Sérgio Duarte, Carla Ferreira, Alexey Gotsman, Mahsa Najafzadeh, Marek Zawirski, and more.
