Just-Right Consistency: Closing the CAP Gap, Christopher S. Meiklejohn (presentation slides)

slide-1
SLIDE 1

Just-Right Consistency Closing the CAP Gap

Christopher S. Meiklejohn (@cmeik), Peter Lash

LightKone

slide-4
SLIDE 4

Outline: Closing the CAP Gap

  • Just-Right Consistency


Available as possible, and consistent when necessary

  • AntidoteDB


The first database that provides transactions with strong semantics, targeted at the JRC approach

  • Moving forward


Antidote’s path forward from research to company and product


slide-5
SLIDE 5

Motivation: Cloud Databases

slide-7
SLIDE 7

[Diagram: world map with a single replica, A]

Centralized database.

slide-8
SLIDE 8

[Diagram: world map with a single replica, A]

Clients read and write against the primary copy.

slide-9
SLIDE 9

[Diagram: world map with replicas A, B, and C]

Geo-replicated for both fault tolerance and high availability.

slide-10
SLIDE 10

[Diagram: world map with replicas A, B, and C]

Clients read and write locally for low latency.

slide-11
SLIDE 11

[Diagram: world map with replicas A, B, and C]

What happens if C can’t communicate with the other replicas?

slide-14
SLIDE 14

[Diagram: world map with replicas A, B, and C]

Choice 1: Consistent-Under-Partition (CP)

  • Synchronize each operation


Maintains “single system image”

  • Spanner/F1, serializability model


Coordination is expensive; Spanner typically has to wait 100 ms to commit an update transaction

Over-conservative, but easy to program!

slide-17
SLIDE 17

[Diagram: world map with replicas A, B, and C]

Choice 2: Available-Under-Partition (AP)

  • Riak, Cassandra, Dynamo


Operations issued against local copy, and across the cluster in parallel

  • Local operation only, asynchronous propagation


Stale reads and write conflicts will occur without synchronization

Available, but difficult to program!

slide-27
SLIDE 27

[Diagram: world map with replicas A, B, and C]

CAP Theorem

  • CP: high cost, low availability, synchronization
  • AP: low cost, high availability, anomalies

False dichotomy!

  • No “one-size-fits-all” consistency model


Choosing either model alone is either over-conservative or risks anomalies

  • Application-level invariants


Instead, tailor consistency choices based on application-level invariants for each operation

slide-31
SLIDE 31

Just Right Consistency

  • Preserve sequential patterns


Applications written sequentially that are correct should maintain correctness under concurrency

  • AP-compatible invariants


Strongest AP model; invariants that only require “one way” communications

  • CAP-sensitive invariants


Transactions that require coordination; “two way” communication invariants

  • Tools for analysis and verification


Identify and verify application has sufficient synchronization to ensure application invariants


slide-32
SLIDE 32

Example: Fælles Medicinkort

slide-36
SLIDE 36

Fælles Medicinkort

  • FMK [production] / FMKe [synthetic workload]


Danish National Joint Medicine Card; operating 24x7 since 2013 for 6 million Danish citizens

  • Lifecycle management for prescriptions


Involves patient, pharmacy, and doctor management around active prescriptions in Denmark

  • Assumed correct in isolation


“Correct-Individually”, C in ACID, each operation ensures application-level invariants

  • create-prescription


Create prescription for patient, doctor, pharmacy

  • update-prescription-medication


Add or increase medication to prescription

  • process-prescription


Delivery of a medication by a pharmacy

  • get-*-prescriptions


Query functions to return information about prescriptions

slide-39
SLIDE 39

FMKe Invariants

  • Relative order [referential integrity]


Create a prescription and reference it by a patient

  • Joint update [atomicity]


Create prescription, then update doctor, patient, and pharmacy

  • Precondition check [if, then]


Medication should not be over-delivered

slide-40
SLIDE 40

Invariants: AP-compatible

slide-43
SLIDE 43

AP-compatible

  • No synchronization


Updates occur locally without blocking, no synchronization in the critical path

  • Asynchronous operation


Updates are fast, available, and exploit concurrency

  • Compatible invariants


Relative order and joint update invariants can be preserved


slide-44
SLIDE 44

AP-compatible: Data Model

slide-49
SLIDE 49

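[Diagram: replicas RA and RB both apply set(1) and hold 1. One replica then applies set(2) while the other concurrently applies set(3). After exchanging updates, the two replicas hold different values, 2 and 3.]

Concurrent assignments don’t commute!

Assignment requires CP.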
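The divergence described above can be reproduced in a few lines. This is a minimal Python sketch, not code from the talk: the `Register` class and `apply_set` method are illustrative names, and which replica issues which assignment is an assumption.

```python
class Register:
    """A plain (non-CRDT) register: the most recently applied assignment wins."""

    def __init__(self, value=None):
        self.value = value

    def apply_set(self, value):
        self.value = value


# Both replicas start from the same state after set(1).
ra, rb = Register(1), Register(1)

# Assumed scenario: RA issues set(2) locally while RB concurrently issues set(3).
ra.apply_set(2)
rb.apply_set(3)

# Each replica then applies the other's update when it arrives.
ra.apply_set(3)
rb.apply_set(2)

# The same two updates, applied in different orders, leave different values.
diverged = ra.value != rb.value
```

Because the final value depends on arrival order, the replicas only agree if something imposes a single order on the assignments, which is the synchronization the slide refers to.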

slide-51
SLIDE 51

Can we find a suitable data model for AP systems?

Can we make non-commutative updates commutative?

slide-54
SLIDE 54

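[Diagram: replicas RA and RB both hold 1 after set(1), then apply set(2) and set(3) concurrently; the value each replica should hold after exchanging updates is left open.]

How do we deterministically pick a value to keep? Do we use a timestamp (like Cassandra) and drop a value?

Timestamps make concurrent operations commute but fail to capture intent.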
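A small sketch makes the trade-off concrete. It assumes each write carries a timestamp plus a replica name as a tie-breaker; the `Stamped` type, the `lww_merge` function, and the timestamp values are hypothetical and are not taken from Cassandra or any other system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamped:
    value: int
    timestamp: int   # hypothetical timestamp attached to the write
    replica: str     # tie-breaker so that the merge is deterministic


def lww_merge(a: Stamped, b: Stamped) -> Stamped:
    """Keep the write with the larger (timestamp, replica) pair; drop the other."""
    return a if (a.timestamp, a.replica) >= (b.timestamp, b.replica) else b


w2 = Stamped(value=2, timestamp=10, replica="RA")
w3 = Stamped(value=3, timestamp=11, replica="RB")

# Merge order no longer matters, so both replicas converge...
at_ra = lww_merge(w2, w3)
at_rb = lww_merge(w3, w2)
# ...but the write of 2 is discarded on both replicas, whatever its author intended.
```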

slide-55
SLIDE 55

Can we be smarter about the merge function?


slide-57
SLIDE 57

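[Diagram: replicas RA and RB both hold 1 after set(1), then apply set(2) and set(3) concurrently; each replica resolves the conflict with max(2,3) and both hold 3.]

Deterministic conflict resolution function. CRDTs generalize this framework.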
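The max-based resolution can be sketched as a tiny data type. This is an illustration under assumptions, not the talk's implementation: `MaxRegister`, `assign`, and `merge` are made-up names, and replicas are assumed to exchange their current values.

```python
class MaxRegister:
    """Register whose merge keeps the maximum, as in the max(2,3) example."""

    def __init__(self, value=0):
        self.value = value

    def assign(self, value):
        # Local update: behaves like an ordinary sequential assignment.
        self.value = value

    def merge(self, other_value):
        # Deterministic conflict resolution: max is commutative,
        # associative, and idempotent, so merge order does not matter.
        self.value = max(self.value, other_value)


ra, rb = MaxRegister(1), MaxRegister(1)
ra.assign(2)                     # concurrent local updates
rb.assign(3)
ra_seen, rb_seen = ra.value, rb.value
ra.merge(rb_seen)                # exchange state in either order
rb.merge(ra_seen)
```

Both replicas end with the same value without coordinating, at the price of fixing one resolution policy (here, the larger value wins) in the data type itself.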

slide-59
SLIDE 59

Conflict-Free Replicated Data Types

  • Replicated abstract data types


Extension of sequential data type that encapsulates deterministic merge function

  • Many existing designs


Sets, counters, registers, flags, maps
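As one example of a sequential type extended with a merge function, here is a sketch of a state-based grow-only counter. It is a common textbook design rather than code from AntidoteDB; the class and method names are illustrative.

```python
class GCounter:
    """State-based grow-only counter: one slot per replica, merged by pointwise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        # The counter's value is the sum of every replica's contribution.
        return sum(self.counts.values())

    def merge(self, other):
        # Pointwise max never loses an increment and is safe to repeat.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("RA"), GCounter("RB")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)   # merging the same state again changes nothing
```

Each replica only writes its own slot, so concurrent increments never conflict, and the merge simply combines what each side has seen.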


slide-60
SLIDE 60

AP-compatible: Relative Order

slide-63
SLIDE 63

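[Diagram: replicas RA and RB]

Maintain program order implication invariant. For instance, P => Q.

slide-64
SLIDE 64

[Diagram: replicas RA and RB; true(Q) sets Q]

Make Q true.

slide-65
SLIDE 65

[Diagram: replicas RA and RB; true(Q) sets Q, then true(P) sets P]

Make P true.

slide-66
SLIDE 66

[Diagram: replicas RA and RB; true(Q) sets Q, then true(P) sets P]

Program order implies ordering relationship.

slide-67
SLIDE 67

[Diagram: replicas RA and RB; true(Q) sets Q, then true(P) sets P]

Ordering is respected at other replicas.

slide-68
SLIDE 68

[Diagram: replicas RA and RB; true(Q) sets Q, then true(P) sets P]

Out of order propagation violates invariant!

slide-69
SLIDE 69

[Diagram: replicas RA and RB; true(Q) sets Q, then true(P) sets P]

P is true, Q is NOT true!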

slide-70
SLIDE 70

Let’s look at a concrete example.


slide-72
SLIDE 72

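[Diagram: replicas RA and RB; true(Q) sets Q]

Change default administrator password.

slide-73
SLIDE 73

[Diagram: replicas RA and RB; true(Q) sets Q, then true(P) sets P]

Enable administrator login.

slide-74
SLIDE 74

[Diagram: replicas RA and RB; true(Q) sets Q, then true(P) sets P]

Replica A is secure.

slide-75
SLIDE 75

[Diagram: replicas RA and RB; true(Q) sets Q, then true(P) sets P]

Replica B is secure.

slide-76
SLIDE 76

[Diagram: replicas RA and RB; true(Q) sets Q, then true(P) sets P]

Reordering allows the default password to be used to log in!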

slide-79
SLIDE 79

Causal Consistency

  • Respect causality


Ensure updates are delivered in causal order [Lamport 78]

  • Strongest available model


Always able to return some compatible version for an object

  • Referential integrity


Causal consistency is sufficient for providing referential integrity in an AP database
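Causal delivery can be sketched with explicit dependencies. In this minimal Python illustration each update names the updates it depends on, and a replica buffers an update until those have been applied. The `CausalReplica` class and the dependency-set encoding are assumptions made for the example; real systems typically use vector clocks or similar metadata.

```python
class CausalReplica:
    """Applies an update only after every update it depends on has been applied."""

    def __init__(self):
        self.applied = []    # update ids, in the order they became visible
        self.pending = []    # (update_id, deps) still waiting for dependencies

    def receive(self, update_id, deps=()):
        self.pending.append((update_id, frozenset(deps)))
        self._drain()

    def _drain(self):
        # Keep applying buffered updates until no more become deliverable.
        progressed = True
        while progressed:
            progressed = False
            for item in list(self.pending):
                update_id, deps = item
                if deps <= set(self.applied):
                    self.applied.append(update_id)
                    self.pending.remove(item)
                    progressed = True


rb = CausalReplica()
# The network delivers "enable admin login" (P) before "change password" (Q),
# but P was issued after Q at the origin, so P depends on Q.
rb.receive("P", deps=["Q"])
visible_after_p = list(rb.applied)   # P is buffered; nothing is visible yet
rb.receive("Q")
```

The replica never exposes P without Q, so the administrator login is never enabled while the default password is still in place.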


slide-80
SLIDE 80

Causal consistency means…

…relative order invariants are preserved transparently!

slide-81
SLIDE 81

AP-compatible: Joint Update

slide-82
SLIDE 82

[Diagram: replicas RA and RB, and client C1]

Client performing reads.

slide-83
SLIDE 83

[Diagram: replicas RA and RB, and client C1; updates: create Rx]

Create prescription.

slide-84
SLIDE 84

[Diagram: replicas RA and RB, and client C1; updates: create Rx, update Dr(Rx)]

Add reference in doctor record.

slide-85
SLIDE 85

[Diagram: replicas RA and RB, and client C1; updates: create Rx, update Dr(Rx), update Pt(Rx)]

Add reference in patient record.

slide-86
SLIDE 86

[Diagram: replicas RA and RB, and client C1; updates: create Rx, update Dr(Rx), update Pt(Rx), update Ph(Rx)]

Add reference in pharmacy record.

slide-87
SLIDE 87

[Diagram: replicas RA and RB, and client C1; updates: create Rx, update Dr(Rx), update Pt(Rx), update Ph(Rx)]

Updates are causally consistent.

slide-88
SLIDE 88

[Diagram: replicas RA and RB, and client C1; updates: create Rx, update Dr(Rx), update Pt(Rx), update Ph(Rx)]

Client can read inconsistent state.

slide-89
SLIDE 89

[Diagram: replicas RA and RB, and client C1; updates: create Rx, update Dr(Rx), update Pt(Rx), update Ph(Rx)]

Client is missing update to pharmacy.

slide-90
SLIDE 90

Can we ensure updates are All-or-Nothing?


slide-91
SLIDE 91

[Diagram: replicas RA and RB, and client C1; transaction T1 groups create Rx, update Dr(Rx), update Pt(Rx), update Ph(Rx)]

Group updates into an atomic transaction.

slide-92
SLIDE 92

[Diagram: replicas RA and RB, and client C1; transaction T1 groups create Rx, update Dr(Rx), update Pt(Rx), update Ph(Rx)]

Updates reflect “All-Or-Nothing” property through snapshots.

slide-93
SLIDE 93

[Diagram: replicas RA and RB, and client C1; transaction T1 is followed by transaction T2]

Transactions are delivered in causal order.

slide-94
SLIDE 94

[Diagram: replicas RA and RB, and client C1; transaction T1 is followed by transaction T2]

Therefore, snapshots are causally consistent.

slide-95
SLIDE 95

AP-compatible transactions provide the “A” in ACID


slide-96
SLIDE 96

Transactional Causal Consistency

Strongest model that is available (AP)
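The all-or-nothing visibility of a transaction's updates can be sketched as follows. This is a simplified single-replica illustration, not AntidoteDB's mechanism: `SnapshotStore` and its methods are hypothetical names, and the keys are made up for the prescription example.

```python
class SnapshotStore:
    """Key-value replica where a transaction's writes become visible together."""

    def __init__(self):
        self._committed = {}

    def apply_transaction(self, writes):
        # Build the next version off to the side, then publish it in one step,
        # so readers see either none or all of the transaction's writes.
        next_version = dict(self._committed)
        next_version.update(writes)
        self._committed = next_version

    def snapshot(self):
        # A reader keeps working against the version it started with.
        return self._committed


rb = SnapshotStore()
before = rb.snapshot()
rb.apply_transaction({
    "rx:1": "prescription",
    "doctor:dr": ["rx:1"],
    "patient:pt": ["rx:1"],
    "pharmacy:ph": ["rx:1"],
})
after = rb.snapshot()
```

A client holding `before` sees no part of the prescription, and a client holding `after` sees the prescription together with all three references, which rules out the missing pharmacy update.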

slide-97
SLIDE 97

Invariants: CAP-sensitive

slide-98
SLIDE 98

What about preventing over-delivery of prescriptions?

slide-99
SLIDE 99

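[Diagram: replicas RA, RB, and RC, each holding 2]

Three replicas each with two available medications.

slide-100
SLIDE 100

[Diagram: replicas RA, RB, and RC each start at 2; pp(1) is applied and every replica ends at 1]

Replica A checks precondition and delivers medication.

slide-101
SLIDE 101

[Diagram: replicas RA, RB, and RC each start at 2; pp(1) is applied and every replica ends at 1]

Correct outcome where one medication remains.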

slide-102
SLIDE 102

Is this safe with concurrent operations?


slide-103
SLIDE 103

[Diagram: replicas RA, RB, and RC, each holding 2]

Three replicas each with two available medications.

slide-104
SLIDE 104

[Diagram: replicas RA, RB, and RC each start at 2; pp(1) and add(3) are applied concurrently and every replica ends at 4]

Replica A checks precondition and delivers medication.

slide-105
SLIDE 105

[Diagram: replicas RA, RB, and RC each start at 2; pp(1) and add(3) are applied concurrently and every replica ends at 4]

Replica C adds three medications to the prescription.

slide-107
SLIDE 107

[Diagram: replicas RA, RB, and RC each start at 2; pp(1) and add(3) are applied concurrently and every replica ends at 4]

Correct outcome with four remaining medications.

Precondition is stable under concurrent addition.

slide-108
SLIDE 108

Is this safe with concurrent deliveries?


slide-109
SLIDE 109

[Diagram: replicas RA, RB, and RC, each holding 2]

Three replicas each with two available medications.

slide-110
SLIDE 110

[Diagram: replicas RA, RB, and RC each start at 2; pp(1) and pp(2) are applied concurrently and every replica ends at -1]

Replica A checks precondition and delivers medication.

slide-111
SLIDE 111

[Diagram: replicas RA, RB, and RC each start at 2; pp(1) and pp(2) are applied concurrently and every replica ends at -1]

Replica C concurrently checks precondition and delivers two medications.

slide-114
SLIDE 114

[Diagram: replicas RA, RB, and RC each start at 2; pp(1) and pp(2) are applied concurrently and every replica ends at -1]

Incorrect outcome violating non-negative invariant.

Precondition is NOT stable under concurrent fulfillment.

  • Forbid concurrency


Prevent operations from proceeding without synchronization to enforce invariant

  • Allow concurrency and remove invariant


Allow operation to proceed, knowing that the invariant may be violated under concurrent operations
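The over-delivery scenario can be replayed in code. This sketch assumes each replica checks the precondition against its local count only and ships the delivered amount to its peers afterwards; `StockReplica` and its methods are illustrative names, and only replicas A and C are modelled.

```python
class StockReplica:
    """Replica of a medication count; the precondition is checked locally only."""

    def __init__(self, available):
        self.available = available
        self.outbox = []    # deliveries to propagate asynchronously

    def process_prescription(self, amount):
        if self.available - amount < 0:    # local precondition check
            return False
        self.available -= amount
        self.outbox.append(amount)
        return True

    def apply_remote(self, amount):
        # Remote effects are applied as-is; the precondition is not re-checked.
        self.available -= amount


ra, rc = StockReplica(2), StockReplica(2)
ok_a = ra.process_prescription(1)   # allowed at RA: 2 - 1 >= 0
ok_c = rc.process_prescription(2)   # allowed at RC: 2 - 2 >= 0
for amount in rc.outbox:
    ra.apply_remote(amount)
for amount in ra.outbox:
    rc.apply_remote(amount)
```

Both local checks pass, yet after propagation each replica holds -1, which is the violation of the non-negative invariant shown on the slide.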

slide-115
SLIDE 115

How do we know when it’s safe?


slide-116
SLIDE 116

CISE Analysis


slide-117
SLIDE 117

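[Diagram: replicas RA, RB, and RC; concurrent operations with preconditions Upre and Vpre, and invariant I checked at each replica]

Analyze possible pairs of concurrent operations…

slide-118
SLIDE 118

[Diagram: replicas RA, RB, and RC; concurrent operations with preconditions Upre and Vpre, and invariant I checked at each replica]

…to identify operations where the invariant can be violated.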

slide-122
SLIDE 122

CISE Analysis

  • Individually correct


Individual operations never violate the invariant

  • Convergence


Concurrent effects commute

  • Precondition stability


Preconditions are stable under every pair of concurrent operations

If satisfied, invariant is guaranteed with concurrency.
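The individual-correctness and precondition-stability checks can be imitated by brute force over a small state space. The sketch below is only an illustration of the idea, not the CISE tool or its proof rules: the operation table, the explored ranges, and all function names are assumptions, and the convergence check is left out.

```python
from itertools import product

STATES = range(0, 6)     # remaining medication counts to explore
AMOUNTS = range(1, 4)    # operation arguments to explore


def invariant(stock):
    return stock >= 0


# Each operation is (precondition, effect), both parameterised by an amount.
OPERATIONS = {
    "add": (lambda stock, n: True, lambda stock, n: stock + n),
    "deliver": (lambda stock, n: stock - n >= 0, lambda stock, n: stock - n),
}


def individually_correct(name):
    """Run alone from a valid state with its precondition true, the invariant holds."""
    pre, eff = OPERATIONS[name]
    return all(invariant(eff(s, n))
               for s, n in product(STATES, AMOUNTS)
               if invariant(s) and pre(s, n))


def stable_under(name, other):
    """Does the precondition of `name` still hold after a concurrent `other`?"""
    pre, _ = OPERATIONS[name]
    other_pre, other_eff = OPERATIONS[other]
    return all(pre(other_eff(s, m), n)
               for s, n, m in product(STATES, AMOUNTS, AMOUNTS)
               if pre(s, n) and other_pre(s, m))


unstable_pairs = [(a, b) for a in OPERATIONS for b in OPERATIONS
                  if not stable_under(a, b)]
```

In this toy model the only unstable pair is two concurrent deliveries, which matches the slides: delivery is safe alongside additions but needs synchronization against other deliveries.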

slide-123
SLIDE 123

Database: AntidoteDB

slide-126
SLIDE 126

AntidoteDB

  • Open-source Erlang database


Developed in Erlang, on top of the Riak Core distributed systems framework

  • Transactional Causal Consistency


Only industrial-grade database providing both causal consistency and all-or-nothing transactions

  • Alpha release available


Currently under development, but an alpha release of the product is available on GitHub


slide-127
SLIDE 127

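[Diagram: data centers A and B; nodes N1 and N2; components TxnMgr, Materializer, Log, InterDC-Repl]

Each data center…

slide-128
SLIDE 128

[Diagram: data centers A and B; nodes N1 and N2; components TxnMgr, Materializer, Log, InterDC-Repl]

…contains multiple nodes…

slide-129
SLIDE 129

[Diagram: data centers A and B; nodes N1 and N2; components TxnMgr, Materializer, Log, InterDC-Repl]

…each operating a transaction manager, materializers, and a log.

slide-130
SLIDE 130

[Diagram: data centers A and B; nodes N1 and N2; components TxnMgr, Materializer, Log, InterDC-Repl]

Strong consistency inside of the data center…

slide-131
SLIDE 131

[Diagram: data centers A and B; nodes N1 and N2; components TxnMgr, Materializer, Log, InterDC-Repl]

…with a causal consistency protocol running in the wide area.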

slide-132
SLIDE 132

Data Model

Register

  • Last-Writer Wins
  • Multi-Value

Set

  • Grow-Only
  • Add-Wins
  • Remove-Wins

Map

Counter

  • Unlimited
  • Restricted ≥ 0

Graph

  • Directed
  • Monotonic DAG
  • Edit graph

Sequence
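To give a feel for one entry in this list, here is a sketch of add-wins set semantics using the common observed-remove design. It is not AntidoteDB's implementation: the class is hypothetical, and the shared tag counter stands in for the globally unique identifiers a real system would generate per replica.

```python
import itertools


class AddWinsSet:
    """Observed-remove set sketch: a remove only cancels the adds it has seen."""

    _tags = itertools.count()   # simplification: one shared source of unique tags

    def __init__(self):
        self.adds = set()       # (element, unique tag) pairs
        self.removed = set()    # tags cancelled by removes

    def add(self, element):
        self.adds.add((element, next(AddWinsSet._tags)))

    def remove(self, element):
        # Cancel only the add tags that are visible at this replica right now.
        self.removed |= {tag for e, tag in self.adds if e == element}

    def merge(self, other):
        self.adds |= other.adds
        self.removed |= other.removed

    def value(self):
        return {e for e, tag in self.adds if tag not in self.removed}


ra, rb = AddWinsSet(), AddWinsSet()
ra.add("aspirin")
rb.merge(ra)             # both replicas now contain "aspirin"
ra.remove("aspirin")     # RA removes it...
rb.add("aspirin")        # ...while RB concurrently adds it again
ra.merge(rb)
rb.merge(ra)
```

The concurrent add carries a tag the remove never observed, so the element survives on both replicas; a remove-wins set would resolve the same conflict the other way.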

slide-138
SLIDE 138

Object API

    %% Identify an object by object identifier.
    User1 = {michel, antidote_crdt_mvreg, user_bucket},

    %% Use the update API to assign a value to this register.
    {ok, Time1} = antidote:update_objects(ignore, [],
        [{User1, assign, {["Michel", "michel@blub.org"], ClientIdentifier}}]),

    %% Read the object, providing a minimum snapshot time.
    {ok, Result, Time2} = antidote:read_objects(Time1, [], [User1]).

Simple, operation-based API. (think Redis, Riak CRDTs)

Causal dependencies are automatically captured by execution order.

slide-144
SLIDE 144

Transaction API

    %% Start a transaction with a given snapshot time; returns a transaction identifier.
    {ok, TxId} = antidote:start_transaction(Timestamp, []),

    %% Read objects using the interactive transaction API.
    {ok, _} = antidote:read_objects([Set], TxId),

    %% Update objects using the interactive transaction API.
    ok = antidote:update_objects([{Set, add, "Java"}], TxId),

    %% Once finished updating, commit the transaction.
    {ok, _} = antidote:commit_transaction(TxId).

Transactions read causally consistent snapshots and updates are applied atomically.

slide-145
SLIDE 145

Scalability

[Figure: throughput (Kops/s) by read(update) ratio, 99(1), 90(10), 75(25), and 50(50), for DCs × servers configurations 1 × 5, 1 × 10, 1 × 25, 2 × 25, and 3 × 25. LWW registers, 100k keys/partition, power law distribution.]

slide-146
SLIDE 146

Cure vs. SOA

[Figure: throughput (Kops/s) of Eiger, GR, Cure, and EC by read(update) ratio, 99(1), 90(10), 75(25), and 50(50). 3 DCs × 25 servers, LWW registers.]

slide-147
SLIDE 147

Cure vs. EC

[Figure: throughput (Kops/s) of Cure and EC with 1KB and 10KB objects by read(update) ratio, 99(1), 90(10), 75(25), and 50(50). 3 DCs × 25 servers, CRDT sets.]

slide-149
SLIDE 149

Future Features

  • Intra-DC replication


Antidote provides no replication within the datacenter and assumes only geo-replication at the moment

  • ACID transactions


For Antidote to provide all of JRC, it needs ACID transaction support: no research needed, only implementation


slide-152
SLIDE 152

Moving Forward

  • Research prototype


Originally a research prototype to build a database requiring reduced synchronization (SyncFree FP7) with Basho, Rovio, and Trifork

  • Research ahead


LightKone (H2020) will investigate moving AntidoteDB close to the edge to provide DDN services

  • Industrialization


Obtaining seed funding to start a company to industrialize AntidoteDB


slide-156
SLIDE 156

Resources

  • https://github.com/SyncFree/antidote


AntidoteDB

  • http://syncfree.github.io/antidote/


Documentation for AntidoteDB

  • www.antidotedb.com


Website

  • docker pull antidotedb/antidote


Try out Antidote!


slide-157
SLIDE 157

Thanks!


More questions? Come visit us at the Evolution bar!