Consistent Storage or Scalable Storage Why Not Both? CONSISTENCY - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Consistent Storage or Scalable Storage

Why Not Both?

slide-2
SLIDE 2

CONSISTENCY

slide-3
SLIDE 3

Strong Consistency

slide-4
SLIDE 4

Eventual Consistency

slide-5
SLIDE 5

"Consistency in database systems refers to the requirement that any given database transaction must change affected data only in allowed ways." Wikipedia Consistency (database systems)

slide-6
SLIDE 6

PickTix Concert Tickets schema

» User
  ⋄ id
  ⋄ name
» Concert
  ⋄ id
  ⋄ name
  ⋄ tickets_left
» TicketOrder
  ⋄ id
  ⋄ user_id
  ⋄ concert_id
  ⋄ num_tickets

slide-7
SLIDE 7

Transactional Consistency

Begin transaction
» Read concert.tickets_left
» Create new invoice for 10 tickets
» Write tickets_left as the previous value minus 10
End transaction
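The transaction above can be sketched in Python with the standard-library `sqlite3` module. This is only an illustration on a single-node database, not how Cassandra or Spanner do it. The table names, the `buy_tickets` and `make_db` helpers, and the starting count of 15 tickets are assumptions made for the example; a `ticket_order` row stands in for the invoice.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical PickTix-like schema."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN / COMMIT / ROLLBACK statements below control transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE concert (id TEXT PRIMARY KEY, name TEXT, tickets_left INTEGER)"
    )
    conn.execute(
        "CREATE TABLE ticket_order ("
        "id INTEGER PRIMARY KEY, user_id TEXT, concert_id TEXT, num_tickets INTEGER)"
    )
    conn.execute("INSERT INTO concert VALUES ('adele', 'Adele', 15)")
    return conn


def buy_tickets(conn, user_id, concert_id, num_tickets):
    """Read tickets_left, record the order and decrement, all in one transaction."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT tickets_left FROM concert WHERE id = ?", (concert_id,)
        ).fetchone()
        if row is None or row[0] < num_tickets:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "INSERT INTO ticket_order (user_id, concert_id, num_tickets) "
            "VALUES (?, ?, ?)",
            (user_id, concert_id, num_tickets),
        )
        conn.execute(
            "UPDATE concert SET tickets_left = ? WHERE id = ?",
            (row[0] - num_tickets, concert_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because the read and both writes sit inside one transaction, a second buyer cannot observe the old `tickets_left` value after the first order has been committed.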

slide-8
SLIDE 8

Strong Consistency

If I write X, then read X (from anywhere), it'll include that write.

Eventual Consistency

If I write X, then read X, it might not have the update now, but eventually it'll have it.

Transactional Consistency

Write and read-write transactions across the database are atomic and isolated.

slide-9
SLIDE 9

CASE STUDIES

Apache Cassandra
» Built by Facebook
» Open sourced 2008
» Arguably 2nd most popular
» Has schemas and SQL-like query language

Spanner
» Built by Google
» Paper published 2012
» Recently released as beta
» Schemas and SQL-like query language

slide-10
SLIDE 10

1.

Cassandra

Representative non-relational storage system

slide-11
SLIDE 11

[Diagram: five clients and one database]

slide-12
SLIDE 12

[Diagram: six nodes and twelve clients]

slide-13
SLIDE 13

PickTix Concert Tickets schema

» User
  ⋄ id
  ⋄ name
» Concert
  ⋄ id
  ⋄ name
  ⋄ tickets_left
» TicketOrder
  ⋄ id
  ⋄ user_id
  ⋄ concert_id
  ⋄ num_tickets

slide-14
SLIDE 14

Partition: Ticket Orders

Primary key
concert_id (Partition key) | ticket_order_id | user_id | num_tickets
'adele'                    | 1               | 'alice' | 4
'adele'                    | 2               | 'bob'   | 5
'gaga'                     | 3               | 'alice' | 1
'gaga'                     | 4               | 'fred'  | 43
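A rough sketch of why the partition key matters: every row with the same `concert_id` ends up on the same node. The `node_for` and `place` helpers and the modulo hashing are assumptions for illustration only. Cassandra itself places partitions on a token ring (Murmur3 partitioner by default), not with a plain modulo.

```python
import hashlib
from collections import defaultdict

# Rows from the Ticket Orders table:
# (concert_id, ticket_order_id, user_id, num_tickets)
ORDERS = [
    ("adele", 1, "alice", 4),
    ("adele", 2, "bob", 5),
    ("gaga", 3, "alice", 1),
    ("gaga", 4, "fred", 43),
]


def node_for(partition_key: str, num_nodes: int) -> int:
    """Map a partition key to a node index using a stable hash (simplified)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def place(orders, num_nodes):
    """Group rows by the node that owns their partition key."""
    placement = defaultdict(list)
    for order in orders:
        concert_id = order[0]
        placement[node_for(concert_id, num_nodes)].append(order)
    return dict(placement)


if __name__ == "__main__":
    for node, rows in sorted(place(ORDERS, 5).items()):
        print(node, rows)
```

Both 'adele' orders always share a node, and so do both 'gaga' orders, which is what makes limited operations within a partition possible.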

slide-15
SLIDE 15

Partition: Ticket Orders

Primary key
concert_id (Partition key) | ticket_order_id | user_id | num_tickets
'adele'                    | 1               | 'alice' | 4
'adele'                    | 2               | 'bob'   | 5
'gaga'                     | 3               | 'alice' | 1
'gaga'                     | 4               | 'fred'  | 43

slide-16
SLIDE 16

[Diagram: five nodes]

slide-17
SLIDE 17

[Diagram: five nodes]

slide-18
SLIDE 18

Write Consistency Level: All

[Diagram: six nodes]

slide-19
SLIDE 19

Write Consistency Level: One

[Diagram: six nodes, two marked "?"]

slide-20
SLIDE 20

Write Consistency Level: Quorum

[Diagram: six nodes, one marked "?"]
slide-21
SLIDE 21

NW + NR > N
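The inequality says that the set of replicas acknowledging a write and the set of replicas consulted by a read must share at least one node. A minimal sketch, assuming N is the number of replicas of a partition, NW the replicas a write waits for and NR the replicas a read waits for; the function names are made up for this example.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """The rule from the slide: NW + NR > N."""
    return w + r > n


def always_intersect(n: int, w: int, r: int) -> bool:
    """Brute force: does every possible write set share a node with every read set?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


if __name__ == "__main__":
    n = 3
    for w, r in [(3, 1), (1, 1), (quorum(n), quorum(n))]:
        print(f"N={n} W={w} R={r} overlap={quorums_overlap(n, w, r)}")
```

With N = 3, writing at All and reading at One overlaps, One and One does not, and Quorum with Quorum (2 + 2 > 3) does.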

slide-22
SLIDE 22

Global Replication

slide-23
SLIDE 23

Global replication

[Diagram: thirteen nodes]

slide-24
SLIDE 24

Strong Consistency

Yes, iff (W + R > N) is satisfied.

Eventual Consistency

Yes.

Transactional Consistency

Limited operations within partitions.

slide-25
SLIDE 25

Alice
» Add pending friend request from Bob
» Add Alice to friends
» Delete pending friend request

Bob
» Add pending friend request to Alice
» Check pending friend request
» Add Bob to friends
» Delete pending friend request

slide-26
SLIDE 26

Alice
» Add pending friend request from Bob
» Add Alice to friends
» Delete pending friend request

Bob
» Add pending friend request to Alice
» Check pending friend request
» Cancel pending friend request
» Add Bob to friends
» Delete pending friend request

slide-27
SLIDE 27

Development Costs

» Choose partition keys wisely
  ⋄ Include any data which must be kept consistent with it
  ⋄ Don't let it get too big
» Duplicate (denormalise) data
» Background cleanup tasks

slide-28
SLIDE 28

2.

Spanner

Representative scalable relational storage system

slide-29
SLIDE 29

Can I use Spanner now?

It works! It scales! It's battle-tested!

It's ready to be used, except:
» Google Cloud Platform only
» Beta (no SLA)
» Single region only
» Expensive

slide-30
SLIDE 30

@jlawrence124 /r/wallpaper/

slide-31
SLIDE 31

Read-write/write-write consistency

Alice wants to accept a pending friend request from Bob
1. Check that the friend request is still valid
2. Add Alice as a friend to Bob
3. Add Bob as a friend to Alice

No risk of Bob cancelling the friend request between step 1 and steps 2/3
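The guarantee can be imitated in a single process with a lock, which makes the check and both writes one indivisible step. This is a toy model only: the `FriendStore` class and its methods are hypothetical, and a real distributed store needs a transaction protocol, not an in-memory lock.

```python
import threading
from collections import defaultdict


class FriendStore:
    """In-memory stand-in for a store with atomic, isolated transactions."""

    def __init__(self):
        self._lock = threading.Lock()
        self.pending = set()             # (from_user, to_user) pairs
        self.friends = defaultdict(set)  # user -> set of friends

    def send_request(self, from_user, to_user):
        with self._lock:
            self.pending.add((from_user, to_user))

    def cancel_request(self, from_user, to_user):
        with self._lock:
            if (from_user, to_user) not in self.pending:
                return False
            self.pending.remove((from_user, to_user))
            return True

    def accept_request(self, from_user, to_user):
        with self._lock:
            # 1. Check that the friend request is still valid
            if (from_user, to_user) not in self.pending:
                return False
            # 2. Add the accepting user as a friend to the requester
            self.friends[from_user].add(to_user)
            # 3. Add the requester as a friend to the accepting user
            self.friends[to_user].add(from_user)
            self.pending.remove((from_user, to_user))
            return True
```

Whichever of `cancel_request` and `accept_request` runs first wins, and the other returns `False`; there is no interleaving in which the request is cancelled and the friendship is still created.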

slide-32
SLIDE 32

The consistency guarantees we want

Write, write-write and read-write transactions
» Atomic
» Isolated
Read and read-read transactions
» Never see partial writes
» If writes depend on each other, never see them out of order
slide-33
SLIDE 33

Linearizability

[Diagram: transactions T1 and T2]

T1 < T2

slide-34
SLIDE 34

Cassandra Write Timestamps

[Diagram: Client A and Client B writing to three nodes, with timestamps T1 and T2]

slide-35
SLIDE 35

Clock drift

» A's clock is slightly ahead
» B's clock is slightly behind

1. A writes with timestamp T1 (Client A generated timestamp)
2. B reads at timestamp T1
3. B writes at timestamp T2 (Client B generated timestamp)
4. T2 < T1, so B's write is ordered before A's
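The scenario above can be reproduced with a toy last-write-wins replica, where the value with the highest client-supplied timestamp is kept. The `Replica` and `Client` classes and the skew of 5 time units are assumptions for illustration.

```python
class Replica:
    """Keeps, per key, only the value with the highest write timestamp."""

    def __init__(self):
        self.data = {}  # key -> (value, timestamp)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        if current is None or timestamp > current[1]:
            self.data[key] = (value, timestamp)

    def read(self, key):
        return self.data[key][0]


class Client:
    """A client whose local clock is offset from real time by clock_skew."""

    def __init__(self, name, clock_skew):
        self.name = name
        self.clock_skew = clock_skew

    def now(self, real_time):
        return real_time + self.clock_skew


def demo():
    replica = Replica()
    a = Client("A", clock_skew=+5)  # A's clock is slightly ahead
    b = Client("B", clock_skew=-5)  # B's clock is slightly behind

    t1 = a.now(real_time=100)
    replica.write("x", "from A", t1)

    seen_by_b = replica.read("x")   # B reads A's write

    t2 = b.now(real_time=101)       # later in real time, earlier by B's clock
    replica.write("x", "from B", t2)

    return t1, t2, seen_by_b, replica.read("x")


if __name__ == "__main__":
    print(demo())
```

B's write happens after A's in real time and after B has read A's value, yet it carries the smaller timestamp and is silently discarded.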

slide-36
SLIDE 36

TrueTime

[Diagram: three GPS masters, two atomic clock masters and a node]
» Current time t
» Uncertainty ϵ
» Synchronise to time t
» Uncertainty ϵ = ϵ + network latency
» Increase ϵ over time

slide-37
SLIDE 37

TrueTime

TT.now() = [earliest, latest]

slide-38
SLIDE 38

Linearizability with TrueTime

[Diagram: transactions T1 and T2]

1. Transaction starts
2. Assign transaction timestamp T1 to be TT.now().latest
3. Prepare transaction
4. Wait until T1 is earlier than TT.now().earliest
5. Commit transaction
6. Return success
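The wait in step 4 can be sketched with a simulated clock. `SimulatedTrueTime` is a stand-in invented for this example, with a fixed uncertainty ϵ and time advanced by hand; it is not Google's TrueTime API, and the prepare and commit steps are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Fake clock: real time is known to lie within [t - epsilon, t + epsilon]."""

    def __init__(self, epsilon: float):
        self.epsilon = epsilon
        self.real_time = 0.0

    def advance(self, dt: float) -> None:
        self.real_time += dt

    def now(self) -> TTInterval:
        return TTInterval(self.real_time - self.epsilon,
                          self.real_time + self.epsilon)


def commit_with_wait(tt: SimulatedTrueTime, tick: float = 1.0):
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    commit_ts = tt.now().latest              # step 2
    waited = 0.0
    while tt.now().earliest <= commit_ts:    # step 4: commit wait
        tt.advance(tick)                     # stands in for sleeping
        waited += tick
    return commit_ts, waited                 # steps 5 and 6 would follow here


if __name__ == "__main__":
    tt = SimulatedTrueTime(epsilon=4.0)
    print(commit_with_wait(tt))
    print(commit_with_wait(tt))
```

The wait is roughly 2ϵ, and any transaction that starts after the first one returns is assigned a strictly larger timestamp, so timestamp order matches real-time order.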

slide-39
SLIDE 39

Linearizability

[Diagram: transactions T1 and T2]

T1 ? T2

slide-40
SLIDE 40

Spanner Partitions (tablets)

slide-41
SLIDE 41
slide-42
SLIDE 42

Cassandra partition replication

[Diagram: five nodes]

slide-43
SLIDE 43

[Diagram: five spanservers and Colossus]

slide-44
SLIDE 44

[Diagram: three zones; in total three zone masters, six location proxies, twelve spanservers and three Colossus instances]

slide-45
SLIDE 45

[Diagram: five spanservers]

slide-46
SLIDE 46

The consistency guarantees we want

Write and read-write transactions
» Atomic
» Isolated
Read-read transactions
» Never see partial writes
» If writes depend on each other, never see them out of order
slide-47
SLIDE 47

Transactions that conflict

[Diagram: transactions T1 and T2]

slide-48
SLIDE 48

[Diagram: five spanservers]

slide-49
SLIDE 49

Transactions that conflict

[Diagram: transactions T1 and T2]

slide-50
SLIDE 50

Transactions across Paxos groups (tablets)

[Diagram: three spanservers]
slide-51
SLIDE 51

The consistency guarantees we want

Write and read-write transactions
» Atomic
» Isolated
Read and read-read transactions
» Never see partial writes
» If writes depend on each other, never see them out of order
slide-52
SLIDE 52

Reads

» Consistent reads at a timestamp
» Strongly consistent reads
» Time-bounded staleness reads
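One way to picture the three kinds of read is a cell that keeps every version with its commit timestamp. This is a toy sketch, not Spanner's implementation: `VersionedCell` and its methods are hypothetical, and the staleness read simply uses the oldest timestamp the bound allows, whereas a real system may pick any timestamp within the bound.

```python
import bisect


class VersionedCell:
    """Stores every committed version of one value, ordered by timestamp."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def write(self, timestamp, value):
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def read_at(self, timestamp):
        """Consistent read at a timestamp: newest version not after it."""
        i = bisect.bisect_right(self._timestamps, timestamp)
        return self._values[i - 1] if i else None

    def read_latest(self):
        """Strongly consistent read: the most recent committed version."""
        return self._values[-1] if self._values else None

    def read_stale(self, now, max_staleness):
        """Time-bounded staleness read (simplified to the oldest allowed point)."""
        return self.read_at(now - max_staleness)


if __name__ == "__main__":
    tickets_left = VersionedCell()
    tickets_left.write(10, 100)
    tickets_left.write(20, 90)
    print(tickets_left.read_at(15), tickets_left.read_latest())
```

Reads at a past timestamp never block on newer writes, which is why stale and timestamped reads are cheaper than strongly consistent ones.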

slide-53
SLIDE 53

Conclusions

» Consistency guarantees make happy developers.
» Transactional consistency at scale is feasible.
» Perfect for high-read, low-write, consistency-sensitive data.
» Consider using Spanner, if it works for you.
» Keep a look out for the next generation! Or build it!

slide-54
SLIDE 54

THANKS!

Any questions?

I have copies of the Spanner, Cassandra and related papers here.

You can find me at
» katiebell.net
» @notsolonecoder