SLIDE 1

Understanding Tradeoffs for Scalability

Steve Vinoski

Architect, Basho Technologies, Cambridge, MA, USA (@stevevinoski)

SLIDE 2

Back In the Old Days

  • Big centralized servers controlled all storage
  • To scale, you scaled vertically (up) by getting a bigger server
  • Single host guaranteed data consistency

SLIDE 3

Drawbacks

  • Scaling up is limited
  • Servers can only get so big
  • And the bigger they get, the more they cost

SLIDE 4

Hitting the Wall

  • Websites started outgrowing the scale-up approach
  • Started applying workarounds to try to scale
  • Resulted in fragile systems with difficult operational challenges

SLIDE 5

A Distributed Approach

  • Multiple commodity servers
  • Scale horizontally (out instead of up)
  • Read and write on any server
  • Replicated data
  • Losing a server doesn’t lose data

SLIDE 6

No Magic Bullet

  • A distributed approach can scale much larger
  • But distribution brings its own set of issues
  • Requires tradeoffs

SLIDE 7

CAP Theorem

  • A conjecture put forth in 2000 by Dr. Eric Brewer

  • Formally proven in 2002
  • In any distributed system, pick two:
  • Consistency
  • Availability
  • Partition tolerance

SLIDE 8

Partition Tolerance

  • Guarantees continued system operation even when the network breaks and messages are lost
  • Systems generally tend to support P
  • Leaves choice of either C or A

SLIDE 9

Consistency

  • Distributed nodes see the same updates at the same logical time
  • Hard to guarantee across a distributed system

SLIDE 10

Availability

  • Guarantees the system will service every read and write sent to it
  • Even when things are breaking

SLIDE 11

Choose Two: CA

  • Traditional single-node RDBMS
  • Single node means P irrelevant

SLIDE 12

Choose Two: CP

  • Typically involves sharding, where data is spread across nodes in an app-specific manner
  • Sharding can be brittle
  • data unavailable from a given shard if its node dies
  • can be hard to add nodes and change the sharding logic

SLIDE 13

Choose Two: AP

  • Provides read/write availability even when the network breaks or nodes die
  • Provides eventual consistency
  • Example: the Domain Name System (DNS) is an AP system

SLIDE 14

Example AP Systems

  • Amazon Dynamo
  • Cassandra
  • CouchDB
  • Voldemort
  • Basho Riak

SLIDE 15

Handling Tradeoffs for AP Systems

SLIDE 16
  • Problem: how to make the system available even if nodes die or the network breaks?
  • Solution:
  • allow reading and writing from multiple nodes in the system
  • avoid master nodes, instead make all nodes peers

SLIDE 17
  • Problem: if multiple nodes are involved, how do you reliably know where to read or write?
  • Solution:
  • assign virtual nodes (vnodes) to physical nodes
  • use consistent hashing to find vnodes for reads/writes (a sketch follows below)
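A minimal sketch of the idea in Python: the hash space is divided into a fixed number of equal partitions (vnodes), each claimed by a physical node, and a key's hash picks the partition that owns it. The partition count, node names, and the preference_list helper are invented for illustration, not Riak's actual API.

    import hashlib

    RING_SIZE = 2 ** 160        # SHA-1 hash space
    NUM_PARTITIONS = 64         # vnodes; fixed when the ring is created

    def key_hash(bucket, key):
        """Hash a bucket/key pair onto the ring."""
        return int(hashlib.sha1(f"{bucket}/{key}".encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes):
            # Assign the equal-sized partitions round-robin to the
            # physical nodes; each partition is a vnode.
            self.partition_size = RING_SIZE // NUM_PARTITIONS
            self.owners = [nodes[i % len(nodes)] for i in range(NUM_PARTITIONS)]

        def preference_list(self, bucket, key, n):
            """The n vnodes responsible for a key: the partition the key
            hashes into plus the next n-1 partitions around the ring."""
            first = key_hash(bucket, key) // self.partition_size
            return [((first + i) % NUM_PARTITIONS,
                     self.owners[(first + i) % NUM_PARTITIONS])
                    for i in range(n)]

    ring = Ring(["node1", "node2", "node3", "node4"])
    print(ring.preference_list("users", "steve", n=3))

Because partition boundaries never move, adding a node only reassigns ownership of some partitions; keys never need rehashing.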

SLIDE 18

Consistent Hashing

SLIDE 19

Consistent Hashing and Multi-Vnode Benefits

  • Data is stored in multiple locations
  • Loss of a node means only a single replica is lost
  • No master to lose
  • Adding nodes is trivial, data gets rebalanced automatically

SLIDE 20
  • Problem: what about availability? What if the node you write to dies or becomes inaccessible?
  • Solution: sloppy quorums
  • write to multiple vnodes
  • attempt reads from multiple vnodes

SLIDE 21

N/R/W Values

  • N = number of replicas to store (on distinct nodes)
  • R = number of replica responses needed for a successful read (specified per-request)
  • W = number of replica responses needed for a successful write (specified per-request)
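To make the counting concrete, here is a toy sketch in Python. The Vnode class and its in-memory dict are invented for illustration; a real store sends the N requests concurrently and applies timeouts.

    class Vnode:
        """Toy in-memory replica."""
        def __init__(self):
            self.store = {}
        def put(self, key, value):
            self.store[key] = value
            return True                        # acknowledge the write
        def get(self, key):
            return self.store.get(key)

    def quorum_write(prefs, key, value, w):
        """Send the write to every vnode in the preference list
        (length N); succeed once W of them acknowledge."""
        acks = sum(1 for vnode in prefs if vnode.put(key, value))
        return acks >= w

    def quorum_read(prefs, key, r):
        """Collect replies from the preference list; succeed once R
        replicas respond (the caller reconciles divergent versions)."""
        replies = [vnode.get(key) for vnode in prefs]
        found = [val for val in replies if val is not None]
        return found[:r] if len(found) >= r else None

    prefs = [Vnode(), Vnode(), Vnode()]        # N = 3 replicas
    assert quorum_write(prefs, "key", "v1", w=2)
    print(quorum_read(prefs, "key", r=2))      # ['v1', 'v1']

Choosing R + W > N forces the read and write quorums to overlap, so every read quorum contains at least one replica that saw the latest successful write; smaller values trade that guarantee for lower latency and higher availability.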

SLIDE 22

N/R/W Values

SLIDE 23
  • Problem: what happens if a key hashes to vnodes that aren’t available?
  • Solution:
  • read from or write to the next available vnode
  • eventually repair via hinted handoff

SLIDE 24

N/R/W Values

SLIDE 25

Hinted Handoff

  • Surrogate vnode holds data for unavailable actual vnode
  • Surrogate vnode keeps checking for availability of actual vnode
  • Once the actual vnode is again available, surrogate hands off data to it
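A toy sketch of that loop in Python. The class names and the availability flag are invented for this sketch; real implementations persist hints and retry on a timer.

    class Vnode:
        """Toy replica with an availability flag."""
        def __init__(self):
            self.available = False
            self.store = {}
        def put(self, key, value):
            self.store[key] = value

    class SurrogateVnode:
        """Accepts writes on behalf of an unavailable vnode and hands
        them off once it comes back."""
        def __init__(self, target):
            self.target = target           # the unavailable actual vnode
            self.hints = {}                # data held on its behalf
        def put(self, key, value):
            self.hints[key] = value        # store locally, tagged for target
        def check_and_handoff(self):
            if self.target.available:
                for key, value in self.hints.items():
                    self.target.put(key, value)   # hand the data back
                self.hints.clear()

    actual = Vnode()
    surrogate = SurrogateVnode(actual)
    surrogate.put("k", "v")        # actual vnode is down; surrogate holds the write
    actual.available = True        # actual vnode recovers
    surrogate.check_and_handoff()  # hint delivered
    print(actual.store)            # {'k': 'v'}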

SLIDE 26

Quorum Benefits

  • Allows applications to tune consistency, availability, and reliability per read or write

SLIDE 27
  • Problem: how do the nodes in the ring keep track of ring state?

  • Solution: gossip protocol

SLIDE 28
Gossip Protocol

  • Nodes “gossip” their view of the state of the ring to other nodes
  • If a node changes its claim on the ring, it lets others know
  • The overall state of the ring is thus kept consistent among all nodes in the ring
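A minimal sketch of the idea in Python, using a simple newest-version-wins rule. The rule and the class are invented for illustration; Riak's actual gossip merges ring states rather than just adopting the newest one.

    import random

    class Node:
        """Toy gossip participant: keeps a versioned view of which node
        claims each ring partition, and adopts any newer view it hears."""
        def __init__(self, name):
            self.name = name
            self.peers = []
            self.version = 0
            self.ring_view = {}            # partition -> claiming node

        def claim(self, partition):
            """Change our claim on the ring and bump the view version."""
            self.ring_view[partition] = self.name
            self.version += 1

        def gossip(self):
            """Tell one random peer what we currently believe."""
            peer = random.choice(self.peers)
            peer.receive(self.version, self.ring_view)

        def receive(self, version, ring_view):
            if version > self.version:     # newer view wins
                self.version, self.ring_view = version, dict(ring_view)

    a, b = Node("a"), Node("b")
    a.peers, b.peers = [b], [a]
    a.claim(0)                             # node a claims partition 0
    a.gossip()                             # b adopts a's newer view
    print(b.ring_view)                     # {0: 'a'}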

SLIDE 29
  • Problem: what happens if vnodes get out of sync?

  • Solution:
  • vector clocks
  • read repair

SLIDE 30

Vector Clocks

  • Reasoning about time and causality in distributed systems is hard
  • Integer timestamps don’t necessarily capture causality
  • Vector clocks provide a happens-before relationship between two events

SLIDE 31

Vector Clocks

  • Simple data structure: [(ActorID, Counter)]
  • All data has an associated vector clock; actors update their entry when making changes
  • ClockA happened-before ClockB if all actor-counters in A are less than or equal to those in B
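A small sketch in Python, representing the (ActorID, Counter) pairs as a dict. A missing actor counts as zero, and a strict check that the clocks differ is added so a clock is not "before" itself.

    def increment(clock, actor):
        """An actor updates its own entry when making a change."""
        clock = dict(clock)                     # copy; treat clocks as values
        clock[actor] = clock.get(actor, 0) + 1
        return clock

    def happened_before(a, b):
        """A happened-before B if every counter in A is <= the matching
        counter in B and the clocks differ."""
        return (all(count <= b.get(actor, 0) for actor, count in a.items())
                and a != b)

    v1 = increment({}, "alice")                 # {'alice': 1}
    v2 = increment(v1, "bob")                   # {'alice': 1, 'bob': 1}
    v3 = increment(v1, "carol")                 # {'alice': 1, 'carol': 1}
    print(happened_before(v1, v2))              # True: v2 descends from v1
    print(happened_before(v2, v3))              # False, and
    print(happened_before(v3, v2))              # False: concurrent updates

When neither clock happened-before the other, the writes were concurrent, and the system must keep both versions or reconcile them.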

SLIDE 32

Read Repair

  • If a read detects that a vnode has stale data, it is repaired via asynchronous update
  • Helps implement eventual consistency
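A toy illustration in Python, using a plain counter as the version (a stand-in for the vector-clock comparison above; the Vnode class is invented for this sketch):

    class Vnode:
        """Toy replica storing (version, data) pairs."""
        def __init__(self):
            self.store = {}
        def get(self, key):
            return self.store.get(key)
        def put(self, key, val):
            self.store[key] = val

    def read_with_repair(prefs, key):
        """Read from every replica, pick the newest value, and push it
        back to any replica holding a stale or missing copy."""
        replies = [(vnode, vnode.get(key)) for vnode in prefs]
        newest = max((val for _, val in replies if val is not None),
                     key=lambda val: val[0])    # val is (version, data)
        for vnode, val in replies:
            if val is None or val[0] < newest[0]:
                vnode.put(key, newest)          # asynchronous in practice
        return newest

    a, b, c = Vnode(), Vnode(), Vnode()
    a.put("k", (2, "new"))
    b.put("k", (1, "old"))                      # c missed the write entirely
    print(read_with_repair([a, b, c], "k"))     # (2, 'new'); b and c repaired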

SLIDE 33

This is Riak Core

  • consistent hashing
  • vector clocks
  • sloppy quorums
  • gossip protocols
  • virtual nodes (vnodes)
  • hinted handoff

SLIDE 34

Conclusion

  • Scaling up is limited
  • But scaling out requires different tradeoffs
  • CAP Theorem: pick two
  • AP systems use a variety of techniques to ensure availability and eventual consistency

SLIDE 35

Thanks
