Understanding Tradeoffs for Scalability
Steve Vinoski, Architect, Basho Technologies
  1. Understanding Tradeoffs for Scalability. Steve Vinoski, Architect, Basho Technologies, Cambridge, MA USA. @stevevinoski

  2. Back In the Old Days • Big centralized servers controlled all storage • To scale, you scaled vertically (up) by getting a bigger server • Single host guaranteed data consistency

  3. Drawbacks • Scaling up is limited • Servers can only get so big • And the bigger they get, the more they cost

  4. Hitting the Wall • Websites started outgrowing the scale-up approach • Started applying workarounds to try to scale • Resulted in fragile systems with difficult operational challenges

  5. A Distributed Approach • Multiple commodity servers • Scale horizontally (out instead of up) • Read and write on any server • Replicated data • Losing a server doesn’t lose data

  6. No Magic Bullet • A distributed approach can scale much larger • But distribution brings its own set of issues • Requires tradeoffs

  7. CAP Theorem • A conjecture put forth in 2000 by Dr. Eric Brewer • Formally proven in 2002 • In any distributed system, pick two: • Consistency • Availability • Partition tolerance

  8. Partition Tolerance • Guarantees continued system operation even when the network breaks and messages are lost • Systems generally tend to support P • Leaves choice of either C or A

  9. Consistency • Distributed nodes see the same updates at the same logical time • Hard to guarantee across a distributed system

  10. Availability • Guarantees the system will service every read and write sent to it • Even when things are breaking

  11. Choose Two: CA • Traditional single-node RDBMS • Single node means P is irrelevant

  12. Choose Two: CP • Typically involves sharding, where data is spread across nodes in an app-specific manner • Sharding can be brittle • data unavailable from a given shard if its node dies • can be hard to add nodes and change the sharding logic

  13. Choose Two: AP • Provides read/write availability even when network breaks or nodes die • Provides eventual consistency • Example: Domain Name System (DNS) is an AP system

  14. Example AP Systems • Amazon Dynamo • Cassandra • CouchDB • Voldemort • Basho Riak

  15. Handling Tradeoffs for AP Systems

  16. • Problem: how to make the system available even if nodes die or the network breaks? • Solution: • allow reading and writing from multiple nodes in the system • avoid master nodes, instead make all nodes peers

  17. • Problem: if multiple nodes are involved, how do you reliably know where to read or write? • Solution: • assign virtual nodes (vnodes) to physical nodes • use consistent hashing to find vnodes for reads/writes

  18. Consistent Hashing
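To make the consistent hashing slide concrete, here is a minimal Python sketch (not Riak's implementation, which is written in Erlang): each physical node claims several vnode positions on a hash ring, and a key's preference list is the first few distinct physical nodes found clockwise from the key's hash. The `Ring` class, `vnodes_per_node` parameter, and node names are illustrative assumptions, not anything from the talk.

```python
import hashlib
from bisect import bisect_right

class Ring:
    """Toy consistent-hash ring: vnodes are points on a circle, each owned
    by a physical node; a key maps to the next vnodes clockwise."""

    def __init__(self, nodes, vnodes_per_node=8):
        self.ring = []  # sorted list of (position, physical_node)
        for node in nodes:
            for i in range(vnodes_per_node):
                self.ring.append((self._hash(f"{node}:{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def preference_list(self, key, n=3):
        """Return the physical nodes owning the first n distinct vnodes
        clockwise from the key's hash (the replicas for that key)."""
        start = bisect_right(self.ring, (self._hash(key),))
        owners = []
        for i in range(len(self.ring)):
            _, node = self.ring[(start + i) % len(self.ring)]
            if node not in owners:
                owners.append(node)
            if len(owners) == n:
                break
        return owners

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("user/1234"))  # three distinct physical nodes
```

Because each node owns many small vnode ranges, adding or removing a node only moves the data in that node's ranges rather than rehashing everything, which is the rebalancing benefit the next slide describes.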

  19. Consistent Hashing and Multi Vnode Benefits • Data is stored in multiple locations • Loss of a node means only a single replica is lost • No master to lose • Adding nodes is trivial, data gets rebalanced automatically

  20. • Problem: what about availability? What if the node you write to dies or becomes inaccessible? • Solution: sloppy quorums • write to multiple vnodes • attempt reads from multiple vnodes

  21. N/R/W Values • N = number of replicas to store (on distinct nodes) • R = number of replica responses needed for a successful read (specified per-request) • W = number of replica responses needed for a successful write (specified per-request)
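A hedged sketch of how a coordinator might apply the N/R/W values, assuming a hypothetical `replica` object with `put`/`get` methods that raise `ConnectionError` when a node is unreachable; a real system (Riak included) adds timeouts, sloppy-quorum fallback to surrogate vnodes, and conflict handling on top of this.

```python
def quorum_write(replicas, key, value, n=3, w=2):
    """Send the write to the first n replicas in the key's preference list
    and report success if at least w of them acknowledge it."""
    acks = 0
    for replica in replicas[:n]:
        try:
            replica.put(key, value)
            acks += 1
        except ConnectionError:
            continue  # an unreachable replica simply doesn't count toward w
    return acks >= w

def quorum_read(replicas, key, n=3, r=2):
    """Collect at least r responses from the n preferred replicas; the
    caller (or a vector-clock comparison) decides which value is newest."""
    responses = []
    for replica in replicas[:n]:
        try:
            responses.append(replica.get(key))
        except ConnectionError:
            continue
        if len(responses) >= r:
            return responses
    raise RuntimeError(f"read quorum of {r} not met")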

  22. N/R/W Values

  23. • Problem: what happens if a key hashes to vnodes that aren’t available? • Solution: • read from or write to the next available vnode • eventually repair via hinted handoff

  24. N/R/W Values

  25. Hinted Handoff • Surrogate vnode holds data for unavailable actual vnode • Surrogate vnode keeps checking for availability of actual vnode • Once the actual vnode is again available, surrogate hands off data to it
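A toy illustration of the hinted handoff idea on this slide; the `owner.is_available()` and `owner.put()` calls are hypothetical stand-ins for whatever liveness check and storage interface a real system would use.

```python
class SurrogateVnode:
    """Accepts writes on behalf of an unavailable vnode, tagging them with
    a 'hint' naming the real owner, and hands them back once it returns."""

    def __init__(self, owner):
        self.owner = owner   # the vnode this data really belongs to
        self.hinted = {}     # key -> value held until handoff completes

    def put(self, key, value):
        # Store locally; the data is destined for self.owner, not for us.
        self.hinted[key] = value

    def try_handoff(self):
        """Run periodically: once the owner is reachable again, hand the
        hinted data back and drop the local copy."""
        if not self.owner.is_available():   # hypothetical liveness check
            return
        for key, value in list(self.hinted.items()):
            self.owner.put(key, value)
            del self.hinted[key]
```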

  26. Quorum Benefits • Allows applications to tune consistency, availability, reliability per read or write

  27. • Problem: how do the nodes in the ring keep track of ring state? • Solution: gossip protocol

  28. Gossip Protocol • Nodes “gossip” their view of the state of the ring to other nodes • If a node changes its claim on the ring, it lets others know • The overall state of the ring is thus kept consistent among all nodes in the ring
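One simple way to picture the gossip protocol (a sketch under assumed names like `RingMember` and `partition`, not Riak's actual protocol): each node periodically sends its view of the ring to one randomly chosen peer, and the receiver keeps whichever claim carries the higher version, so the ring state converges across the cluster.

```python
import random

class RingMember:
    """Each node keeps a view of the ring as {partition: (owner, version)}
    and periodically 'gossips' that view to a randomly chosen peer."""

    def __init__(self, name, peers=None):
        self.name = name
        self.peers = peers or []   # other RingMember objects
        self.ring_view = {}        # partition -> (owner, version)

    def claim(self, partition, version):
        # This node takes ownership of a partition at the given version.
        self.ring_view[partition] = (self.name, version)

    def gossip_once(self):
        if self.peers:
            random.choice(self.peers).receive(self.ring_view)

    def receive(self, remote_view):
        # Merge the peer's view: the higher version wins for each partition.
        for partition, (owner, version) in remote_view.items():
            _, local_version = self.ring_view.get(partition, (None, -1))
            if version > local_version:
                self.ring_view[partition] = (owner, version)
```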

  29. • Problem: what happens if vnodes get out of sync? • Solution: • vector clocks • read repair

  30. Vector Clocks • Reasoning about time and causality in distributed systems is hard • Integer timestamps don’t necessarily capture causality • Vector clocks provide a happens-before relationship between two events

  31. Vector Clocks • Simple data structure: [(ActorID,Counter)] • All data has an associated vector clock, actors update their entry when making changes • ClockA happened-before ClockB if all actor-counters in A are less than or equal to those in B
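The [(ActorID, Counter)] structure and the happened-before rule on this slide translate almost directly into code; here is a small Python sketch using a dict in place of the list of pairs. The actor names in the usage lines are made up for illustration.

```python
def increment(clock, actor):
    """Return a copy of the clock with this actor's counter bumped by one.
    A clock is a dict {actor_id: counter}, i.e. the [(ActorID, Counter)]
    list from the slide in dictionary form."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new

def happened_before(a, b):
    """Clock a happened-before (or equals) clock b when every counter in a
    is less than or equal to the matching counter in b."""
    return all(counter <= b.get(actor, 0) for actor, counter in a.items())

def concurrent(a, b):
    """Neither clock descends from the other: the updates conflict and must
    be kept as siblings or merged by the application."""
    return not happened_before(a, b) and not happened_before(b, a)

v1 = increment({}, "client-x")      # {'client-x': 1}
v2 = increment(v1, "client-y")      # built on top of v1
v3 = increment(v1, "client-z")      # also built on v1, unaware of v2
print(happened_before(v1, v2))      # True: v1 is in v2's causal past
print(concurrent(v2, v3))           # True: conflicting concurrent updates
```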

  32. Read Repair • If a read detects that a vnode has stale data, it is repaired via asynchronous update • Helps implement eventual consistency
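A rough sketch of read repair using the vector-clock helpers above: after a quorum read, the coordinator picks the response whose clock dominates the others and asynchronously pushes it back to replicas that returned stale data. The `replica.put_async()` call is hypothetical, and concurrent (sibling) values are ignored here to keep the example short.

```python
def read_repair(responses):
    """responses: list of (replica, value, clock) tuples from a quorum read.
    Pick the value whose clock dominates the others and repair any replica
    that returned an older version."""
    winner_value, winner_clock = None, {}
    for _, value, clock in responses:
        if happened_before(winner_clock, clock):
            winner_value, winner_clock = value, clock
    for replica, _, clock in responses:
        if clock != winner_clock:
            # Asynchronously bring the stale replica up to date.
            replica.put_async(winner_value, winner_clock)
    return winner_value
```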

  33. This is Riak Core • consistent hashing • gossip protocols • vector clocks • virtual nodes (vnodes) • sloppy quorums • hinted handoff

  34. Conclusion • Scaling up is limited • But scaling out requires different tradeoffs • CAP Theorem: pick two • AP systems use a variety of techniques to ensure availability and eventual consistency

  35. Thanks
