The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM
Harel Cohen, Tel Aviv University
Outline
- Introduction
- RAMCloud Overview
- Motivation
- Research Issues
- RAMCloud Disadvantages
- Related Work
- Conclusion
- RAMCloud Project
- Acknowledgments
- References
May 26, 2013 RAMCloud Slide 2
Introduction
- Historically, DRAM usage in storage has been limited and specialized
- Clumsy (consistency with the backing store)
- Lost performance (cache misses fall through to the backing store)
[Timeline, 1970 - 2010: UNIX buffer cache → main-memory databases → large file caches → Web indexes entirely in DRAM → memcached → Facebook: 200 TB total data, 150 TB of it cache! → main-memory DBs, again]
Introduction
RAMCloud – harness full performance potential of large-scale DRAM storage:
- General-purpose storage system
- All data always in DRAM (no cache misses)
- Durable and available (no backing store)
- Scale: 1000+ servers, 100+ TB
- Low latency: 5-10µs remote access (100 - 1000x lower than disk)
- High throughput (100 - 1000x higher than disk)
Potential impact: enable a new class of applications
- Scalable storage substrate for large-scale Web applications
- Richer query models that enable a new class of data-intensive applications
RAMCloud Overview
- Storage for datacenters
- 1000-10000 commodity servers
- 32-64 GB DRAM/server
- All data always in RAM
- Scale automatically
- Durable and available
- 100 - 1000x better than disk-based storage systems
- Performance goals:
  - High throughput: 1M ops/sec/server
  - Low-latency access: 5 - 10µs
[Figure: application servers and storage servers within a datacenter]
RAMCloud Overview
                    2009    5-10 years
# servers           1000    4000
GB/server           64 GB   256 GB
Total capacity      64 TB   1 PB
Total server cost   $4M     $5M
$/GB                $60     $5

For $100-200K today:
- One year of Amazon customer orders
- One year of United flight reservations
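The $/GB rows in the table above follow directly from the server counts, capacities, and total costs; a quick back-of-envelope sketch (the inputs are the slide's own estimates, not measurements):

```python
# Rough check of the cost table, using the slide's estimates.

def dollars_per_gb(servers, gb_per_server, total_cost_dollars):
    """Cost per gigabyte for a cluster of identical servers."""
    return total_cost_dollars / (servers * gb_per_server)

cost_2009 = dollars_per_gb(1000, 64, 4_000_000)      # ~$62/GB; slide rounds to $60
cost_future = dollars_per_gb(4000, 256, 5_000_000)   # ~$4.9/GB; slide rounds to $5

print(f"2009: ~${cost_2009:.0f}/GB, 5-10 years: ~${cost_future:.1f}/GB")
```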
RAMCloud Overview
[Architecture diagram: 1,000 - 10,000 storage servers, each running a master and a backup, connected through the datacenter network to 1,000 - 100,000 application servers, each linked with the RAMCloud application library; a coordinator manages the cluster]
Motivation
- Relational databases don’t scale
- Every large-scale Web application has problems:
Facebook: 4000 MySQL instances + 2000 memcached servers
- Major system redesign for every 10x increase in scale
- New forms of storage appearing:
Bigtable, Dynamo, PNUTS
- RAMCloud may provide a general-purpose storage system that
scales far beyond existing systems
Application scalability
Motivation
Disk access rate not keeping up with capacity:
Technology trends
                                   Mid-1980s   2009       Change
Disk capacity                      30 MB       500 GB     16,667x
Max. transfer rate                 2 MB/s      100 MB/s   50x
Latency (seek & rotate)            20 ms       10 ms      2x
Capacity/bandwidth (large blocks)  15 s        5,000 s    333x
Capacity/bandwidth (1 KB blocks)   600 s       58 days    8,333x
Jim Gray's rule                    5 min       30 hours   360x
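The capacity/bandwidth rows above follow directly from the capacity, transfer-rate, and latency rows; a sketch reproducing them:

```python
# Time to read an entire disk: sequentially (large blocks, limited by
# transfer rate) and as random 1 KB reads (each paying one seek+rotate).

def full_scan_seconds(capacity_bytes, rate_bytes_per_s):
    return capacity_bytes / rate_bytes_per_s

def random_read_seconds(capacity_bytes, latency_s, block=1000):
    return (capacity_bytes / block) * latency_s

MB, GB = 10**6, 10**9

scan_1980s = full_scan_seconds(30 * MB, 2 * MB)           # ~15 s
scan_2009 = full_scan_seconds(500 * GB, 100 * MB)         # ~5,000 s
rand_1980s = random_read_seconds(30 * MB, 0.020)          # ~600 s
rand_2009 = random_read_seconds(500 * GB, 0.010)          # ~5e6 s ≈ 58 days

print(f"2009 disk, 1 KB reads: {rand_2009 / 86400:.0f} days to read once")
```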
Motivation
Technology trends
- Information on disk can only be accessed infrequently: capacity has grown far faster than access rate
- Disks must become more archival
- More information must move to memory
- Jim Gray's rule (the five-minute rule)
Motivation
- Lost performance:
  - 1% misses → 10x performance degradation
  - Hard to approach 1% misses (example: Facebook ~5-7% misses)
- Won't save much money:
  - Already have to keep information in memory (example: Facebook caches ~75% of its data size)
- Changes disk management issues:
  - Optimize for reads vs. for writes & recovery
Caching
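The "1% misses → 10x degradation" claim above follows from the huge gap between memory and disk access times; a sketch with assumed, illustrative latencies (10µs for a hit, 10 ms for a disk miss; not measurements):

```python
# Average access time with miss ratio m:
#   t_avg = (1 - m) * t_hit + m * t_miss

t_hit = 10e-6    # assumed in-memory hit: 10 µs
t_miss = 10e-3   # assumed disk miss: 10 ms

def avg_latency(miss_ratio):
    return (1 - miss_ratio) * t_hit + miss_ratio * t_miss

slowdown = avg_latency(0.01) / t_hit   # ~11x for just 1% misses
print(f"~{slowdown:.0f}x slower than all-in-DRAM")
```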
Motivation
Does latency matter?
- Large-scale apps struggle with high latency:
  - Facebook: average of 130 internal requests per page
  - Amazon: 100 - 200 internal requests per page
[Figure: a traditional application keeps UI, application logic, and data structures on a single machine with << 1µs latency; a Web application splits UI and business logic across application servers and storage servers in a datacenter with 0.5 - 10ms latency]
Motivation
Does latency matter?
- RAMCloud goal: large scale and low latency
- A new "one size fits all" database
- Enable a new breed of information-intensive applications
[Figure: same comparison as the previous slide; RAMCloud keeps the datacenter structure but cuts storage access latency to 5 - 10µs]
Motivation
Use flash memory instead of DRAM?
- Many candidate technologies besides DRAM:
  - Flash memory (NAND, NOR)
  - PCRAM (phase-change memory)
- A "FlashCloud" would be cheaper and consume less energy than a DRAM-based approach
- DRAM enables the lowest latency today: 5 - 10x faster than flash memory
- RAMCloud provides higher throughput than FlashCloud
- Most RAMCloud techniques will apply to these other technologies
Motivation
- Facebook: 200 TB of (non-image) data in 2009
- Online retailer (example: Amazon):
  - Revenue/year: ~$16B; orders/year: ~400M ($40/order)
  - Bytes/order: ~1,000 - 10,000 → order data/year: ~400 GB - 4.0 TB
  - RAMCloud cost: ~$24K - $240K
- Airline reservations (example: United Airlines):
  - Flights/day: ~4,000 (30,000 for all airlines in the U.S.); passenger-flights/year: ~200M
  - Bytes/passenger-flight: ~1,000 - 10,000 → data/year: ~220 GB - 2.2 TB
  - RAMCloud cost: ~$13K - $130K
Ready today for almost all online data; media soon…
RAMCloud applicability today
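The per-application estimates above are simple products of record count, record size, and the ~$60/GB 2009 cost; a sketch using the slide's numbers:

```python
# Back-of-envelope RAMCloud sizing, using the 2009 figure of ~$60/GB.

def ramcloud_cost(records_per_year, bytes_per_record, dollars_per_gb=60):
    gb = records_per_year * bytes_per_record / 1e9
    return gb, gb * dollars_per_gb

# Amazon-style retailer: ~400M orders/year at 1 - 10 KB/order
gb_lo, cost_lo = ramcloud_cost(400e6, 1000)    # 400 GB, ~$24K
gb_hi, cost_hi = ramcloud_cost(400e6, 10000)   # 4 TB, ~$240K

# United-style airline: ~200M passenger-flights/year at 1 KB each
# (slide quotes ~220 GB, presumably including some overhead)
air_gb, air_cost = ramcloud_cost(200e6, 1000)  # 200 GB, ~$12K
```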
Research Issues
Some challenging research issues:
- Low latency RPC
- Durability and availability
- Data model
- Distribution and scaling
- Concurrency, transactions, and consistency
- Multi-tenancy
- Server-client functional distribution
- Self-management
Research Issues
Low latency RPC
Achieving 5 - 10µs will impact every layer of the system:
- Must reduce network latency:
  - Typical today: 300 - 500µs roundtrip (10 - 30µs/switch, 5 switches each way)
  - Arista 7100S 10GbE switch: 9µs roundtrip (0.9µs/switch)
  - Need cut-through routing
- Tailor the OS on the server side:
  - Dedicated cores; no interrupts? no virtual memory?
[Figure: client and server connected through five switches in each direction]
Research Issues
Low latency RPC
- Network protocol stack:
  - TCP too slow (especially with packet loss)
  - Must avoid copies
- Client side: need an efficient path through the VM
  - User-level access to the network interface?
- Preliminary experiments:
  - 10 - 15µs roundtrip over a direct connection (no switches)
  - Biggest current problem: the NICs (3 - 5µs each way)
Research Issues
Durability and availability
- Data must be durable before the write RPC returns
- Unattractive approaches:
  - Replication in other servers' memories (too expensive in money and energy)
  - Synchronous disk writes (100 - 1000x too slow)
- One possibility: buffered logging
[Figure: a write goes to the master's DRAM and is logged in the DRAM of two backup servers; each backup writes its log to disk asynchronously, in batches]
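A minimal sketch of the buffered-logging idea (hypothetical classes, not the RAMCloud implementation): a write returns once the update is in the master's DRAM and in the DRAM log buffers of its backups; backups write their buffers to disk later, in batches, off the write path.

```python
class Backup:
    """Holds log entries in DRAM; flushes to 'disk' asynchronously, batched."""
    def __init__(self):
        self.log_buffer = []   # volatile DRAM buffer
        self.disk_log = []     # durable storage

    def append(self, entry):
        self.log_buffer.append(entry)   # fast: memory only

    def flush(self):
        # Called later, off the write path, batching many entries per disk I/O.
        self.disk_log.extend(self.log_buffer)
        self.log_buffer.clear()

class Master:
    def __init__(self, backups):
        self.table = {}
        self.backups = backups

    def write(self, key, value):
        self.table[key] = value
        for b in self.backups:       # the entry sits in several DRAMs before
            b.append((key, value))   # the RPC returns -- no disk wait
        return "OK"

backups = [Backup(), Backup()]
m = Master(backups)
m.write("user:1", "alice")
for b in backups:
    b.flush()   # later, in the background
```

The durability story then rests on the entry existing in multiple servers' DRAM until the batched flush completes; a power failure is the case this alone does not cover, which the "power failures" slide below addresses.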
Research Issues
Durability and availability
- Potential problem: crash recovery
  - If a master crashes, its data is unavailable until read from disks on backups
  - Reading 64 GB from one disk takes ~10 minutes; goal: recover in 1 - 2 seconds
- Solution: take advantage of system scale
  - Shard backup data across many servers and recover in parallel
  - Truncate disk logs
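The recovery-time argument is simple division; a sketch, assuming one 100 MB/s disk per server (the backup count of 600 is an illustrative choice, not a quoted figure):

```python
GB = 1e9
disk_bw = 100e6   # bytes/s per disk, as assumed in these slides

one_disk = 64 * GB / disk_bw          # one backup replays 64 GB: 640 s (~10 min)
sharded = 64 * GB / (600 * disk_bw)   # 600 backups in parallel: ~1.1 s

print(f"one disk: {one_disk / 60:.0f} min, sharded: {sharded:.1f} s")
```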
- System write throughput will be limited by disk bandwidth:
  - ~50,000 updates/second (vs. ~1M reads/second), assuming a single 100 MB/s disk per server and typical updates of a few hundred bytes
  - It may make sense to use flash memory instead of disk
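The 50,000 updates/second figure is disk bandwidth divided by the per-update log footprint; a sketch, where 2 KB per update is an assumption chosen to reproduce the slide's number (updates of a few hundred bytes plus log and replication overhead):

```python
disk_bw = 100e6            # one 100 MB/s disk per server
bytes_per_update = 2000    # assumed effective log footprint per update

updates_per_sec = disk_bw / bytes_per_update   # 50,000/s, vs ~1M reads/s
```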
Research Issues
Durability and availability
- Potential problem: power failures
- Solution: one of three ways
  - Servers continue operating long enough to flush data to disk (a few seconds for buffered logging; ~10 minutes for replication)
  - Applications tolerate the loss of unwritten log data (buffered logging)
  - All data committed to stable storage
- Potential problem: high bit error rates in DRAM
- Solution: special attention
  - ECC memory is not enough
  - Augment stored objects with checksums
  - Scan stored data periodically to detect latent errors
Research Issues
- Data model aspects:
  - What is the nature of the basic objects stored in the system?
  - How are basic objects organized into higher-level structures?
  - How are objects named when retrieving or modifying them?
- Highly structured data models: it may not be practical to scale relational structures over large numbers of servers
- Unstructured models: may not be rich enough to support a variety of applications conveniently
Data model
Research Issues
Data model
- Initial object model: identifier (64 bits), version (64 bits), blob (≤ 1 MB)
- Richer model in the future:
- Indexes?
- Transactions?
- Graphs?
One-Size-Fits-All
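The basic object above (64-bit identifier, 64-bit version, blob up to 1 MB) can be sketched as a simple serialized record; this is an illustration of the layout, not RAMCloud's actual wire or storage format:

```python
import struct

MAX_BLOB = 1 << 20   # 1 MB blob limit, as in the slide

def pack_object(identifier: int, version: int, blob: bytes) -> bytes:
    """Serialize as: id (8 bytes) | version (8 bytes) | blob length (4) | blob."""
    assert len(blob) <= MAX_BLOB
    return struct.pack("<QQI", identifier, version, len(blob)) + blob

def unpack_object(buf: bytes):
    identifier, version, n = struct.unpack_from("<QQI", buf)
    return identifier, version, buf[20:20 + n]

rec = pack_object(42, 1, b"hello")
assert unpack_object(rec) == (42, 1, b"hello")
```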
Research Issues
- Advantage: data is distributed automatically by RAMCloud:
  - Tables can be split across multiple servers
  - Indexes can be split across multiple servers
  - Distribution is transparent to applications
- RAMCloud's high throughput makes replicating data for performance unnecessary
- The buffered logging approach makes data movement relatively straightforward
Distribution and scaling
Research Issues
- The ACID properties scale poorly
  - Many Web applications do not need full ACID behavior
- Recent storage systems give up some of the ACID properties:
  - Bigtable supports only transactions involving a single row
  - Dynamo does not guarantee immediate, consistent updates of replicas
- Benefit of RAMCloud's low latency: a higher level of consistency
  - Reduces the time each transaction stays in the system
  - Reduces aborted transactions (for optimistic concurrency control)
  - Reduces lock wait times (for pessimistic concurrency control)
- As a result, a low-latency system can scale to much higher overall throughput before the cost of ACID becomes prohibitive
Concurrency, transactions, and consistency
Research Issues
- Multi-tenancy for cloud computing:
  - Support multiple (potentially hostile) applications
  - Cost proportional to application size
- RAMClouds need to provide access control and security mechanisms
- RAMClouds need to provide performance isolation, so that one application does not degrade the performance of other applications
Multi-tenancy
Research Issues
- Move functionality to the client library?
  - Flexibility: enables different implementations
  - Throughput: offloads servers
  - May improve performance (e.g., aggregation)
- Concentrate functionality in servers?
  - May improve performance (e.g., faster synchronization)
  - Can't depend on proper client behavior: security/access control, consistency/crash recovery
Server-client functional distribution
[Figure: one application calls the RAMCloud server through the standard RAMCloud library; another through a custom library]
Research Issues
- RAMClouds must manage themselves automatically:
  - Thousands of servers, each using hundreds of its peers
  - Hundreds of applications with competing needs
- Building a new storage system from scratch:
  - Opportunity to get this essential functionality right
  - An overall picture of system behavior requires gathering significant amounts of data from different servers
  - Gathering that data must not significantly degrade latency or throughput
Self-management
RAMCloud Disadvantages
- High cost per bit and high energy usage per bit:
  - 50 - 100x worse than disk-based systems
  - 5 - 10x worse than systems based on flash memory
- More floor space in datacenters
- However, cost per operation and energy cost per operation:
  - 100 - 1000x more efficient than disk-based systems
  - 5 - 10x more efficient than systems based on flash memory
- More observations:
  - Many unanswered questions
  - Write-latency issues with inter-datacenter replication
Related Work
- In the mid-1980s there were numerous research experiments with databases stored in main memory
  - The Rio Vista project
- In recent years there has been a surge in the use of DRAM:
  - Google and Yahoo! store their search indices entirely in DRAM
  - memcached
  - Bigtable
- The limitations of disk storage:
  - Jim Gray
  - The H-Store project
Conclusions
- In the future, a larger and larger fraction of online data will be kept in DRAM
- RAMCloud is proposed as the best long-term solution (maybe…)
- Interesting combination of scale and latency
- An attractive substrate for cloud computing environments
- Enables more powerful uses of information at scale: 1,000 - 10,000 clients; 100 TB - 1 PB; 5 - 10µs latency
RAMCloud Project
- Fall 2008: John Ousterhout suggests the basic idea for RAMCloud to Mendel
Rosenblum over lunch
- Spring 2009: a discussion group meets weekly to explore design issues for RAMCloud; the results are published as the white paper The Case for RAMCloud, with all of the discussion group members as authors
- Fall 2009: the RAMCloud implementation team is formed and the group begins to
flesh out the design in detail
- Late Fall 2009: a trivial RAMCloud server responds to basic read and write requests
- April 1, 2010: all-day RAMCloud design review, which includes external reviewers
- November 8, 2010 : first successful recovery (TcpTransport, 2 backups, 2 masters, 1
segment, 1 partition)
- March, 2011: Nandu's measurements show that RAMCloud is achieving 5µs RPCs
for 100-byte reads; a single server can handle more than 1M RPCs/second
- Summer, 2011: major revisions of log cleaner; it's now almost production-ready
- February 18, 2012: First end-to-end recovery of a failed backup server
- March 2012: RAMCloud converts from fixed-size 64-bit keys to variable-length-byte-
array keys.
Acknowledgments
- Department of Computer Science, Stanford University: John Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Guru Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman
- The following people made helpful comments on drafts of the paper: David Andersen, Michael Armbrust, Jeff Dean, Robert Johnson, Jim Larus, David Patterson, Jeff Rothschild, and Vijay Vasudevan
References
- John Ousterhout et al., "The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM," ACM SIGOPS Operating Systems Review, 2009
- https://ramcloud.stanford.edu/wiki/display/ramcloud/RAMCloud+Presentations
- http://www.youtube.com/watch?v=lcUvU3b5co8