The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM (PowerPoint PPT Presentation)


SLIDE 1

The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM

Harel Cohen, Tel Aviv University

SLIDE 2

Outline

  • Introduction
  • RAMCloud Overview
  • Motivation
  • Research Issues
  • RAMCloud Disadvantages
  • Related Work
  • Conclusion
  • RAMCloud Project
  • Acknowledgments
  • References

May 26, 2013 RAMCloud Slide 2

SLIDE 3

Introduction


  • DRAM usage limited/specialized
  • Clumsy (consistency with backing store)
  • Lost performance (cache misses, backing store)

[Timeline, 1970-2010: UNIX buffer cache; main-memory databases; large file caches; Web indexes entirely in DRAM; memcached; Facebook: 200 TB total data, 150 TB cache!; main-memory DBs, again]

SLIDE 4

Introduction


RAMCloud – harness full performance potential of large-scale DRAM storage:

  • General-purpose storage system
  • All data always in DRAM (no cache misses)
  • Durable and available (no backing store)
  • Scale: 1000+ servers, 100+ TB
  • Low latency: 5-10µs remote access (100 - 1000x lower than disk)
  • High throughput (100 - 1000x higher than disk)

Potential impact: enable a new class of data-intensive applications

  • Large-scale Web applications
  • Richer query models
  • Scalable storage substrate

SLIDE 5

RAMCloud Overview


  • Storage for datacenters
  • 1000-10000 commodity servers
  • 32-64 GB DRAM/server
  • All data always in RAM
  • Scale automatically
  • Durable and available
  • 100 – 1000x better than disk-based storage systems
  • Performance goals:

High throughput: 1M ops/sec/server
Low-latency access: 5 - 10µs

[Figure: application servers and storage servers in a datacenter]

SLIDE 6

RAMCloud Overview


|                   | 2009  | 5-10 years |
|-------------------|-------|------------|
| # servers         | 1000  | 4000       |
| GB/server         | 64 GB | 256 GB     |
| Total capacity    | 64 TB | 1 PB       |
| Total server cost | $4M   | $5M        |
| $/GB              | $60   | $5         |

For $100-200K today: one year of Amazon customer orders, or one year of United flight reservations

SLIDE 7

RAMCloud Overview


[Architecture diagram: a Coordinator plus 1000 – 10,000 storage servers (each a Master paired with a Backup), connected over the datacenter network to 1000 – 100,000 application servers (each application linked against the RAMCloud library)]

SLIDE 8

Motivation

  • Relational databases don’t scale
  • Every large-scale Web application has problems:

Facebook: 4000 MySQL instances + 2000 memcached servers

  • Major system redesign for every 10x increase in scale
  • New forms of storage appearing: Bigtable, Dynamo, PNUTS
  • RAMCloud may provide a general-purpose storage system that scales far beyond existing systems


Application scalability

SLIDE 9

Motivation

Technology trends

Disk access rate not keeping up with capacity:

|                                   | Mid-1980s | 2009     | Change  |
|-----------------------------------|-----------|----------|---------|
| Disk capacity                     | 30 MB     | 500 GB   | 16,667x |
| Max. transfer rate                | 2 MB/s    | 100 MB/s | 50x     |
| Latency (seek & rotate)           | 20 ms     | 10 ms    | 2x      |
| Capacity/bandwidth (large blocks) | 15 s      | 5,000 s  | 333x    |
| Capacity/bandwidth (1KB blocks)   | 600 s     | 58 days  | 8,333x  |
| Jim Gray's rule                   | 5 min     | 30 hours | 360x    |
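The capacity/bandwidth rows follow from the rows above them; a quick sketch (assuming, as the table does, that each 1KB read costs a full seek):

```python
# Re-derive the capacity/bandwidth entries of the table above.
cap_1985, rate_1985, seek_1985 = 30e6, 2e6, 20e-3     # 30 MB, 2 MB/s, 20 ms
cap_2009, rate_2009, seek_2009 = 500e9, 100e6, 10e-3  # 500 GB, 100 MB/s, 10 ms

# Large blocks: time to stream the whole disk sequentially.
full_1985 = cap_1985 / rate_1985   # 15 s
full_2009 = cap_2009 / rate_2009   # 5000 s

# 1KB blocks: every block costs a seek, so latency dominates transfer time.
small_1985 = (cap_1985 / 1000) * seek_1985            # 600 s
small_2009 = (cap_2009 / 1000) * seek_2009 / 86400    # ~58 days
```

The striking entry is the last row: reading a 2009 disk in 1KB random accesses takes almost two months, which is why disks must become more archival.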

SLIDE 10

Motivation


Technology trends

  • Information stored on disk can only be accessed infrequently
  • Disks must become more archival
  • More information must move to memory
  • Jim Gray's rule
SLIDE 11

Motivation

  • Lost performance:

1% misses → 10x performance degradation
Hard even to approach 1% misses (example: Facebook ~ 5-7% misses)

  • Won't save much money:

Already have to keep information in memory
Example: Facebook caches ~75% of data size

  • Changes disk management issues:

Optimize for reads vs. writes & recovery


Caching
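The 1%-misses claim is simple average-latency arithmetic; a sketch with illustrative numbers (not measurements from the talk): assume ~10µs for an in-memory access including the network, and ~10ms for a disk access on a miss.

```python
# Average access time as a function of cache miss rate (microseconds).
HIT_US = 10.0        # assumed in-memory access, including network round trip
MISS_US = 10_000.0   # assumed disk access on a miss (~10 ms)

def avg_access_us(miss_rate):
    return (1 - miss_rate) * HIT_US + miss_rate * MISS_US

slowdown = avg_access_us(0.01) / HIT_US  # ~11x: even 1% misses costs ~10x
```

With a 1000x gap between hit and miss times, the miss term dominates as soon as the miss rate reaches a few tenths of a percent.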

SLIDE 12

Motivation


Does latency matter?

  • Large-scale apps struggle with high latency

Facebook: an average of 130 internal requests per page
Amazon: 100 – 200 internal requests per page

[Figure: a traditional application keeps UI, application logic, and data structures on a single machine (<< 1µs latency); a Web application splits UI and business logic across application servers, with data on storage servers in the datacenter (0.5 - 10ms latency)]
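The impact on page generation time is easy to estimate; a sketch assuming the ~130 internal requests above are issued sequentially (in practice some overlap, so this is an upper bound):

```python
REQUESTS = 130  # Facebook: average internal requests per page

def page_time_ms(per_request_us):
    # Worst case: every internal request issued one after another.
    return REQUESTS * per_request_us / 1000

datacenter = page_time_ms(1000)  # at 1 ms/request: 130 ms per page
ramcloud = page_time_ms(10)      # at 10 us/request: 1.3 ms per page
```

At millisecond latencies the internal requests alone eat the whole page budget, which is why applications cap themselves at a few hundred requests; at 5 - 10µs the same budget allows tens of thousands.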

SLIDE 13

Motivation


Does latency matter?

  • RAMCloud goal: large scale and low latency
  • New “one size fits all” database
  • Enable a new breed of information-intensive applications

[Figure: the same diagram as the previous slide; RAMCloud reduces datacenter storage access latency from 0.5 - 10ms to 5 - 10µs]

SLIDE 14

Motivation


Use flash memory instead of DRAM?

  • Many candidate technologies besides DRAM: flash memory (NAND, NOR), PCRAM (phase-change memory)
  • A FlashCloud would be cheaper and consume less energy than a DRAM-based approach
  • DRAM enables the lowest latency today: 5 - 10x faster than flash memory
  • RAMCloud provides higher throughput than FlashCloud
  • Most RAMCloud techniques will apply to other technologies
SLIDE 15

Motivation


Use flash memory instead of DRAM?

SLIDE 16

Motivation

  • Facebook: 200 TB of (non-image) data in 2009
  • Online retailer (example: Amazon):

Revenues/year: ~ $16B
Orders/year: ~ 400M ($40/order)
Bytes/order: ~ 1000 - 10,000
Order data/year: ~ 400 GB - 4.0 TB
RAMCloud cost: ~ $24 - 240K

  • Airline reservations (example: United Airlines):

Total flights/day: ~ 4000 (30,000 for all airlines in the U.S.)
Passenger flights/year: ~ 200M
Bytes/passenger-flight: ~ 1000 - 10,000
Order data/year: ~ 220 GB - 2.2 TB
RAMCloud cost: ~ $13 - 130K

Ready today for almost all online data; media soon…


RAMCloud applicability today
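The Amazon estimate is simple multiplication; a sketch using the $60/GB figure from the 2009 capacity table earlier in the deck:

```python
ORDERS_PER_YEAR = 400e6  # ~400M Amazon orders/year
COST_PER_GB = 60         # 2009 DRAM cost from the capacity table, $/GB

for bytes_per_order in (1_000, 10_000):
    data_gb = ORDERS_PER_YEAR * bytes_per_order / 1e9
    print(f"{bytes_per_order} B/order -> {data_gb:,.0f} GB, ${data_gb * COST_PER_GB:,.0f}")
# 1000 B/order -> 400 GB, $24,000
# 10000 B/order -> 4,000 GB, $240,000
```

The same arithmetic with ~200M passenger flights/year reproduces the United Airlines range, which is how "almost all online data" fits in an affordable RAMCloud today.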

SLIDE 17

Research Issues


Some challenging research issues:

  • Low latency RPC
  • Durability and availability
  • Data model
  • Distribution and scaling
  • Concurrency, transactions, and consistency
  • Multi-tenancy
  • Server-client functional distribution
  • Self-management
SLIDE 18

Research Issues


Low latency RPC

Achieving 5 - 10µs will impact every layer of the system:

  • Must reduce network latency:

Typical today: 300 - 500µs round trip (10 - 30µs/switch, 5 switches each way)
Arista 7100S 10Gb switch: 9µs round trip (0.9µs/switch)
Need cut-through routing

  • Tailor OS on the server side:

Dedicated cores
No interrupts?
No virtual memory?

[Figure: client and server connected through five switches in each direction]
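The per-switch numbers above compose into the round-trip figures; a quick sketch of the switch portion of the latency budget:

```python
SWITCHES_EACH_WAY = 5
HOPS = 2 * SWITCHES_EACH_WAY  # a round trip traverses 10 switches

# Conventional store-and-forward switches: 10 - 30 us per switch.
legacy_low, legacy_high = 10 * HOPS, 30 * HOPS  # 100 - 300 us in switches alone

# Cut-through switch (Arista 7100S class): 0.9 us per switch.
cut_through = 0.9 * HOPS  # 9 us round trip
```

With 100 - 300µs consumed by switching alone, no amount of server-side tuning reaches 5 - 10µs; the network fabric has to change first.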

SLIDE 19

Research Issues


Low latency RPC

  • Network protocol stack:

TCP too slow (especially with packet loss)
Must avoid copies

  • Client side: need an efficient path through the VM:

User-level access to the network interface?

  • Preliminary experiments:

10 - 15µs round trip
Direct connection: no switches
Biggest current problem: NICs (3 - 5µs each way)

SLIDE 20

Research Issues


Durability and availability

  • Data must be durable when the write RPC returns
  • Unattractive approaches:

Replicate in other memories (too expensive in money and energy)
Synchronous disk write (100 - 1000x too slow)

  • One possibility: buffered logging

[Figure: a write updates the master's DRAM and is logged to the DRAM of several backup servers; each backup writes its log to disk asynchronously, in batches]
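A minimal sketch of the buffered-logging idea (class and method names are illustrative, not the RAMCloud API): the write RPC returns once the update sits in the DRAM of the master and of every backup; disk I/O happens off the write path.

```python
class Backup:
    """Holds a log of updates in DRAM; pushes it to disk lazily."""
    def __init__(self):
        self.log_buffer = []  # log entries in DRAM
        self.disk = []        # simulated persistent storage

    def append(self, entry):  # fast: DRAM only
        self.log_buffer.append(entry)

    def flush(self):          # batched, asynchronous in a real system
        self.disk.extend(self.log_buffer)
        self.log_buffer.clear()

class Master:
    def __init__(self, backups):
        self.dram = {}        # all data lives here
        self.backups = backups

    def write(self, key, value):
        self.dram[key] = value
        for b in self.backups:      # entry reaches several independent
            b.append((key, value))  # DRAMs before the write RPC returns
        # no synchronous disk write; Backup.flush() runs later
```

Data is lost only if the master and all of its backups fail before a flush, which is what makes the power-failure scenario on slide 22 the interesting one.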

SLIDE 21

Research Issues


Durability and availability

  • Potential problem: crash recovery

If a master crashes, data is unavailable until read from disks on backups
Read 64 GB from one disk? 10 minutes
Our goal: recover in 1 - 2 seconds

  • Solution: take advantage of system scale

Shard backup data across many servers
Recover in parallel
Truncated disk logs

  • System throughput will be limited by disk bandwidth for writing: 50,000 updates/second (vs. 1M reads/second)

Single disk per server with 100 MB/s throughput
Typical updates of a few hundred bytes
It makes sense to use flash memory instead of disk
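The recovery and write-throughput numbers above can be sanity-checked; a sketch (the ~2 KB of log traffic per update is an assumption covering a few hundred bytes of data plus log and replication overhead):

```python
DISK_BW = 100e6      # 100 MB/s per disk
MASTER_DRAM = 64e9   # 64 GB of data per master

# Recovery: reading the crashed master's data back from backup disks.
one_disk_s = MASTER_DRAM / DISK_BW  # 640 s (~10 min) from a single disk
shards = 640
parallel_s = one_disk_s / shards    # ~1 s if 640 disks are read in parallel

# Write throughput: each update must eventually be logged to disk.
updates_per_s = DISK_BW / 2_000     # ~50,000 updates/s per disk
```

Sharding turns recovery time into a knob: spread each master's backup data over enough disks and the 1 - 2 second goal follows directly from aggregate disk bandwidth.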

SLIDE 22

Research Issues


Durability and availability

  • Potential problem: power failures
  • Solution: one of three ways

Servers continue operating long enough to flush data to disk (buffered logging: a few seconds; replication: 10 minutes)
Applications tolerate the loss of unwritten log data (buffered logging)
All data committed to stable storage

  • Potential problem: high bit error rates for DRAM
  • Solution: special attention

ECC memory is not enough
Augment stored objects with checksums
Scan stored data periodically to detect latent errors

SLIDE 23

Research Issues

  • Data model aspects

What is the nature of the basic objects stored in the system?
How are basic objects organized into higher-level structures?
How are objects named when retrieving or modifying them?

  • Highly-structured data models

May not be practical to scale relational structures over large numbers of servers

  • Unstructured models

May not be rich enough to support a variety of applications conveniently


Data model

SLIDE 24

Research Issues


Data model

Object: Identifier (64b) | Version (64b) | Blob (≤ 1 MB)

Richer model in the future:

  • Indexes?
  • Transactions?
  • Graphs?

One-Size-Fits-All
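The basic object above can be sketched as a simple record type (names are illustrative, not the real RAMCloud API):

```python
from dataclasses import dataclass

MAX_BLOB = 1 << 20  # blobs are limited to 1 MB

@dataclass(frozen=True)
class StoredObject:
    ident: int    # 64-bit identifier
    version: int  # 64-bit version number, bumped on every write
    blob: bytes   # opaque, uninterpreted value

def write(table: dict, ident: int, blob: bytes) -> StoredObject:
    """Store a blob under an identifier, incrementing its version."""
    if len(blob) > MAX_BLOB:
        raise ValueError("blob exceeds 1 MB limit")
    prev = table.get(ident)
    obj = StoredObject(ident, prev.version + 1 if prev else 1, blob)
    table[ident] = obj
    return obj
```

The version number is what makes conditional updates (and hence optimistic concurrency control) possible on top of such a minimal model.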

SLIDE 25

Research Issues

  • Advantage: data distributed automatically by RAMCloud:

Tables can be split across multiple servers
Indexes can be split across multiple servers
Distribution transparent to applications

  • High throughput of a RAMCloud makes data replication unnecessary
  • Buffered logging approach makes data movement relatively straightforward


Distribution and scaling

SLIDE 26

Research Issues

  • The ACID properties scale poorly

Many Web applications do not need full ACID behavior

  • Recent storage systems give up some of the ACID properties

Bigtable supports only transactions involving a single row
Dynamo does not guarantee immediate and consistent updates of replicas

  • Benefit of RAMCloud's low latency: a higher level of consistency

Reduces the length of time each transaction stays in the system
Reduces aborted transactions (for optimistic concurrency control)
Reduces lock wait times (for pessimistic concurrency control)

  • As a result, a low-latency system can scale to much higher overall throughput before the cost of ACID becomes prohibitive


Concurrency, transactions, and consistency

SLIDE 27

Research Issues

  • Multi-tenancy for cloud computing:

Support multiple (potentially hostile) applications
Cost proportional to application size

  • RAMClouds need to provide access control and security mechanisms
  • RAMClouds need to provide performance isolation mechanisms, so that one application does not degrade the performance of other applications


Multi-tenancy

SLIDE 28

Research Issues

  • Move functionality to the client library?

Flexibility: enable different implementations
Throughput: offload servers
May improve performance (e.g., aggregation)

  • Concentrate functionality in the servers?

May improve performance (e.g., faster synchronization)
Can't depend on proper client behavior: security/access control, consistency/crash recovery


Server-client functional distribution

[Figure: one application talks to the RAMCloud server through the standard RAMCloud library; another through a custom library]

SLIDE 29

Research Issues

  • RAMClouds must manage themselves automatically

Thousands of servers, each using hundreds of its peers
Hundreds of applications with competing needs

  • Building a new storage system from scratch

Opportunity to get this essential functionality done right
Get an overall picture of system behavior by gathering significant amounts of data from different servers
Gathering data must not significantly degrade latency or throughput


Self-management

SLIDE 30

RAMCloud Disadvantages

  • High cost per bit and high energy usage per bit

50 – 100x worse than disk-based systems
5 – 10x worse than systems based on flash memory

  • More floor space in datacenters
  • However, cost per operation and energy cost per operation:

100 – 1000x more efficient than disk-based systems
5 – 10x more efficient than systems based on flash memory

  • More observations:

Many unanswered questions
Write latency issues with inter-datacenter replication

SLIDE 31

Related Work

  • In the mid-1980s there were numerous research experiments with databases stored in main memory

Rio Vista project

  • In recent years there has been a surge in the use of DRAM

Google and Yahoo! store their search indices entirely in DRAM
memcached
Bigtable

  • The limitations of disk storage

Jim Gray
H-store project

SLIDE 32

Conclusions

  • In the future, a larger and larger fraction of online data will be kept in DRAM
  • RAMCloud is proposed as the best long-term solution (maybe…)
  • An interesting combination of scale and latency
  • An attractive substrate for cloud computing environments
  • Enables more powerful uses of information at scale:

1000 - 10,000 clients
100 TB - 1 PB
5 - 10µs latency

SLIDE 33

RAMCloud Project

  • Fall 2008: John Ousterhout suggests the basic idea for RAMCloud to Mendel Rosenblum over lunch
  • Spring 2009: a discussion group meets weekly to explore design issues for RAMCloud; the results are published as the white paper "The Case for RAMCloud", with all of the discussion group members as authors
  • Fall 2009: the RAMCloud implementation team is formed and the group begins to flesh out the design in detail
  • Late Fall 2009: a trivial RAMCloud server responds to basic read and write requests
  • April 1, 2010: all-day RAMCloud design review, which includes external reviewers
  • November 8, 2010: first successful recovery (TcpTransport, 2 backups, 2 masters, 1 segment, 1 partition)
  • March 2011: Nandu's measurements show that RAMCloud is achieving 5µs RPCs for 100-byte reads; a single server can handle more than 1M RPCs/second
  • Summer 2011: major revisions of the log cleaner; it's now almost production-ready
  • February 18, 2012: first end-to-end recovery of a failed backup server
  • March 2012: RAMCloud converts from fixed-size 64-bit keys to variable-length byte-array keys

SLIDE 34

Acknowledgments

  • Department of Computer Science, Stanford University:

John Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Guru Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman

  • The following people made helpful comments on drafts of this paper:

David Andersen, Michael Armbrust, Jeff Dean, Robert Johnson, Jim Larus, David Patterson, Jeff Rothschild, and Vijay Vasudevan.

SLIDE 35

References

  • The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM – John Ousterhout et al., ACM SIGOPS Operating Systems Review, 2009
  • https://ramcloud.stanford.edu/wiki/display/ramcloud/RAMCloud+Presentations
  • http://www.youtube.com/watch?v=lcUvU3b5co8

SLIDE 36


Questions/Comments

Thank you