SLIDE 1

Building a Database on S3

Matthias Brantner, Daniela Florescu+, David Graf, Donald Kossmann, Tim Kraska
Systems Group, ETH Zurich / 28msec Inc. / +Oracle

September 25, 2007

SLIDE 2

Motivation

• Building a web page, starting a blog, and making both searchable for the public have become a commodity
• But providing your own service (and getting rich) still comes at a high cost:
  • Have the right (business) idea
  • Run your own web server and database
  • Maintain the infrastructure
  • Keep the service up 24 x 7
  • Back up the data
  • Tune the system if the service is used more often
• And then comes the Digg effect

SLIDE 3

Requirements for Data Management (DM) on the Web

• Scalability: response time independent of the number of clients
• No administration: "outsource" patches, backups, fault tolerance
• 100 percent read + write availability: no client is ever blocked under any circumstances
• Cost ($$$): gets cheaper every year, leverage new technology; pay as you go, no investment upfront

SLIDE 4

Utility Computing as a Solution

• Scalability: response time independent of the number of clients
• No administration: "outsource" patches, backups, fault tolerance
• 100 percent read + write availability: no client is ever blocked under any circumstances
• Cost ($$$): gets cheaper every year, leverage new technology; pay as you go, no investment upfront
• Consistency: optimization goal, not a constraint?

SLIDE 5

Utility Computing as a Solution

The same requirements, with the two open questions made explicit:

• Cost ($$$): How much does it cost?
• Consistency: How much consistency is required by my application?

SLIDE 6

Amazon Web Services (AWS)

• Most popular utility provider
• Gives us all necessary building blocks (storage, CPU cycles, etc.)
• Other providers also appear on the market
• Amazon infrastructure services:
  • Simple Storage Service (S3): (virtually) infinite store; costs: $0.15 per GB-month + transfer costs ($0.10-$0.17 per GB in/out)
  • Simple Queuing Service (SQS): message service; allows a client to exclusively receive a message; costs: $0.0001 per message sent + transfer costs
  • Elastic Compute Cloud (EC2): virtual instance with 1-8 virtual cores (= 1.0-2.5 GHz Opterons), 1.7-15 GB of memory, 160-1690 GB of instance storage; costs: $0.10-$0.80 per hour + transfer costs
  • SimpleDB: basically a text index; costs: $0.14 per Amazon SimpleDB machine hour consumed
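To put these 2008 prices in perspective, here is a back-of-the-envelope cost estimator. The rates are the ones quoted on the slide; the workload figures in the example call are invented for illustration.

```python
# Back-of-the-envelope AWS cost estimate using the 2008 prices above.
# The workload (100 GB stored, 10M messages, one small instance running
# all month) is a hypothetical example, not a number from the talk.

S3_STORAGE_PER_GB_MONTH = 0.15    # $ per GB-month
SQS_PER_MESSAGE = 0.0001          # $ per message sent
EC2_SMALL_PER_HOUR = 0.10         # $ per hour, smallest instance

def monthly_cost(gb_stored, messages_sent, instance_hours):
    """Storage + queueing + compute; transfer costs omitted for brevity."""
    return (gb_stored * S3_STORAGE_PER_GB_MONTH
            + messages_sent * SQS_PER_MESSAGE
            + instance_hours * EC2_SMALL_PER_HOUR)

print(monthly_cost(100, 10_000_000, 720))  # -> 1087.0 dollars/month
```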

SLIDE 7

Plan of Attack

• Step 1: Use S3 as a huge shared disk: leverage the scalability and no-administration features
• Step 2: Allow concurrent access to the shared disk in a distributed system: keep the properties of a distributed system, maximize consistency
• Step 3: Do application-specific trade-offs: consistency vs. cost, consistency vs. availability, consistency à la carte (levels of consistency)

SLIDE 8

Plan of Attack (same outline as Slide 7; section divider for Step 1)

SLIDE 9

Shared-Disk Architecture

• Application, Client 1 / EC2 ... Application, Client M / EC2
• Each client runs its own Record Manager on top of its own Page Manager
• Pages 1..N are stored on S3, the shared disk
• The whole client stack could be executed completely on an EC2 instance

(A minimal page-manager sketch follows.)
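As a rough illustration of this layering, here is a minimal page-manager sketch. It uses today's boto3 library as a stand-in for the 2007-era S3 API, and the bucket name and page-key scheme are assumptions, not the authors' implementation.

```python
# Minimal page manager: S3 is the shared disk, one S3 object per page.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-database-pages"  # hypothetical bucket name

class PageManager:
    """Caches pages locally; reads and writes whole pages on S3."""

    def __init__(self):
        self.cache = {}  # page_id -> bytes

    def read_page(self, page_id):
        if page_id not in self.cache:
            obj = s3.get_object(Bucket=BUCKET, Key=f"page-{page_id}")
            self.cache[page_id] = obj["Body"].read()
        return self.cache[page_id]

    def write_page(self, page_id, data):
        # Naive shared-disk variant: write straight back to S3.
        # Two concurrent writers overwrite each other ("last update wins").
        self.cache[page_id] = data
        s3.put_object(Bucket=BUCKET, Key=f"page-{page_id}", Body=data)
```

Writing straight back to S3 is exactly the naïve shared-disk variant; the next slide shows why it breaks under concurrent clients.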

SLIDE 10

Problem: Eventual Consistency

• Two clients update the same page
• Last update wins
• Consistency problems:
  • Inconsistency between indexes and pages
  • Lost records
  • Lost updates

SLIDE 11

Plan of Attack (same outline as Slide 7; section divider for Step 2)

SLIDE 12

Levels of Consistency [Tanenbaum]

• Shared-Disk (naïve approach): no concurrency control at all
• Eventual Consistency (Basic Protocol): updates become visible at any time and will persist; no lost updates on the page level
• Atomicity: all or no updates of a transaction become visible
• Monotonic reads, read your writes, monotonic writes, ...
• Strong Consistency: database-style consistency (ACID) via OCC

SLIDE 13

Levels of Consistency [Tanenbaum] (same list as Slide 12)

SLIDE 14

Basic Protocol: Queues

• One PU (pending update) queue and one lock queue are associated with each page
• Lock queues contain exactly one message (inserted directly after creating the queue)
• Commit to pages in two phases (the queue setup is sketched below)

(Diagram: clients 1..M, lock queues and pending-update queues in SQS, pages 1..N on S3)
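A sketch of this per-page queue setup, again with boto3 standing in for the 2007 SQS API; the queue-naming scheme is an assumption for illustration.

```python
# One pending-update (PU) queue and one lock queue per page; the lock
# queue is seeded with a single token message right after creation.
import boto3

sqs = boto3.client("sqs")

def create_page_queues(page_id):
    pu_url = sqs.create_queue(QueueName=f"pu-{page_id}")["QueueUrl"]
    lock_url = sqs.create_queue(QueueName=f"lock-{page_id}")["QueueUrl"]
    # Exactly one message ever lives in the lock queue; receiving it
    # (which hides it for the visibility timeout) acts as acquiring
    # an exclusive lock on the page.
    sqs.send_message(QueueUrl=lock_url, MessageBody="lock-token")
    return pu_url, lock_url
```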

SLIDE 15

Basic Protocol

Step 1: Commit. Clients commit update records to the PU queues.

(Diagram: log records accumulate in the pending-update queues; the pages on S3 are untouched)

SLIDE 16

Basic Protocol, Step 1: Commit (same as Slide 15; more log records arrive in the PU queues)

SLIDE 17

Basic Protocol

Step 1: Commit. Clients commit update records to the PU queues.

• Once all records are in the PU queues, the transaction is committed and finished (sketched in code below)
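Step 1 might look as follows. The JSON log-record format and the lookup of a PU queue by page id are illustrative assumptions (create_queue is idempotent in SQS, so it doubles as a lookup).

```python
# Step 1 (commit): push each log record into the PU queue of the page
# it touches. No page on S3 changes yet; checkpointing (Step 2) will
# eventually fold the records into the pages.
import json

def commit(sqs, log_records):
    """log_records: dicts like
       {"page": 4, "record_key": "k17", "payload": "new value"}."""
    for rec in log_records:
        pu_url = sqs.create_queue(QueueName=f"pu-{rec['page']}")["QueueUrl"]
        sqs.send_message(QueueUrl=pu_url, MessageBody=json.dumps(rec))
    # Once all messages are sent, the transaction is finished.
```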

SLIDE 18

Basic Protocol

Step 2: Checkpointing

• Checkpointing propagates updates from SQS to S3
• Updates become visible on S3
• Checkpointing requires synchronization
• Synchronization is achieved by using SQS (the lock queues) as exclusive locks

SLIDE 19

Basic Protocol, Step 2: Checkpointing

1. Receive the lock

SLIDE 20

Basic Protocol, Step 2: Checkpointing (cont'd)

2. Refresh the page from S3

SLIDE 21

Basic Protocol, Step 2: Checkpointing (cont'd)

3. Receive messages from the PU queue: as many as possible

SLIDE 22

Basic Protocol, Step 2: Checkpointing (cont'd)

4. Apply the log records to the cached page

SLIDE 23

Basic Protocol, Step 2: Checkpointing (cont'd)

5. Put the new version of the page to S3

SLIDE 24

Basic Protocol, Step 2: Checkpointing (cont'd)

6. Delete all the log records that were received in step 3

(The complete six-step checkpoint is sketched in code below.)
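Putting the six steps together, a checkpoint of one page could look like the sketch below. It relies on SQS's visibility timeout as the exclusive lock, which also gives fault tolerance for free: a crashed checkpointer simply lets the lock token reappear. The JSON page layout and the queue/bucket names are assumptions.

```python
# Step 2 (checkpointing): propagate pending updates from SQS to S3.
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
BUCKET = "my-database-pages"  # hypothetical bucket name

def checkpoint_page(page_id, pu_url, lock_url, timeout_secs=60):
    # 1. Receive the lock: hide the token for `timeout_secs`.
    lock = sqs.receive_message(QueueUrl=lock_url, MaxNumberOfMessages=1,
                               VisibilityTimeout=timeout_secs)
    if "Messages" not in lock:
        return  # somebody else holds the lock; try again later

    # 2. Refresh the page from S3 (modeled here as a JSON dict
    #    mapping record keys to payloads).
    body = s3.get_object(Bucket=BUCKET, Key=f"page-{page_id}")["Body"]
    page = json.loads(body.read())

    # 3. Receive as many pending-update messages as possible
    #    (SQS hands out at most 10 per call).
    got = sqs.receive_message(QueueUrl=pu_url, MaxNumberOfMessages=10)
    messages = got.get("Messages", [])

    # 4. Apply the log records to the cached page.
    for msg in messages:
        rec = json.loads(msg["Body"])
        page[rec["record_key"]] = rec["payload"]

    # 5. Put the new version of the page to S3.
    s3.put_object(Bucket=BUCKET, Key=f"page-{page_id}",
                  Body=json.dumps(page))

    # 6. Delete exactly the log records received in step 3. If we crash
    #    before this point, the records are redelivered and re-applied
    #    later -- harmless, because applying them is idempotent.
    for msg in messages:
        sqs.delete_message(QueueUrl=pu_url,
                           ReceiptHandle=msg["ReceiptHandle"])
    # The lock token is never deleted; it reappears automatically once
    # the visibility timeout expires.
```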

SLIDE 25

Basic Protocol

• Extremely simple
• No additional infrastructure (except SQS) is needed
• The protocol is also resilient to failures:
  • Applying a log record twice does no harm, as log records are idempotent (demonstrated below)
• The protocol has a price: dollars and freshness of the data
• Still weak consistency / transactional properties:
  • No atomicity
  • No monotonic guarantees: the ordering of the log-record messages is not preserved
  • No concurrency control on the record level
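A three-line demonstration of why re-applying a log record is harmless when records carry the full new value (the record format is an assumption carried over from the sketches above):

```python
# Applying a log record is an overwrite, not an increment, so applying
# it once or three times leaves the page in the same state.
page = {"k17": "old value"}
rec = {"record_key": "k17", "payload": "new value"}

for _ in range(3):  # apply the same log record three times
    page[rec["record_key"]] = rec["payload"]

assert page == {"k17": "new value"}  # same as applying it exactly once
```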
SLIDE 26

Atomicity: All or none of the updates of a transaction become visible

• Each client manages an ATOMIC queue

(Diagram: the client with its ATOMIC queue, plus the PU queues, LOCK queues, and pages on S3)

SLIDE 27

Atomicity: Commit Protocol

1. Send all log records to the ATOMIC queue.

SLIDE 28

Atomicity: Commit Protocol (cont'd)

2. Send the commit log record.

SLIDE 29

Atomicity: Commit Protocol (cont'd)

3. Send all log records to the corresponding PU queues.

SLIDE 30

Atomicity: Commit Protocol (cont'd)

4. Delete all messages after committing.

(The full commit protocol is sketched in code below.)
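A sketch of this commit protocol. Transaction ids via uuid and the JSON message format are illustrative assumptions; the ATOMIC queue is the client's private queue from the diagram.

```python
# Atomic commit: stage everything in the ATOMIC queue, seal it with a
# commit record, then fan the log records out to the PU queues.
import json
import uuid

def atomic_commit(sqs, atomic_url, log_records):
    txn = str(uuid.uuid4())  # id shared by all records of this txn
    # 1. Send all log records to the ATOMIC queue.
    for rec in log_records:
        sqs.send_message(QueueUrl=atomic_url,
                         MessageBody=json.dumps({"txn": txn, **rec}))
    # 2. Send the commit log record; the txn now survives a crash.
    sqs.send_message(QueueUrl=atomic_url,
                     MessageBody=json.dumps({"txn": txn, "commit": True}))
    # 3. Send all log records to the corresponding PU queues.
    for rec in log_records:
        pu = sqs.create_queue(QueueName=f"pu-{rec['page']}")["QueueUrl"]
        sqs.send_message(QueueUrl=pu,
                         MessageBody=json.dumps({"txn": txn, **rec}))
    # 4. Delete all messages from the ATOMIC queue after committing
    #    (this requires receiving them back first; elided here -- the
    #    recovery routine on the next slide does exactly that).
```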

SLIDE 31

Atomicity cont'd.

• When a client fails, it checks its ATOMIC queue at restart
• Winners are all log records that carry the same id as one of the commit records found in the ATOMIC queue; all other log records are losers
• Winners are propagated, losers are deleted (see the recovery sketch below)
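The matching recovery sketch, with queue draining simplified for illustration (a real drain has to cope with SQS visibility timeouts and sampling):

```python
# Recovery: split the ATOMIC queue's log records into winners (their
# txn id matches a commit record) and losers, propagate the winners
# to the PU queues, then clear the queue.
import json

def recover(sqs, atomic_url):
    messages = []
    while True:  # drain the client's private ATOMIC queue
        got = sqs.receive_message(QueueUrl=atomic_url,
                                  MaxNumberOfMessages=10)
        if "Messages" not in got:
            break
        messages.extend(got["Messages"])

    records = [json.loads(m["Body"]) for m in messages]
    committed = {r["txn"] for r in records if r.get("commit")}

    for m, rec in zip(messages, records):
        if not rec.get("commit") and rec["txn"] in committed:
            # Winner: propagate it to the PU queue of its page.
            pu = sqs.create_queue(QueueName=f"pu-{rec['page']}")["QueueUrl"]
            sqs.send_message(QueueUrl=pu, MessageBody=m["Body"])
        # Winners and losers alike are removed from the ATOMIC queue.
        sqs.delete_message(QueueUrl=atomic_url,
                           ReceiptHandle=m["ReceiptHandle"])
```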

SLIDE 32

Plan of Attack (same outline as Slide 7; section divider for Step 3: application-specific trade-offs)

SLIDE 33

Experiments and Results

• Goal: study the trade-offs in terms of consistency, latency, and cost ($)
• We used a subset of the TPC-W benchmark (models a bookstore scenario)
• All experiments were done with a complex customer transaction involving the following steps:
  a) retrieve the customer record from the database;
  b) search for six specific products;
  c) place orders for three of the six products.

SLIDE 34

Running Time per Transaction [secs]

                                  Avg.   Max.
  Naïve (Shared-Disk)             11.3   12.1
  Basic (Eventual Consistency)     4.0    5.9
  Monotonicity                     4.0    6.8
  Atomicity + Monotonicity         2.8    4.6

• Doesn't include checkpointing (done asynchronously)
• Every transaction simulates around 12 clicks
• Time is less than a second per click
• Time is independent of the number of clients
• It's fast. Not 15K-SCSI-RAID0-fast, but internet-latency-fast.

SLIDE 35

Cost per 1000 Transactions ($)

                                  Step 1:   Step 2: Checkpoint   Total
                                  Commit    + Atomic Queue
  Naïve (Shared-Disk)             0.15      -                    0.15
  Basic (Eventual Consistency)    0.7       1.1                  1.8
  Monotonicity                    0.7       1.4                  2.1
  Atomicity + Monotonicity        0.3       2.6                  2.9

• Interaction with SQS becomes expensive
• For a bookstore, a transactional cost of about 3 milli-dollars (i.e., 0.3 cents)
• Updates in particular have a big influence on the cost
• Not cheap, but affordable in many scenarios

SLIDE 36

Summary and Future Work

• The architecture allows transparent scaling: no need to change code, hardware, ...
• Consistency is a goal, not a constraint: consistency à la carte for your applications
• Amazon's Web Services are a viable candidate for many Web 2.0 and interactive applications
• Future work:
  • SimpleDB as the main index
  • EC2 as application server
  • Further studies of stronger consistency protocols

SLIDE 37

Thank you for your interest. Questions?

Contact: tim.kraska@inf.ethz.ch

SLIDE 38

Cost per 1000 Transactions, Various Checkpoint Intervals

(Backup slide; chart not reproduced in the transcript)