Building a Database on S3

Matthias Brantner, Daniela Florescu+, David Graf, Donald Kossmann, Tim Kraska
Systems Group, ETH Zurich / 28msec Inc.; +Oracle

September 25, 2007
Motivation

- Storing user data and making it searchable for the public have become a commodity
- Success (i.e., getting rich) still comes at high cost:
  - Have the right (business) idea
  - Keep the service running 24/7
- And then comes the Digg-Effect
  -> Scalability

June 18, 2008 / Tim Kraska / ETH Zurich / tim.kraska@inf.ethz.ch
Requirements for DM on the Web

- Scalability
  - response time independent of the number of clients
- No administration
  - „outsource" patches, backups, fault tolerance
- 100 percent read + write availability
  - no client is ever blocked under any circumstances
- Cost ($$$)
  - get cheaper every year, leverage new technology
  - pay as you go, no investment upfront
Utility Computing as a solution

- Scalability
  - response time independent of the number of clients
- No administration
  - „outsource" patches, backups, fault tolerance
- 100 percent read + write availability
  - no client is ever blocked under any circumstances
- Cost ($$$)
  - get cheaper every year, leverage new technology
  - pay as you go, no investment upfront
- Consistency: an optimization goal, not a constraint
  - How much does it cost?
  - How much consistency is required by my application?
Amazon Web Services (AWS)

- Most popular utility provider
- Simple Storage Service (S3)
  - costs $0.10-$0.17 per GB in/out
- Simple Queuing Service (SQS)
- Elastic Compute Cloud (EC2)
  - virtual instances with 1-8 virtual cores (1.0-2.5 GHz Opterons), 1.7-15 GB of memory, 160 GB-1690 GB of instance storage
  - billed per machine hour consumed
- SimpleDB
Plan of Attack

- Step 1: Use S3 as a huge shared disk
  - leverage scalability and no-administration features
- Step 2: Allow concurrent access to the shared disk in a distributed system
  - keep the properties of a distributed system, maximize consistency
- Step 3: Make application-specific trade-offs
  - consistency vs. cost
  - consistency vs. availability
  - consistency à la carte (levels of consistency)
Shared-Disk Architecture

[Figure: Clients 1..M, each an EC2 instance running the full stack (Application, Record Manager, Page Manager) completely executed on the EC2 client; Pages 1..N are stored on S3.]
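The layered stack in the figure can be sketched in a few lines; a plain dict stands in for the S3 bucket, and all class and method names here are illustrative assumptions, not the paper's actual code:

```python
# Minimal sketch of the client stack: the Page Manager caches pages
# from a key-value store (a dict stands in for S3), and the Record
# Manager maps records onto pages. Names are illustrative only.

S3 = {}  # stand-in for an S3 bucket: page-id -> list of records

class PageManager:
    def __init__(self, store):
        self.store = store
        self.cache = {}          # page-id -> cached copy

    def get_page(self, pid):
        if pid not in self.cache:
            self.cache[pid] = list(self.store.get(pid, []))
        return self.cache[pid]

    def put_page(self, pid):
        # write the cached copy back to the shared store
        self.store[pid] = list(self.cache[pid])

class RecordManager:
    def __init__(self, pages):
        self.pages = pages

    def insert(self, pid, record):
        self.pages.get_page(pid).append(record)
        self.pages.put_page(pid)

client = RecordManager(PageManager(S3))
client.insert("page-1", ("key-42", "payload"))
```

The key design point is that the whole stack runs on the client; S3 only stores pages, so any number of such clients can be started without touching a central server.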
Problem: Eventual Consistency

- Two clients update the same page
- Last update wins -> consistency problem
- Inconsistency between indexes and pages
- Lost records
- Lost updates

[Figure: Clients 1 and 2 each run their own Record Manager and Page Manager against the shared Pages 1..N on S3.]
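The lost-update anomaly from the slide can be reproduced in a few lines; a dict again stands in for S3, and the two "clients" each work on a stale private copy of the page (an illustrative sketch):

```python
# Two clients read the same page, update their private copies, and
# write back: the last writer wins and client 1's record is lost.
s3 = {"page-1": ["a"]}

copy1 = list(s3["page-1"])   # client 1 reads the page
copy2 = list(s3["page-1"])   # client 2 reads the same page

copy1.append("b")            # client 1 inserts record "b"
s3["page-1"] = copy1         # ... and writes the page back

copy2.append("c")            # client 2 inserts record "c"
s3["page-1"] = copy2         # last update wins: "b" is gone

print(s3["page-1"])          # ['a', 'c'] -- the lost-update anomaly
```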
Levels of Consistency [Tanenbaum]

- Shared-Disk (naïve approach)
  - no concurrency control at all
- Eventual Consistency (Basic Protocol)
  - updates become visible at any time and will persist
  - no lost updates at the page level
- Atomicity
  - all or none of the updates of a transaction become visible
- Monotonicity
  - monotonic reads, read your writes, monotonic writes, ...
- Strong Consistency
  - database-style consistency (ACID) via OCC
Basic Protocol: Queues

- One Pending Update (PU) queue and one Lock queue are associated with each page
- Lock queues contain exactly one lock message (inserted right after creating the queue)

[Figure: Clients 1..M; Lock queues and Pending Update queues in SQS; Pages 1..N on S3.]
Basic Protocol, Step 1: Commit

- Clients commit their update (log) records to the PU queues
- The transaction is finished once all log records are sent

[Figure: clients sending log messages into the PU queues in SQS; Pages 1..N on S3.]
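The commit step above can be sketched as follows; plain lists stand in for the per-page SQS PU queues, and all names are illustrative:

```python
# Step 1 (Commit): the client never touches the page on S3 directly;
# it only appends its log records to the PU queue of each affected
# page. Lists stand in for SQS queues.
pu_queues = {"page-1": [], "page-2": []}

def commit(tx_log):
    """tx_log: list of (page_id, log_record) pairs."""
    for pid, rec in tx_log:
        pu_queues[pid].append(rec)   # send message to the page's PU queue
    # once all messages are sent, the transaction is finished

commit([("page-1", ("insert", "k1", "v1")),
        ("page-2", ("update", "k2", "v2"))])
```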
Basic Protocol, Step 2: Checkpointing

- Propagates the updates from SQS to S3
- Checkpointing requires synchronization: exclusive locks

1. Receive the lock
2. Refresh the page from S3
3. Receive messages: as many as possible
4. Apply the log records to the cached page
5. Put the new version of the page to S3
6. Delete all the log records which were received in Step 3

[Figure: a client holding the lock for Page 1 drains its PU queue and writes the new version of the page back to S3.]
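The six checkpointing steps map almost one-to-one onto code; here a list stands in for each queue and a dict for S3 (an illustrative sketch under those assumptions, not the paper's implementation):

```python
# Checkpointing a single page: steps 1-6 from the slide.
s3 = {"page-1": {"k0": "v0"}}
lock_queue = ["lock"]                     # contains exactly one lock message
pu_queue = [("put", "k1", "v1"), ("put", "k2", "v2")]

def checkpoint(pid):
    if not lock_queue:                    # 1. receive the lock
        return False                      #    (someone else is checkpointing)
    lock = lock_queue.pop()
    page = dict(s3[pid])                  # 2. refresh the page from S3
    received = list(pu_queue)             # 3. receive as many messages as possible
    for op, key, value in received:       # 4. apply the log records to the page
        page[key] = value
    s3[pid] = page                        # 5. put the new version to S3
    del pu_queue[:len(received)]          # 6. delete the records received in step 3
    lock_queue.append(lock)               # return the lock message
    return True

checkpoint("page-1")
```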
Basic Protocol: Properties

- No synchronization beyond the lock (from SQS) is needed
- Losing a lock is tolerable, as log records are idempotent
- Checkpointing only affects the freshness of the data, not transactional properties
- The order of log record messages is not important
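Why a lost lock is tolerable: log records of the "set key to value" kind are idempotent, so a checkpoint that is repeated after a lost lock applies the same records again without changing the outcome (illustrative sketch):

```python
# Idempotence of log records: applying the same record twice leaves
# the page in exactly the same state as applying it once.
def apply(page, record):
    key, value = record
    page[key] = value        # "set key to value" is idempotent
    return page

page = {"a": 1}
once  = apply(dict(page), ("b", 2))
twice = apply(apply(dict(page), ("b", 2)), ("b", 2))
assert once == twice == {"a": 1, "b": 2}
```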
Atomicity: All or none of the updates of a transaction become visible

- Each client manages an atomic queue

[Figure: Client with its Atomic Queue; one PU queue and one Lock queue per page in SQS; pages on S3.]
Atomicity: Commit Protocol

1. Send all log records to the ATOMIC queue
2. Send a commit log record
3. Move the log records to the corresponding PU queues
4. Delete all messages after committing

[Figure: log records flow from the client through the Atomic Queue into the PU queues.]
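The four-step commit protocol can be sketched with lists standing in for the ATOMIC and PU queues (all names are illustrative assumptions):

```python
# Atomic commit via a client-local ATOMIC queue: log records only
# reach the PU queues after the commit record has been sent, so a
# crash before step 2 makes none of the updates visible.
atomic_queue = []
pu_queues = {"page-1": [], "page-2": []}

def atomic_commit(tx_id, tx_log):
    for pid, rec in tx_log:                      # 1. send all log records
        atomic_queue.append((tx_id, pid, rec))   #    to the ATOMIC queue
    atomic_queue.append((tx_id, None, "commit")) # 2. send the commit record
    for t, pid, rec in list(atomic_queue):       # 3. move the log records to
        if t == tx_id and rec != "commit":       #    the corresponding PU queues
            pu_queues[pid].append(rec)
    atomic_queue[:] = [m for m in atomic_queue   # 4. delete all messages
                       if m[0] != tx_id]         #    after committing

atomic_commit("tx1", [("page-1", "r1"), ("page-2", "r2")])
```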
Atomicity cont'd

- When a client fails, the client checks its ATOMIC queue at restart
- Winners are all log records which carry the same id as one of the commit records found in the ATOMIC queue; all other log records are losers
- Winners are propagated, losers are deleted
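The winners-and-losers recovery rule above can be sketched the same way: records whose transaction id matches a commit record in the ATOMIC queue are propagated, the rest are deleted (illustrative sketch):

```python
# Recovery at restart: scan the ATOMIC queue; log records of
# committed transactions (winners) go to the PU queues, all other
# records (losers) are simply dropped.
atomic_queue = [("tx1", "page-1", "r1"),
                ("tx1", None, "commit"),
                ("tx2", "page-1", "r2")]    # tx2 never committed
pu_queues = {"page-1": []}

def recover():
    committed = {t for t, _, rec in atomic_queue if rec == "commit"}
    for t, pid, rec in atomic_queue:
        if rec != "commit" and t in committed:
            pu_queues[pid].append(rec)      # winner: propagate
    atomic_queue.clear()                    # winners moved, losers deleted

recover()
```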
Experiments and Results

- Goal: study the trade-offs in terms of consistency, latency, and cost ($)
- We used a subset of the TPC-W benchmark (models a bookstore scenario)
- All experiments were done with a complex customer transaction involving the following steps:
  a) retrieve the customer record from the database;
  b) search for six specific products;
  c) place orders for three of the six products.
Running Time per Transaction [secs]

                                   Avg.   Max.
  Naïve (Shared-Disk)              11.3   12.1
  Basic (Eventual Consistency)      4.0    5.9
  Monotonicity                      4.0    6.8
  Atomicity + Monotonicity          2.8    4.6

- Doesn't include checkpointing (runs asynchronously)
- Every transaction simulates around 12 clicks
- Time is less than a second per click
- Time is independent of the number of clients
- It's fast. Not 15K-SCSI-RAID0-fast, but internet-latency-fast.
Cost per 1000 Transactions ($)

                                   Step 1:   Step 2: Checkpoint   Total
                                   Commit    + Atomic Queue
  Naïve (Shared-Disk)                0.15          -               0.15
  Basic (Eventual Consistency)       0.7           1.1             1.8
  Monotonicity                       0.7           1.4             2.1
  Atomicity + Monotonicity           0.3           2.6             2.9

- Interaction with SQS becomes expensive
- For a bookstore, a transactional cost of about 3 milli-dollars (i.e., 0.3 cents)
- Updates in particular have a big influence on the cost
- Not cheap, but in many scenarios affordable
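The "milli-dollar" figure follows directly from the table: $2.9 per 1000 transactions for the strongest variant measured works out to roughly 0.3 cents per transaction:

```python
# Cost per transaction for Atomicity + Monotonicity:
# $2.9 per 1000 transactions from the table above.
cost_per_1000 = 2.9
per_tx_dollars = cost_per_1000 / 1000   # about 0.0029 $, i.e. ~3 milli-dollars
per_tx_cents = per_tx_dollars * 100     # about 0.29 cents
assert round(per_tx_cents, 2) == 0.29
```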
Summary and Future Work

- Architecture allows transparent scaling: no need to change code, hardware, ...
- Consistency is a goal, not a constraint: consistency à la carte for your applications
- Amazon's Web Services are a viable candidate for many Web 2.0 and interactive applications
- Future work:
  - SimpleDB as the main index
  - EC2 as application server
  - Further studies of stronger consistency protocols
Thank you for your interest. Questions?
Contact: tim.kraska@inf.ethz.ch
Backup: Cost per 1000 Transactions, Various Checkpoint Intervals

[Figure: chart of cost per 1000 transactions for various checkpoint intervals.]