
Building a Database on S3
Matthias Brantner, Daniela Florescu, David Graf, Donald Kossmann, Tim Kraska
Systems Group, ETH Zurich · 28msec Inc. · Oracle


  1. Building a Database on S3 — Matthias Brantner, Daniela Florescu, David Graf, Donald Kossmann, Tim Kraska. Systems Group, ETH Zurich · 28msec Inc. · Oracle. September 25, 2007

  2. Motivation
  - Building a web page, starting a blog, and making both searchable for the public have become a commodity.
  - But providing your own service (and getting rich) still comes at a high cost:
    - Have the right (business) idea
    - Run your own web server and database
    - Maintain the infrastructure
    - Keep the service up 24x7
    - Back up the data
    - Tune the system as the service gets used more often
  - And then comes the Digg effect.
  (June 18, 2008 — Tim Kraska, ETH Zurich, tim.kraska@inf.ethz.ch)

  3. Requirements for DM on the Web
  - Scalability: response time independent of the number of clients
  - No administration: "outsource" patches, backups, fault tolerance
  - 100 percent read + write availability: no client is ever blocked under any circumstances
  - Cost ($$$): gets cheaper every year, leverages new technology; pay as you go, no upfront investment

  4. Utility Computing as a Solution
  - Scalability: response time independent of the number of clients ✓
  - No administration: "outsource" patches, backups, fault tolerance ✓
  - 100 percent read + write availability: no client is ever blocked under any circumstances ✓
  - Cost ($$$): gets cheaper every year, leverages new technology; pay as you go, no upfront investment ✓
  - Consistency: ? — an optimization goal, not a constraint

  5. Utility Computing as a Solution
  - Utility computing meets the requirements: scalability, no administration, 100 percent read + write availability, pay-as-you-go cost.
  - Open questions:
    - Consistency: how much consistency is required by my application?
    - Cost: how much does it cost?
  - Consistency: an optimization goal, not a constraint

  6. Amazon Web Services (AWS)
  - Most popular utility provider; gives us all the necessary building blocks (storage, CPU cycles, etc.). Other providers are also appearing on the market.
  - Amazon infrastructure services:
    - Simple Storage Service (S3): (virtually) infinite store. Costs: $0.15 per GB-month plus transfer costs ($0.10–$0.17 per GB in/out).
    - Simple Queuing Service (SQS): message service; allows a client to exclusively receive a message. Costs: $0.0001 per message sent plus transfer costs.
    - Elastic Compute Cloud (EC2): virtual instance with 1–8 virtual cores (1.0–2.5 GHz Opterons), 1.7–15 GB of memory, 160–1690 GB of instance storage. Costs: $0.10–$0.80 per machine hour plus transfer costs.
    - SimpleDB: basically a text index. Costs: $0.14 per Amazon SimpleDB machine hour consumed.
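The pay-as-you-go prices on this slide make cost estimation a simple sum. A minimal sketch, using the slide's 2008 prices; the function names are illustrative and not part of any AWS API:

```python
# Back-of-the-envelope cost model using the 2008 prices on this slide:
# $0.15 per GB-month of S3 storage, $0.10-$0.17 per GB of transfer,
# $0.0001 per SQS message sent. Function names are illustrative only.

def s3_monthly_cost(storage_gb, transfer_in_gb, transfer_out_gb,
                    storage_rate=0.15, transfer_rate=0.10):
    """Storage plus transfer cost for one month, in dollars."""
    return (storage_gb * storage_rate
            + (transfer_in_gb + transfer_out_gb) * transfer_rate)

def sqs_message_cost(n_messages, rate=0.0001):
    """Cost of sending n_messages, excluding transfer."""
    return n_messages * rate

# Example: 100 GB stored, 20 GB in, 50 GB out, one million queue messages.
total = s3_monthly_cost(100, 20, 50) + sqs_message_cost(1_000_000)
print(round(total, 2))  # 100*0.15 + 70*0.10 + 1e6*0.0001 = 15 + 7 + 100 = 122.0
```

Note how the SQS per-message fee dominates here: fine-grained messaging is the expensive part of a queue-based protocol, a trade-off the talk returns to later.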

  7. Plan of Attack
  - Step 1: Use S3 as a huge shared disk — leverage scalability and the no-administration features.
  - Step 2: Allow concurrent access to the shared disk in a distributed system — keep the properties of a distributed system, maximize consistency.
  - Step 3: Make application-specific trade-offs — consistency vs. cost; consistency vs. availability; consistency à la carte (levels of consistency).

  8. Plan of Attack (roadmap, repeated)
  - Step 1: Use S3 as a huge shared disk — leverage scalability and the no-administration features.
  - Step 2: Allow concurrent access to the shared disk in a distributed system — keep the properties of a distributed system, maximize consistency.
  - Step 3: Make application-specific trade-offs — consistency vs. cost; consistency vs. availability; consistency à la carte (levels of consistency).

  9. Shared-Disk Architecture
  - Clients 1 … M (e.g., on EC2) each run the full stack: Application, Record Manager, Page Manager.
  - The pages (Page 1 … Page N) are stored on S3.
  - The stack could be executed completely on the client or on EC2.
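The layered stack on this slide can be sketched in a few lines. This is a minimal mock, not the authors' implementation: S3 is a plain dict, and the class and method names are illustrative.

```python
# Sketch of the client stack on this slide: a page manager that caches pages
# from a (mocked) S3 store, and a record manager that maps records to pages.
# All names are illustrative; S3 is stood in for by a plain dict.

class S3Store:
    """Stand-in for S3: a flat key-value store of page id -> page contents."""
    def __init__(self):
        self.pages = {}
    def get(self, page_id):
        return self.pages.get(page_id, [])
    def put(self, page_id, page):
        self.pages[page_id] = page

class PageManager:
    """Caches pages locally; reads hit S3 only on a cache miss."""
    def __init__(self, store):
        self.store = store
        self.cache = {}
    def read(self, page_id):
        if page_id not in self.cache:
            self.cache[page_id] = list(self.store.get(page_id))
        return self.cache[page_id]
    def write(self, page_id, page):
        self.cache[page_id] = page
        self.store.put(page_id, page)   # write-through, for simplicity

class RecordManager:
    """Maps records to pages by hashing the record key."""
    def __init__(self, pager, n_pages=4):
        self.pager = pager
        self.n_pages = n_pages
    def insert(self, key, value):
        page_id = hash(key) % self.n_pages
        page = self.pager.read(page_id)
        page.append((key, value))
        self.pager.write(page_id, page)
    def lookup(self, key):
        for k, v in self.pager.read(hash(key) % self.n_pages):
            if k == key:
                return v
        return None

s3 = S3Store()
rm = RecordManager(PageManager(s3))
rm.insert("alice", 1)
rm.insert("bob", 2)
print(rm.lookup("alice"))  # 1
```

Because every client runs its own copy of this stack against the same shared store, nothing coordinates concurrent writers — which is exactly the problem the next slide raises.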

  10. Problem: Eventual Consistency
  - Two clients update the same page; the last update wins.
  - Consistency problems:
    - Inconsistency between indexes and pages
    - Lost records
    - Lost updates
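The lost-update problem on this slide is easy to reproduce in a few lines. A minimal simulation, with a dict standing in for S3:

```python
# Simulation of the lost-update problem on this slide: two clients read the
# same page, each appends a record, and whoever writes back last wins, so
# the other client's record is silently lost. The dict stands in for S3.

s3 = {"page1": ["record-a"]}

# Client 1 and client 2 each read their own copy of the page.
copy1 = list(s3["page1"])
copy2 = list(s3["page1"])

copy1.append("record-from-client-1")
copy2.append("record-from-client-2")

# Both write back; the last write wins.
s3["page1"] = copy1
s3["page1"] = copy2

print(s3["page1"])  # ['record-a', 'record-from-client-2'] -- client 1's record is lost
```

Neither client sees an error: the write "succeeds" for both, which is why the protocol in the following slides moves updates through queues instead of writing pages directly.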

  11. Plan of Attack (roadmap, repeated)
  - Step 1: Use S3 as a huge shared disk — leverage scalability and the no-administration features.
  - Step 2: Allow concurrent access to the shared disk in a distributed system — keep the properties of a distributed system, maximize consistency.
  - Step 3: Make application-specific trade-offs — consistency vs. cost; consistency vs. availability; consistency à la carte (levels of consistency).

  12. Levels of Consistency [Tanenbaum]
  - Shared disk (naïve approach): no concurrency control at all
  - Eventual consistency (basic protocol): updates become visible at any time and will persist; no lost updates at the page level
  - Atomicity: all or none of a transaction's updates become visible
  - Monotonic reads, read your writes, monotonic writes, ...
  - Strong consistency: database-style consistency (ACID) via OCC (optimistic concurrency control)

  13. Levels of Consistency [Tanenbaum] (repeated, highlighting the basic protocol)
  - Shared disk (naïve approach): no concurrency control at all
  - Eventual consistency (basic protocol): updates become visible at any time and will persist; no lost updates at the page level
  - Atomicity: all or none of a transaction's updates become visible
  - Monotonic reads, read your writes, monotonic writes, ...
  - Strong consistency: database-style consistency (ACID) via OCC (optimistic concurrency control)

  14. Basic Protocol: Queues
  - One pending-update (PU) queue and one lock queue are associated with each page.
  - Lock queues contain exactly one message (inserted directly after creating the queue).
  - Commits to pages happen in two phases.
  (Clients 1 … M; the lock queues and PU queues sit between the clients and the pages Page 1 … Page N on S3.)
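The per-page queue layout on this slide can be sketched as follows. SQS is mocked with in-memory deques, and the class names are mine; the single lock-token message imitates the "exclusively receive a message" behavior of SQS mentioned on the AWS slide.

```python
# Sketch of the queue layout on this slide: each page gets a pending-update
# (PU) queue for committed log records, and a lock queue created with exactly
# one lock-token message in it. SQS is mocked with deques; receive() imitates
# SQS's exclusive receive (the message disappears for other clients).

from collections import deque

class Queues:
    def __init__(self, page_ids):
        self.pu = {p: deque() for p in page_ids}                # pending updates
        self.lock = {p: deque(["lock-token"]) for p in page_ids}

    def send(self, queue, msg):
        queue.append(msg)

    def receive(self, queue):
        """Exclusive receive: returns the message, or None if the queue is empty."""
        return queue.popleft() if queue else None

q = Queues(["page1", "page2"])

# Only one client at a time can obtain the lock token for a page.
token = q.receive(q.lock["page1"])
print(token)                         # lock-token
print(q.receive(q.lock["page1"]))    # None -- a second client is refused
q.send(q.lock["page1"], token)       # release the lock by returning the token
```

The single-token lock queue acts as a mutex: holding the token grants the exclusive right to apply that page's pending updates in the second phase.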

  15. Basic Protocol, Step 1: Commit
  - Clients commit update log records to the PU queues.

  16. Basic Protocol, Step 1: Commit (continued)
  - Clients commit update log records to the PU queues.

  17. Basic Protocol, Step 1: Commit (continued)
  - Clients commit update log records to the PU queues.
  - This is the commit of the transaction: once the log records are in the PU queues, the transaction is finished.
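Step 1 as described on these slides can be sketched in a few lines. A minimal mock under my own naming, not the authors' code: the client pushes one log record per update into the PU queue of the affected page, and the transaction counts as committed once the last record is queued; applying the records to the pages on S3 is the later, second phase.

```python
# Sketch of step 1 (commit) of the basic protocol: the client pushes one log
# record per update into the PU queue of the affected page. After the last
# record is queued, the transaction is committed and finished; propagating
# the queued records to the pages on S3 happens in the second phase.

from collections import defaultdict, deque

pu_queues = defaultdict(deque)   # page id -> pending-update queue (mock SQS)

def commit(tx_log):
    """tx_log: list of (page_id, log_record) pairs produced by one transaction."""
    for page_id, record in tx_log:
        pu_queues[page_id].append(record)
    return "committed"           # the transaction is finished at this point

status = commit([("page1", "insert k1=v1"),
                 ("page1", "update k2=v2"),
                 ("page3", "delete k7")])
print(status, len(pu_queues["page1"]), len(pu_queues["page3"]))  # committed 2 1
```

Committing never touches a page directly, so two concurrent transactions can no longer overwrite each other's pages the way the naïve shared-disk approach allowed.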
