bigtable
play

Bigtable David Wyrobnik, MEng Overview What is Bigtable? Data - PowerPoint PPT Presentation

Bigtable David Wyrobnik, MEng Overview What is Bigtable? Data Model API Building Blocks Implementation What is Bigtable (high level) Distributed storage system for structured data - title of paper


  1. Bigtable David Wyrobnik, MEng

  2. Overview ● What is Bigtable? ● Data Model ● API ● Building Blocks ● Implementation

  3. What is Bigtable (high level) ● “Distributed storage system for structured data” - title of paper ● “BigTable is a compressed, high performance, and proprietary data storage system built on Google File System, Chubby Lock Service, SSTable (log- structured storage like LevelDB) and a few other Google technologies.” - wikipedia ● “A Bigtable is a sparse, distributed, persistent multidimensional sorted map” - paper

  4. Data Model

  5. Data Model ● (row:string, column:string, time:int64) → array of bytes

  6. Data Model continued ● Timestamps can be assigned automatically (“real time”) or by client ● Versioned data management, two per-column-family settings for garbage- collection ○ last n versions of a cell should be kept only new-enough versions kept (e.g. only values that were written in the last seven days) ○

  7. API

  8. API ● Functions for creating and deleting tables and column families ○ ● Functions for changing ○ clusters, table, and column family metadata (such as control rights) ● Write, delete, and lookup values in individual rows ● Iterate over subset of data in table ● Single-row transactions → perform atomic read-modify-write sequences ● No general transactions across rows, but supports batching writes across rows ● Bigtable can be used with MapReduce (common use case)

  9. Building Blocks and Implementation

  10. Building Blocks ● Google-File-System (GFS) to store log and data files. ● SSTable file format. ● Chubby as a lock service ● Bigtable uses Chubby ○ to ensure at most one active master exists ○ to store bootstrap location of Bigtable data ○ to discover tablet servers ○ to store Bigtable schema information (column family info for each table) ○ to store access control lists

  11. Implementation ● Three major components: library that is linked into every client ○ ○ one master server many tablet servers ○ ● Master mainly responsible for assigning tablets to tablet servers ● Tablet servers can be added or removed dynamically ● Tablet server store typically 10-1000 tablets ● Tablet server handle read and writes and splitting of tablets that are too large ● Client data does not move through master.

  12. Tablet Location

  13. Tablet Assignment ● Master keeps track of live tablet servers, current assignments, and of unassigned tablets ● Master assigns unassigned tablets to tablet servers by sending a tablet load request ● Tablet servers are linked to files in Chubby directory (servers directory) ● When new master starts: ○ Acquires unique master lock in Chubby ○ Scans live tablet servers ○ Gets list of tablets from each tablet server, to learn which tablets are assigned ○ Scans METADATA table to learn set of existing tablets → adds unassigned tablets to list

  14. Tablet Serving

  15. Consistency ● Bigtable has a strong consistency model, since operations on rows are atomic and tablets are only served by one tablet server at a time

  16. Discussion

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend