Bigtable (David Wyrobnik, MEng)

Jun 05, 2023


Bigtable David Wyrobnik, MEng
Overview
● What is Bigtable?
● Data Model
● API
● Building Blocks
● Implementation
What is Bigtable (high level)
● “Distributed storage system for structured data” - title of paper
● “BigTable is a compressed, high performance, and proprietary data storage system built on Google File System, Chubby Lock Service, SSTable (log-structured storage like LevelDB) and a few other Google technologies.” - Wikipedia
● “A Bigtable is a sparse, distributed, persistent multidimensional sorted map” - paper
Data Model
Data Model
● (row:string, column:string, time:int64) → array of bytes
Data Model continued
● Timestamps can be assigned automatically (“real time”) or by the client
● Versioned data management: two per-column-family settings for garbage collection
    ○ keep only the last n versions of a cell
    ○ keep only new-enough versions (e.g. only values that were written in the last seven days)
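The data model above can be illustrated with a small in-memory sketch. This is a toy model only, not how Bigtable is implemented: the class name `TinyTable`, its method names, and the `max_versions` / `max_age` parameters are hypothetical, and the settings apply to the whole table here rather than per column family. It shows the (row, column, timestamp) → bytes mapping, sorted iteration over keys, and the two garbage-collection policies.

```python
import time


class TinyTable:
    """Toy model of a sparse, sorted (row, column, timestamp) -> bytes map."""

    def __init__(self, max_versions=None, max_age=None):
        # Hypothetical settings; max_age uses the same unit as the timestamps.
        self.max_versions = max_versions
        self.max_age = max_age
        self._cells = {}  # (row, column) -> {timestamp: value}; absent cells cost nothing

    def put(self, row, column, value, timestamp=None):
        # Timestamp is supplied by the client or assigned automatically (microseconds).
        if timestamp is None:
            timestamp = time.time_ns() // 1000
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get((row, column), {})
        visible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(visible)] if visible else None

    def scan(self, start_row, end_row):
        """Yield (row, column) keys in sorted order for start_row <= row < end_row."""
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                yield row, column

    def garbage_collect(self, now):
        """Drop versions that violate either configured retention policy."""
        for key, versions in list(self._cells.items()):
            newest_first = sorted(versions, reverse=True)
            keep = set(newest_first)
            if self.max_versions is not None:
                keep &= set(newest_first[: self.max_versions])  # last n versions
            if self.max_age is not None:
                keep &= {t for t in newest_first if now - t <= self.max_age}  # new enough
            for t in newest_first:
                if t not in keep:
                    del versions[t]
            if not versions:
                del self._cells[key]
```

With `max_versions=2`, writing three versions of one cell and then calling `garbage_collect` leaves only the two newest; a read at the oldest timestamp then returns nothing.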
API
API
● Functions for creating and deleting tables and column families
● Functions for changing cluster, table, and column family metadata (such as access control rights)
● Write, delete, and look up values in individual rows
● Iterate over a subset of the data in a table
● Single-row transactions → perform atomic read-modify-write sequences
● No general transactions across rows, but batching writes across rows is supported
● Bigtable can be used with MapReduce (common use case)
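To make the single-row transaction idea concrete, here is a minimal sketch of an atomic read-modify-write on one row, using a per-row lock in a single process. It is not the Bigtable client API: `RowStore`, `read_modify_write`, and `read` are hypothetical names, and a real tablet server works very differently. The point is only that an update to one row is applied as an indivisible step, while nothing coordinates updates across different rows.

```python
import threading


class RowStore:
    """Toy store where each row has its own lock (single-row atomicity only)."""

    def __init__(self):
        self._rows = {}                 # row key -> (lock, {column: value})
        self._guard = threading.Lock()  # protects creation of row entries

    def _entry(self, row):
        with self._guard:
            if row not in self._rows:
                self._rows[row] = (threading.Lock(), {})
            return self._rows[row]

    def read_modify_write(self, row, column, update, default=0):
        """Atomically replace the cell value with update(current value)."""
        lock, cells = self._entry(row)
        with lock:  # no other writer can touch this row in between
            new_value = update(cells.get(column, default))
            cells[column] = new_value
            return new_value

    def read(self, row, column, default=None):
        lock, cells = self._entry(row)
        with lock:
            return cells.get(column, default)
```

A counter incremented concurrently through `read_modify_write` never loses an update, because the read and the write happen under the same row lock.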
Building Blocks and Implementation
Building Blocks
● Google File System (GFS) to store log and data files
● SSTable file format
● Chubby as a lock service
● Bigtable uses Chubby
    ○ to ensure at most one active master exists
    ○ to store the bootstrap location of Bigtable data
    ○ to discover tablet servers
    ○ to store Bigtable schema information (column family info for each table)
    ○ to store access control lists
Implementation
● Three major components:
    ○ a library that is linked into every client
    ○ one master server
    ○ many tablet servers
● Master is mainly responsible for assigning tablets to tablet servers
● Tablet servers can be added or removed dynamically
● Each tablet server typically stores 10-1000 tablets
● Tablet servers handle reads and writes, and split tablets that have grown too large
● Client data does not move through the master
Tablet Location
Tablet Assignment
● Master keeps track of live tablet servers, current tablet assignments, and unassigned tablets
● Master assigns unassigned tablets to tablet servers by sending a tablet load request
● Each tablet server is linked to a file in a Chubby directory (the servers directory)
● When a new master starts, it:
    ○ acquires the unique master lock in Chubby
    ○ scans the servers directory to find live tablet servers
    ○ gets the list of tablets from each live tablet server, to learn which tablets are already assigned
    ○ scans the METADATA table to learn the set of existing tablets → adds unassigned tablets to its list
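The master start-up steps can be sketched as a single function over plain Python data. Everything here is a stand-in chosen for illustration: `start_master` and its parameters are hypothetical, a `threading.Lock` plays the role of the Chubby master lock, a set of names plays the role of the servers directory, and a dictionary replaces the calls to each tablet server.

```python
import threading


def start_master(master_lock, servers_dir, tablets_by_server, metadata_tablets):
    """Return (assigned, unassigned) as a newly started master might compute them.

    master_lock       -- stand-in for the unique master lock in Chubby
    servers_dir       -- names of servers that currently have a file in the directory
    tablets_by_server -- what each server would report when asked for its tablets
    metadata_tablets  -- every tablet listed in the METADATA table
    """
    # Step 1: only one master may be active, so fail if the lock is already held.
    if not master_lock.acquire(blocking=False):
        raise RuntimeError("another master already holds the master lock")

    # Step 2: the live servers are the ones present in the servers directory.
    live_servers = sorted(servers_dir)

    # Step 3: ask each live server which tablets it is already serving.
    assigned = {}
    for server in live_servers:
        for tablet in tablets_by_server.get(server, ()):
            assigned[tablet] = server

    # Step 4: any tablet in METADATA that no live server reported is unassigned.
    unassigned = [t for t in metadata_tablets if t not in assigned]
    return assigned, unassigned
```

In this sketch a tablet reported only by a server that is missing from the servers directory ends up in the unassigned list, which is what lets the master hand it to a live server with a tablet load request.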
Tablet Serving
Consistency
● Bigtable has a strong consistency model, since operations on a single row are atomic and each tablet is served by only one tablet server at a time
Discussion
