Bigtable: A Distributed Storage System for Structured Data - PowerPoint PPT Presentation



SLIDE 1

Bigtable

A Distributed Storage System for Structured Data

Presenters: Yunming Zhang, Conglong Li

Saturday, September 21, 13

SLIDE 2

References

  • SOCC 2010 keynote slides, Jeff Dean (Google)
  • Introduction to Distributed Computing, Winter 2008, University of Washington

SLIDE 3

Motivation

Lots of (semi-)structured data at Google
  • URLs: contents, crawl metadata, links
  • Per-user data: user preference settings, search results
Scale is large: billions of URLs, hundreds of millions of users
Existing commercial databases don't meet the requirements

SLIDE 4

Goals

  • Store and manage all the state reliably and efficiently
  • Allow asynchronous processes to update different pieces of data continuously
  • Very high read/write rates
  • Efficient scans over all or interesting subsets of data
  • Often want to examine data changes over time

SLIDE 5

BigTable vs. GFS

GFS provides raw data storage. We need:
  • More sophisticated storage: a key-value mapping
  • Flexible enough to be useful: stores semi-structured data
  • Reliable, scalable, etc.

SLIDE 6

BigTable

Bigtable is a distributed storage system for managing large-scale structured data:
  • Wide applicability
  • Scalability
  • High performance
  • High availability

SLIDE 7

Overview

  • Data Model
  • API
  • Implementation Structures
  • Optimizations
  • Performance Evaluation
  • Applications
  • Conclusions

SLIDE 8

Data Model

  • Sparse
  • Sorted
  • Multidimensional

SLIDE 9

Cell

  • Contains multiple versions of the data
  • A datum is located by its row key, column key, and timestamp
  • Data is treated as uninterpreted arrays of bytes, letting clients serialize various forms of structured and semi-structured data
  • Supports automatic per-column-family garbage collection for managing versioned data
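The data model on this slide can be sketched as a tiny in-memory analogue: a sparse map from (row key, column key, timestamp) to an uninterpreted byte string, with multiple timestamped versions per cell. This is an illustration only; the class and method names are invented, not Bigtable's actual API.

```python
class Cell:
    """Holds multiple timestamped versions of one value, newest first."""
    def __init__(self):
        self.versions = []  # list of (timestamp, value) pairs

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        self.versions.sort(reverse=True)  # newest version first

    def latest(self):
        return self.versions[0][1] if self.versions else None

class Table:
    """Sparse, sorted map: row key -> column key -> Cell."""
    def __init__(self):
        self.rows = {}

    def set(self, row, column, timestamp, value):
        cell = self.rows.setdefault(row, {}).setdefault(column, Cell())
        cell.put(timestamp, value)

    def get(self, row, column):
        return self.rows[row][column].latest()

t = Table()
t.set("com.cnn.www", "contents:", 3, b"<html>v3</html>")
t.set("com.cnn.www", "contents:", 5, b"<html>v5</html>")
assert t.get("com.cnn.www", "contents:") == b"<html>v5</html>"
```

A real cell would also apply the per-column-family garbage-collection policy (e.g. keep only the last N versions) when versions accumulate.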

SLIDE 10

Goals

  • Store and manage all the state reliably and efficiently
  • Allow asynchronous processes to update different pieces of data continuously
  • Very high read/write rates
  • Efficient scans over all or interesting subsets of data
  • Often want to examine data changes over time

SLIDE 11

Row

  • Row key is an arbitrary string
  • Access to column data in a row is atomic
  • Row creation is implicit upon storing data
  • Rows are ordered lexicographically
  • Rows close together lexicographically usually reside on one or a small number of machines
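Because rows are sorted lexicographically, clients pick row keys that cluster related data. A hedged sketch of the reversed-hostname trick the Bigtable paper describes for its Webtable (the `row_key` helper here is illustrative):

```python
def row_key(url):
    """Reverse the hostname so pages from one domain sort adjacently."""
    host, _, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + "/" + path

keys = sorted(row_key(u) for u in [
    "maps.google.com/index.html",
    "www.cnn.com/world",
    "www.google.com/index.html",
])
# Both google.com hosts end up next to each other in sorted order,
# hence likely in the same tablet on the same machine:
assert keys == ["com.cnn.www/world",
                "com.google.maps/index.html",
                "com.google.www/index.html"]
```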

SLIDE 12

Columns

Columns are grouped into column families: family:optional_qualifier
  • A column family has associated type information
  • Data within a family is usually of the same type

SLIDE 13

Overview

  • Data Model
  • API
  • Implementation Structures
  • Optimizations
  • Performance Evaluation
  • Applications
  • Conclusions

SLIDE 14

API

  • Metadata operations: create/delete tables and column families, change metadata, modify access control lists
  • Writes (atomic): Set(), DeleteCells(), DeleteRow()
  • Reads: Scanner, which can read arbitrary cells in a Bigtable
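The atomic write path can be sketched by grouping all mutations to one row and applying them together, mirroring the RowMutation/Apply pattern from the Bigtable paper. The Python class names below are illustrative stand-ins, not a real client library:

```python
class RowMutation:
    """Collects set/delete operations against a single row."""
    def __init__(self, row):
        self.row = row
        self.ops = []

    def set(self, column, value):
        self.ops.append(("set", column, value))

    def delete_cells(self, column):
        self.ops.append(("delete", column, None))

class Table:
    def __init__(self):
        self.data = {}  # row key -> {column key -> value}

    def apply(self, mutation):
        # All ops for one row commit together: atomic per-row update.
        row = self.data.setdefault(mutation.row, {})
        for op, column, value in mutation.ops:
            if op == "set":
                row[column] = value
            else:
                row.pop(column, None)

t = Table()
m = RowMutation("com.cnn.www")
m.set("anchor:www.c-span.org", "CNN")
m.delete_cells("anchor:www.abc.com")
t.apply(m)
assert t.data["com.cnn.www"]["anchor:www.c-span.org"] == "CNN"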

SLIDE 15

Overview

  • Data Model
  • API
  • Implementation Structures
  • Optimizations
  • Performance Evaluation
  • Applications
  • Conclusions

SLIDE 16

Tablets

  • Large tables are broken into tablets at row boundaries
  • A tablet holds a contiguous range of rows
  • Clients can often choose row keys for locality
  • Aim for ~100-200 MB of data per tablet
  • Each serving machine is responsible for ~100 tablets
  • Fast recovery: 100 machines each pick up 1 tablet from a failed machine
  • Fine-grained load balancing: migrate tablets away from overloaded machines
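Splitting at a row boundary can be sketched as cutting the sorted row range near its middle once a tablet grows past the target size (~100-200 MB in the paper; tiny row counts stand in for bytes here, and the data structures are illustrative):

```python
def maybe_split(tablet, max_rows):
    """Split an oversized tablet into two halves at an existing row key."""
    rows = sorted(tablet)           # a tablet holds a contiguous, sorted row range
    if len(rows) <= max_rows:
        return [tablet]
    mid = rows[len(rows) // 2]      # the split point is a real row boundary
    return [{r: tablet[r] for r in rows if r < mid},
            {r: tablet[r] for r in rows if r >= mid}]

t = {"a": 1, "b": 2, "c": 3, "d": 4}
left, right = maybe_split(t, max_rows=3)
assert list(left) == ["a", "b"] and list(right) == ["c", "d"]
```

Each half remains a contiguous range, so the METADATA table only needs the new boundary recorded.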

SLIDE 17

Tablets and Splitting

SLIDE 18

System Structure

Master
  • Metadata operations
  • Load balancing
  • Keeps track of live tablet servers
  • Master failure
Tablet server
  • Accepts reads and writes to data

SLIDE 19

System Structure

SLIDE 20

System Structure

read/write

SLIDE 21

System Structure

Metadata operations

SLIDE 22

Locating Tablets

  • A 3-level hierarchical lookup scheme for tablets
  • A tablet's location is the IP address and port of its server, stored in METADATA tablets
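The 3-level scheme can be sketched as two range lookups: a Chubby file names the root tablet, the root tablet maps a row to the METADATA tablet covering it, and that METADATA tablet maps the row to its user tablet's server. Everything below (server names, row keys, the flat-list index) is an illustrative simplification:

```python
from bisect import bisect_left

def find(tablet_index, key):
    """tablet_index: sorted list of (last_row_in_tablet, server).
    Return the server for the first tablet whose last row is >= key."""
    ends = [end for end, _ in tablet_index]
    return tablet_index[bisect_left(ends, key)][1]

# Level 1 (root tablet, located via a Chubby file; never split):
# maps METADATA tablet ranges to the servers holding them.
root_tablet = [("row-m", "meta-A"), ("\uffff", "meta-B")]

# Level 2 (METADATA tablets): map user-tablet ranges to their servers.
metadata_tablets = {
    "meta-A": [("row-f", "ts-21"), ("row-m", "ts-22")],
    "meta-B": [("row-t", "ts-30"), ("\uffff", "ts-31")],
}

def locate(user_row):
    meta_server = find(root_tablet, user_row)             # hop 1: root tablet
    return find(metadata_tablets[meta_server], user_row)  # hop 2: METADATA tablet

assert locate("row-c") == "ts-21"
assert locate("row-x") == "ts-31"
```

Clients cache these locations, so most reads skip the hierarchy entirely and go straight to the tablet server.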

SLIDE 23

Tablet Representation and Serving

  • Append-only tablet log
  • SSTable on GFS: a sorted, immutable map from strings to strings; a row's data is stored contiguously, so it can be found efficiently
  • Memtable: an in-memory write buffer
  • When a read comes in, the server merges SSTable data with recent updates buffered in the memtable
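The merged read can be sketched by consulting the memtable first, then the SSTables from newest to oldest, so the most recent value for a key wins. Plain dicts stand in for SSTables here; this is an illustration, not the real on-disk format:

```python
def read(key, memtable, sstables):
    """Return the newest value for key, merging memtable and SSTables."""
    # Check the in-memory write buffer first, then SSTables newest-first.
    for source in [memtable] + sstables:
        if key in source:
            return source[key]
    return None

memtable = {"row1:contents:": b"new"}
sstables = [{"row1:contents:": b"old", "row2:contents:": b"x"}]  # newest first
assert read("row1:contents:", memtable, sstables) == b"new"
assert read("row2:contents:", memtable, sstables) == b"x"
```

A real scan merges all sources with a heap rather than probing them per key, but the newest-wins rule is the same.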

SLIDE 24

Tablet Representation and Serving

SLIDE 25

Tablet Representation and Serving

SLIDE 26

Compaction

Tablet state is represented as a set of immutable compacted SSTable files, plus a tail of the log
  • Minor compaction: when the in-memory buffer fills up, freeze it and write it out as a new SSTable
  • Major compaction: periodically compact all SSTables for a tablet into one new base SSTable on GFS; storage from deletions is reclaimed at this point
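A minor compaction can be sketched as freezing the full memtable, writing it out as a new sorted immutable SSTable, and starting a fresh memtable. (A major compaction would instead merge all SSTables into one and drop deletion markers.) The data structures below are stand-ins:

```python
def minor_compaction(memtable, sstables, threshold):
    """Flush the memtable to a new SSTable once it reaches threshold entries."""
    if len(memtable) < threshold:
        return memtable, sstables              # not full yet; nothing to do
    frozen = dict(sorted(memtable.items()))    # new immutable, sorted SSTable
    return {}, [frozen] + sstables             # fresh memtable; newest table first

mem = {"b": 2, "a": 1}
mem, tables = minor_compaction(mem, [], threshold=2)
assert mem == {} and list(tables[0]) == ["a", "b"]
```

This also shortens the commit-log tail that must be replayed on recovery, since the flushed updates are now persistent.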

SLIDE 27

Overview

  • Data Model
  • API
  • Implementation Structures
  • Optimizations
  • Performance Evaluation
  • Applications
  • Conclusions

SLIDE 28

Goals

  • A reliable system for storing and managing all the state
  • Allow asynchronous processes to update different pieces of data continuously
  • Very high read/write rates
  • Efficient scans over all or interesting subsets of data
  • Often want to examine data changes over time

SLIDE 29

Locality Groups

  • Clients can group multiple column families together into a locality group
  • A separate SSTable is generated for each locality group
  • Enables more efficient reads
  • A locality group can be declared in-memory

SLIDE 30

Compression

Many opportunities for compression
  • Similar values in columns and cells
  • Within each SSTable for a locality group, blocks are compressed independently
  • Blocks are kept small for random access
  • Exploits the fact that many values are very similar
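Per-block compression trades a little compression ratio for random access: a reader decompresses only the block containing the key, not the whole SSTable. A minimal sketch, with zlib standing in for Bigtable's actual compression schemes:

```python
import zlib

BLOCK = 64 * 1024  # small blocks so random reads decompress little data

def compress_blocks(data, block_size=BLOCK):
    """Compress data as independent fixed-size blocks."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def read_block(blocks, i):
    """Random access: only the requested block is decompressed."""
    return zlib.decompress(blocks[i])

data = b"similar row similar row " * 8000   # repetitive data compresses well
blocks = compress_blocks(data)
assert b"".join(read_block(blocks, i) for i in range(len(blocks))) == data
assert sum(len(b) for b in blocks) < len(data)
```

Similar values landing in the same block (thanks to sorted row keys and locality groups) is what makes the ratio good despite the small block size.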

SLIDE 31

Goals

  • A reliable system for storing and managing all the state
  • Allow asynchronous processes to update different pieces of data continuously
  • Very high read/write rates
  • Efficient scans over all or interesting subsets of data
  • Often want to examine data changes over time

SLIDE 32

Commit log and recovery

  • A single commit log file per tablet server reduces the number of concurrent file writes to GFS
  • Tablet recovery: starting from the redo point in the log, re-perform the operations since the last persistent state
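Recovery can be sketched as replaying the shared log from the tablet's redo point, keeping only entries for that tablet (since the single per-server log interleaves many tablets' writes). The log-entry format here is illustrative:

```python
def recover(log, tablet, redo_point):
    """Rebuild a tablet's memtable by replaying log entries after redo_point."""
    state = {}
    for seq, entry_tablet, key, value in log:
        # Entries before the redo point are already in persisted SSTables;
        # entries for other tablets belong to other recoveries.
        if seq < redo_point or entry_tablet != tablet:
            continue
        state[key] = value
    return state

log = [(1, "T1", "a", 1),   # before redo point: already flushed
       (2, "T2", "x", 9),   # different tablet: skipped
       (3, "T1", "b", 2),
       (4, "T1", "a", 3)]
assert recover(log, "T1", redo_point=3) == {"b": 2, "a": 3}
```

The redo point is exactly what a minor compaction advances: everything before it is safe in SSTables and need not be replayed.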

SLIDE 33

Overview

  • Data Model
  • API
  • Implementation Structures
  • Optimizations
  • Performance Evaluation
  • Applications
  • Conclusions

SLIDE 34

Performance Evaluation

Test environment
  • Based on a GFS cell of 1,786 machines, with 400 GB IDE hard drives in each machine
  • Two-level tree-shaped switched network
Performance tests
  • Random reads/writes
  • Sequential reads/writes

SLIDE 35

Single tablet-server performance

  • Random reads are the slowest: a 64 KB SSTable block is transferred over GFS to read a single 1000-byte value
  • Random and sequential writes perform better: writes are appended to a single commit log per server, with group commit

SLIDE 36

Performance Scaling

Performance didn't scale linearly
  • Load imbalance in multi-server configurations
  • Larger data-transfer overhead

SLIDE 37

Overview

  • Data Model
  • API
  • Implementation Structures
  • Optimizations
  • Performance Evaluation
  • Applications
  • Conclusions

SLIDE 38

Google Analytics

A service that analyzes traffic patterns at web sites
  • Raw click table: a row for each end-user session; row key is (website name, time)
  • Summary table: extracts recent session data using MapReduce jobs

SLIDE 39

Google Earth

  • One table for preprocessing and one for serving, with different latency requirements (disk vs. memory)
  • Each row in the imagery table represents a single geographic segment
  • A column family stores the data sources: one column per raw image; very sparse

SLIDE 40

Personalized Search

  • Row key is a unique userid
  • A column family for each type of user action
  • Replicated across Bigtable clusters to increase availability and reduce latency

SLIDE 41

Conclusions

Bigtable provides highly scalable, high-performance, highly available, and flexible storage for structured data. It provides a low-level read/write interface for other frameworks to build on top of it. It has enabled Google to handle large-scale data efficiently.
