CockroachDB’s Survivability Model: Scalable, Survivable, Consistent, SQL



SLIDE 1

@cockroachdb

Scalable, Survivable, Consistent, SQL

CockroachDB’s Survivability Model

presented by Marc Berhault / Engineer

SLIDE 2

CockroachDB: Make Data Easy

■ Scalable
■ Survivable
■ Strongly Consistent
■ SQL

And...

■ Open Source

SLIDE 3

Agenda

Architecture:
■ SQL layer
■ Transactions
■ Sharding
■ Replication

Survivability:
■ Rebalancing
■ Repairs

SLIDE 4

Architecture

SLIDE 5

Architecture (high-level)

Abstraction stack: SQL, Transactional KV, Distribution, Replication, Storage

In the network: [Diagram: Node 1 and Node 2, each with a SQL layer on top of stores that hold several ranges]

SLIDE 6

SQL

CREATE TABLE inventory (
    id INTEGER PRIMARY KEY,
    name VARCHAR,
    quantity INTEGER,
    INDEX name_index (name)
);

INSERT INTO inventory VALUES (1, 'Apple', 3);

SLIDE 7

SQL: Data model

■ Tables
■ Rows
■ Columns
■ Indexes

inventory

id   name     quantity
1    Apple    3
2    Orange   12
3    Cherry   5
4    Banana   7

name_index

name     id
Apple    1
Banana   4
Cherry   3
Orange   2

SLIDE 8

SQL: Key anatomy

INSERT INTO inventory VALUES (1, 'Apple', 3);

inventory

Key: /<table>/<index>/<key>/<column>    Value
/inventory/primary/1/name               Apple
/inventory/primary/1/quantity           3

name_index

Key: /<table>/<index>/<key>             Value
/inventory/name_index/Apple             1
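The key layout on this slide can be sketched in a few lines of Python. This is only an illustration of the slide's human-readable paths: the function name `encode_row` and its parameters are hypothetical, and a real system would use a compact binary encoding rather than strings.

```python
def encode_row(table, pk, columns, indexes=()):
    """Flatten one SQL row into key/value pairs using the slide's layout.

    `indexes` is a sequence of (index_name, column_name) pairs.
    All names here are illustrative assumptions, not a real API.
    """
    kvs = {}
    # Primary index: one key per column, /<table>/primary/<key>/<column>
    for column, value in columns.items():
        kvs[f"/{table}/primary/{pk}/{column}"] = value
    # Secondary index: /<table>/<index>/<indexed value> -> primary key
    for index_name, column in indexes:
        kvs[f"/{table}/{index_name}/{columns[column]}"] = pk
    return kvs


row = encode_row(
    "inventory",
    1,
    {"name": "Apple", "quantity": 3},
    indexes=[("name_index", "name")],
)
for key, value in sorted(row.items()):
    print(key, "->", value)
```

Because every row becomes a set of sorted keys, the lower layers only need to understand ordered key/value pairs.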

SLIDE 9

Transactional KV: consistency

■ Update all keys atomically
■ Track across multiple commands
■ Retry when necessary

SLIDE 10

Optimistic Concurrency

■ CockroachDB uses optimistic concurrency control for lock-free transactions
■ In case of conflict: the losing transaction restarts
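A minimal sketch of the general idea, assuming a simple per-key version check at commit time. This is a generic form of optimistic concurrency control, not CockroachDB's actual protocol; `Store`, `Conflict` and `run_txn` are hypothetical names introduced for illustration.

```python
class Conflict(Exception):
    """Raised when a key read by the transaction changed before commit."""


class Store:
    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        return self.data.get(key, (None, 0))

    def commit(self, reads, writes):
        # Validate: every key we read must still be at the version we saw.
        for key, seen_version in reads.items():
            if self.read(key)[1] != seen_version:
                raise Conflict(key)
        # Apply all writes together, bumping each key's version.
        for key, value in writes.items():
            self.data[key] = (value, self.read(key)[1] + 1)


def run_txn(store, body, max_attempts=5):
    """Run body(get, put) without locks; restart it if validation fails."""
    for attempt in range(1, max_attempts + 1):
        reads, writes = {}, {}

        def get(key):
            if key in writes:
                return writes[key]
            value, version = store.read(key)
            reads[key] = version
            return value

        def put(key, value):
            writes[key] = value

        body(get, put)
        try:
            store.commit(reads, writes)
            return attempt  # number of attempts it took
        except Conflict:
            continue  # the losing transaction restarts
    raise Conflict(f"gave up after {max_attempts} attempts")
```

No locks are held while the transaction body runs; the cost of a conflict is paid only by the transaction that loses and has to run again.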

SLIDE 11

Distribution: scalability

■ Route KV commands to the appropriate shards
■ Split batches if necessary
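As a rough sketch of routing and batch splitting, assume the shard boundaries are kept as a sorted list of split keys. The boundaries `lem` and `pea` are borrowed from the sharding slides; `SPLIT_KEYS`, `SHARD_NAMES` and `split_batch` are hypothetical names.

```python
import bisect
from collections import defaultdict

# Assumed shard layout: Ø-lem, lem-pea, pea-∞ (start key inclusive).
SPLIT_KEYS = ["lem", "pea"]
SHARD_NAMES = ["Ø-lem", "lem-pea", "pea-∞"]


def shard_for(key):
    """Return the name of the shard whose span contains `key`."""
    return SHARD_NAMES[bisect.bisect_right(SPLIT_KEYS, key)]


def split_batch(keys):
    """Group a batch of keys into one sub-batch per shard."""
    sub_batches = defaultdict(list)
    for key in keys:
        sub_batches[shard_for(key)].append(key)
    return dict(sub_batches)


print(split_batch(["apricot", "mango", "pear", "lime"]))
```

Each sub-batch can then be sent to whichever node holds that shard.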

SLIDE 12

Sharding: Index

Each shard holds a contiguous span of the keyspace

Ø-lem: apricot, banana, blueberry, cherry, grape
lem-pea: lemon, lime, mango, melon, orange
pea-∞: peach, pear, pineapple, raspberry, strawberry

SLIDE 13

Sharding: Index

An index maps from key to range ID

Ø-lem: apricot, banana, blueberry, cherry, grape
lem-pea: lemon, lime, mango, melon, orange
pea-∞: peach, pear, pineapple, raspberry, strawberry

shard index: Ø-lem, lem-pea, pea-∞

SLIDE 14

Sharding: Split

Split when a shard is too large

Ø-lem: apricot, banana, blueberry, cherry, grape
lem-pea: lemon, lime, mango, melon, orange
pea-str: peach, pear, pineapple, raspberry
str-∞: strawberry, tamarillo, tamarind

shard index: Ø-lem, lem-pea, pea-str, str-∞ (pea-∞ is replaced by pea-str and str-∞)
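A toy version of a split, assuming shard size is measured as a number of keys and the split point is the median key. Both choices are simplifications made for this sketch, and `ShardIndex` is a hypothetical class.

```python
import bisect


class ShardIndex:
    """Sorted shard start keys plus the keys each shard currently holds."""

    def __init__(self, max_keys=5):
        self.starts = [""]  # "" stands for the start of the keyspace (Ø)
        self.shards = [[]]  # shards[i] holds the sorted keys of shard i
        self.max_keys = max_keys

    def lookup(self, key):
        """Return the position of the shard whose span contains `key`."""
        return bisect.bisect_right(self.starts, key) - 1

    def insert(self, key):
        i = self.lookup(key)
        bisect.insort(self.shards[i], key)
        if len(self.shards[i]) > self.max_keys:
            self._split(i)

    def _split(self, i):
        keys = self.shards[i]
        mid = len(keys) // 2
        # The upper half becomes a new shard starting at the median key.
        self.shards[i:i + 1] = [keys[:mid], keys[mid:]]
        self.starts.insert(i + 1, keys[mid])


index = ShardIndex(max_keys=5)
for fruit in ["peach", "pear", "pineapple", "raspberry", "strawberry", "tamarillo"]:
    index.insert(fruit)
print(index.starts)
print(index.shards)
```

Only the index entry for the shard being split changes; lookups for keys in other shards are unaffected.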

SLIDE 15

Replication: survivability

■ Each range is replicated to three or more nodes
■ One replica of each range is the leader

SLIDE 16

Replication

[Diagram: replicas of Range 1, Range 2 and Range 3 spread across Node 1 to Node 4]

■ Each set of replicas is a Raft group
■ Consistency provided by quorum
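The quorum arithmetic behind this can be written down directly. A majority quorum of n replicas needs n // 2 + 1 of them, so a group tolerates (n - 1) // 2 failures. The function names and node labels below are illustrative only.

```python
def quorum(replicas):
    """Smallest majority of a replica group."""
    return replicas // 2 + 1


def tolerable_failures(replicas):
    """How many replicas can be lost while a majority remains."""
    return (replicas - 1) // 2


def has_quorum(replica_nodes, live_nodes):
    """True if enough of a range's replicas sit on live nodes."""
    alive = len(set(replica_nodes) & set(live_nodes))
    return alive >= quorum(len(replica_nodes))


for n in (3, 5):
    print(n, "replicas: quorum", quorum(n), "tolerates", tolerable_failures(n))
```

With the default of three replicas, a range stays available through the loss of any single node.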

SLIDE 17

Replication: Node storage

■ Data is stored locally in RocksDB
■ Embedded KV database
■ Provides atomic writes to multiple keys
■ Supports ordered scans
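To make the two properties concrete, here is a toy in-memory stand-in for an ordered key/value store. It does not use or imitate the RocksDB API; `OrderedKV`, `write_batch` and `scan` are hypothetical names, and atomicity is only modeled by swapping in the new state in one step.

```python
import bisect


class OrderedKV:
    """Toy ordered KV store: batched writes and range scans."""

    def __init__(self):
        self._values = {}
        self._keys = []  # sorted list of keys, rebuilt on each batch

    def write_batch(self, puts):
        """Apply every put in `puts` together, or none of them."""
        staged = dict(self._values)
        staged.update(puts)
        # Swap in the fully built state in one step.
        self._values = staged
        self._keys = sorted(staged)

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


kv = OrderedKV()
kv.write_batch({
    "/inventory/primary/1/name": "Apple",
    "/inventory/primary/1/quantity": 3,
    "/inventory/name_index/Apple": 1,
})
print(kv.scan("/inventory/primary/", "/inventory/primary0"))
```

The scan bounds rely on "0" sorting immediately after "/" in ASCII, so the range covers every key under /inventory/primary/.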

SLIDE 18

Reliability

SLIDE 19

Reliability

■ Symmetric nodes
■ Auto-balancing
■ Self-healing

SLIDE 20

Reliability: Rebalancing

[Diagram: replicas of Range 1, Range 2 and Range 3 spread across Node 1, Node 2 and Node 3]

SLIDE 21

Reliability: Rebalancing

Adding a new (empty) node

[Diagram: Node 1 to Node 4 with replicas of Range 1, Range 2 and Range 3; Node 4 is the new node]

SLIDE 22

Reliability: Rebalancing

A new replica is allocated, data is copied.

[Diagram: Node 1 to Node 4; an additional replica of Range 3 is being copied]

SLIDE 23

Reliability: Rebalancing

The new replica is made live, replacing another.

[Diagram: Node 1 to Node 4; the new Range 3 replica is live and an older Range 3 replica is inactive]

SLIDE 24

Reliability: Rebalancing

The old (inactive) replica is deleted.

[Diagram: Node 1 to Node 4; the inactive Range 3 replica has been removed]

SLIDE 25

Reliability: Rebalancing

Process continues until nodes are balanced.

[Diagram: Node 1 to Node 4, each holding a similar number of Range 1, Range 2 and Range 3 replicas]
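The rebalancing steps on these slides can be simulated with a small loop. This is a simplified sketch that balances only on replica count; the starting layout, the node labels and the `rebalance` function are assumptions, and a real allocator would weigh more than counts.

```python
def rebalance(nodes):
    """Move replicas one at a time until node replica counts differ by at most 1.

    `nodes` maps a node name to the set of range IDs it holds.
    Returns the list of moves as (range_id, from_node, to_node).
    """
    moves = []
    while True:
        fullest = max(nodes, key=lambda n: len(nodes[n]))
        emptiest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[fullest]) - len(nodes[emptiest]) <= 1:
            return moves
        # Pick a range the target does not already hold.
        range_id = sorted(nodes[fullest] - nodes[emptiest])[0]
        nodes[emptiest].add(range_id)    # allocate the new replica, copy data
        nodes[fullest].remove(range_id)  # then delete the old replica
        moves.append((range_id, fullest, emptiest))


cluster = {"n1": {1, 2, 3}, "n2": {1, 2, 3}, "n3": {1, 2, 3}, "n4": set()}
for move in rebalance(cluster):
    print("moved range %d from %s to %s" % move)
```

The new replica is added before the old one is removed, so a range never drops below its replica count during a move.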

SLIDE 26

Reliability: Recovery

[Diagram: Node 1 to Node 4 with balanced replicas of Range 1, Range 2 and Range 3]

SLIDE 27

Reliability: Recovery

Losing a node causes recovery of its replicas.

[Diagram: Node 1 to Node 4; Node 2 is marked as lost (X)]

SLIDE 28

Reliability: Recovery

A new replica gets created on an existing node.

[Diagram: Node 2 is lost (X); new replicas of Range 1 and Range 3 are created on the surviving nodes]

SLIDE 29

Reliability: Recovery

Once at full replication, the old replicas are forgotten.

[Diagram: Node 1, Node 3 and Node 4 hold all replicas of Range 1, Range 2 and Range 3; Node 2 is gone]
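A rough sketch of the recovery sequence, assuming each lost replica is recreated on the least-loaded surviving node that does not already hold that range. The layout, the node labels and the `recover` function are hypothetical.

```python
def recover(nodes, dead):
    """Re-create the replicas held by a lost node on surviving nodes.

    `nodes` maps a node name to the set of range IDs it holds.
    Returns the list of (range_id, target_node) replicas created.
    """
    lost_ranges = nodes.pop(dead)  # the dead node's replicas are forgotten
    created = []
    for range_id in sorted(lost_ranges):
        # A node may hold at most one replica of a given range.
        targets = [n for n in nodes if range_id not in nodes[n]]
        if not targets:
            continue  # not enough nodes left to restore full replication
        target = min(targets, key=lambda n: len(nodes[n]))
        nodes[target].add(range_id)
        created.append((range_id, target))
    return created


cluster = {"n1": {2, 3}, "n2": {1, 3}, "n3": {1, 2, 3}, "n4": {1, 2}}
print(recover(cluster, "n2"))
print(cluster)
```

In this layout every range keeps two of its three replicas when one node fails, so the surviving majority can serve as the source for the new copy.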

SLIDE 30

Zone configuration

■ Replication factor (default 3)
■ Geographical location (e.g. 2 in Europe, 1 in US)
■ Machine attributes (SSD vs. disk)
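One way to picture how these three settings constrain replica placement is the sketch below. The `ZoneConfig` fields and the `place_replicas` function are hypothetical and do not reflect CockroachDB's actual zone configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneConfig:
    replication_factor: int = 3
    per_region: dict = field(default_factory=dict)  # e.g. {"europe": 2, "us": 1}
    required_attrs: frozenset = frozenset()         # e.g. frozenset({"ssd"})


def place_replicas(zone, nodes):
    """Pick nodes for one range. `nodes` is a list of (name, region, attrs)."""
    eligible = [n for n in nodes if zone.required_attrs <= n[2]]
    chosen = []
    # First satisfy the per-region replica counts.
    for region, count in zone.per_region.items():
        picks = [name for name, reg, _ in eligible if reg == region][:count]
        if len(picks) < count:
            raise ValueError(f"not enough eligible nodes in {region}")
        chosen.extend(picks)
    # Then fill up to the replication factor from any eligible node.
    for name, _, _ in eligible:
        if len(chosen) >= zone.replication_factor:
            break
        if name not in chosen:
            chosen.append(name)
    if len(chosen) < zone.replication_factor:
        raise ValueError("not enough eligible nodes")
    return chosen[:zone.replication_factor]


nodes = [
    ("n1", "europe", {"ssd"}),
    ("n2", "europe", {"ssd"}),
    ("n3", "us", {"ssd"}),
    ("n4", "us", {"hdd"}),
]
zone = ZoneConfig(3, {"europe": 2, "us": 1}, frozenset({"ssd"}))
print(place_replicas(zone, nodes))
```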

SLIDE 31

Status: BETA

SLIDE 32

Status: Beta

Ready for development testing

Roadmap:
■ Stability
■ Performance
■ Distributed SQL
■ Optimized JOINs

SLIDE 33

Thank You

github.com/cockroachdb/cockroach
CockroachLabs.com
Gitter: cockroachdb
@cockroachdb