Using Rust to Build a Distributed Transactional Key-Value Database



slide-1
SLIDE 1

Using Rust to Build a Distributed Transactional Key-Value Database

LiuTang | tl@pingcap.com

slide-2
SLIDE 2

About me

  • Chief Architect at PingCAP
  • TiDB and TiKV
  • Open source projects
      ○ LedisDB
      ○ go-mysql
      ○ go-mysql-elasticsearch
      ○ rust-prometheus
      ○ ...

slide-3
SLIDE 3

Agenda

  • Introduction
  • Hierarchy
      ○ Storage
      ○ Raft
      ○ Transaction
      ○ RPC Framework
      ○ Monitor
      ○ Test

  • Combine them all
slide-4
SLIDE 4

When we want to build a distributed transactional key-value database...

slide-5
SLIDE 5

  • Performance
  • ACID
  • Scalability
  • Stability
  • Consistency
  • HA
  • Others…

slide-6
SLIDE 6

A High Building, A Low Foundation

slide-7
SLIDE 7

Language

slide-8
SLIDE 8
slide-9
SLIDE 9

Let’s start from scratch!!!

slide-10
SLIDE 10
slide-11
SLIDE 11

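RocksDB

[Diagram: RocksDB's LSM-tree layout. In memory: the Memory Table and the Immutable Memory Table. On disk: the WAL, SST files organized into Level 0, Level 1, and Level 2, plus the Info Log, Manifest, and Current files. Flush writes an immutable memory table out as an SST file; Compaction merges SST files into lower levels.]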
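
To make the flush and compaction flow in the diagram concrete, here is a minimal in-memory sketch of an LSM tree. It is only an illustration: `ToyLsm`, `memtable_limit`, and every method name are hypothetical, nothing is written to disk, and it is not the rust-rocksdb API.

```rust
use std::collections::BTreeMap;

/// Toy LSM tree: one mutable memtable plus immutable sorted runs ("SSTs").
/// All names here are illustrative assumptions, not RocksDB types.
struct ToyLsm {
    memtable: BTreeMap<String, String>,
    sstables: Vec<Vec<(String, String)>>, // oldest first, each sorted by key
    memtable_limit: usize,
}

impl ToyLsm {
    fn new(memtable_limit: usize) -> Self {
        ToyLsm { memtable: BTreeMap::new(), sstables: Vec::new(), memtable_limit }
    }

    fn put(&mut self, key: &str, value: &str) {
        // A real engine appends to the WAL before touching the memtable.
        self.memtable.insert(key.to_string(), value.to_string());
        if self.memtable.len() >= self.memtable_limit {
            self.flush();
        }
    }

    /// Flush: freeze the memtable and emit it as a new sorted run.
    fn flush(&mut self) {
        let frozen = std::mem::take(&mut self.memtable);
        if !frozen.is_empty() {
            self.sstables.push(frozen.into_iter().collect());
        }
    }

    /// Read path: memtable first, then runs from newest to oldest.
    fn get(&self, key: &str) -> Option<&str> {
        if let Some(v) = self.memtable.get(key) {
            return Some(v.as_str());
        }
        for run in self.sstables.iter().rev() {
            if let Ok(i) = run.binary_search_by(|(k, _)| k.as_str().cmp(key)) {
                return Some(run[i].1.as_str());
            }
        }
        None
    }

    /// Compaction: merge all runs into one, keeping the newest value per key.
    fn compact(&mut self) {
        let mut merged = BTreeMap::new();
        for run in self.sstables.drain(..) {
            merged.extend(run); // later (newer) runs overwrite older entries
        }
        if !merged.is_empty() {
            self.sstables.push(merged.into_iter().collect());
        }
    }
}

fn main() {
    let mut db = ToyLsm::new(2);
    db.put("a", "1");
    db.put("b", "2"); // memtable reaches the limit and is flushed
    db.put("a", "3"); // the newer value lives in the memtable
    assert_eq!(db.get("a"), Some("3"));
    db.flush();
    db.compact();
    assert_eq!(db.sstables.len(), 1);
    assert_eq!(db.get("a"), Some("3"));
    assert_eq!(db.get("b"), Some("2"));
}
```

Reads check the newest data first, which is why a stale value in an older run is harmless until compaction discards it.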

slide-12
SLIDE 12

https://github.com/pingcap/rust-rocksdb

slide-13
SLIDE 13
slide-14
SLIDE 14

Raft

[Diagram: a client sends commands to a Raft group of three replicas. Each replica has a Raft module, a log holding the entries a = 1 and b = 2, and a state machine holding the same values.]
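
The relationship between the log and the state machine can be sketched as follows: an entry is safe to apply once it is stored on a majority of replicas. This is a simplified illustration, not the raft-rs API; `majority_index`, `Replica`, and `apply_up_to` are hypothetical names, indexes are 1-based, and the sketch ignores terms, elections, and Raft's rule that a leader only commits entries from its own term by counting replicas.

```rust
use std::collections::BTreeMap;

/// Highest log index that is stored on a majority of the group.
/// `match_index` holds, per replica, the last index known to be replicated
/// there; it is assumed to be non-empty.
fn majority_index(match_index: &[u64]) -> u64 {
    let mut sorted = match_index.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // descending
    let quorum = sorted.len() / 2 + 1;
    sorted[quorum - 1]
}

/// A replica applies committed entries to its state machine in log order.
struct Replica {
    log: Vec<(String, i64)>, // entries such as ("a", 1)
    state: BTreeMap<String, i64>,
    applied: usize, // number of entries already applied
}

impl Replica {
    fn apply_up_to(&mut self, commit: usize) {
        let end = commit.min(self.log.len());
        while self.applied < end {
            let (key, value) = self.log[self.applied].clone();
            self.state.insert(key, value);
            self.applied += 1;
        }
    }
}

fn main() {
    let log = vec![("a".to_string(), 1), ("b".to_string(), 2)];
    let mut leader = Replica { log, state: BTreeMap::new(), applied: 0 };
    // The leader and one follower hold both entries; the other follower holds one.
    let commit = majority_index(&[2, 2, 1]);
    leader.apply_up_to(commit as usize);
    assert_eq!(leader.state.get("a"), Some(&1));
    assert_eq!(leader.state.get("b"), Some(&2));
}
```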

slide-15
SLIDE 15

Multi-Raft

[Diagram: the key space is split into the ranges A - B, B - C, and C - D. Each range is a Region (Region 1, Region 2, Region 3), and the three replicas of each Region form one Raft group.]
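
Routing a key to its Region amounts to a range lookup over the key space. Below is a minimal sketch, assuming half-open ranges `[start, end)` kept sorted by start key; the `Region` struct and `locate` function are hypothetical and are not TiKV types.

```rust
/// A Region owns the half-open key range [start, end).
#[derive(Debug, PartialEq)]
struct Region {
    id: u64,
    start: &'static str,
    end: &'static str,
}

/// Find the Region whose range contains `key`.
/// `regions` must be sorted by `start` and must not overlap.
fn locate<'a>(regions: &'a [Region], key: &str) -> Option<&'a Region> {
    // Number of regions whose start key is <= key.
    let idx = regions.partition_point(|r| r.start <= key);
    let candidate = regions.get(idx.checked_sub(1)?)?;
    if key < candidate.end {
        Some(candidate)
    } else {
        None
    }
}

fn main() {
    let regions = [
        Region { id: 1, start: "a", end: "b" },
        Region { id: 2, start: "b", end: "c" },
        Region { id: 3, start: "c", end: "d" },
    ];
    assert_eq!(locate(&regions, "apple").map(|r| r.id), Some(1));
    assert_eq!(locate(&regions, "b").map(|r| r.id), Some(2));
    assert_eq!(locate(&regions, "cat").map(|r| r.id), Some(3));
    assert_eq!(locate(&regions, "d"), None); // past the last range
}
```

Because each Region is its own Raft group, a request for a key only needs to reach the replicas of the one Region that `locate` returns.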

slide-16
SLIDE 16

Multi-Raft - Scalability

[Diagram: nodes A, B, C, and D. Region 1 and Region 2 each have three replicas; node D holds none yet.]

slide-17
SLIDE 17

Multi-Raft - Scalability

[Diagram: Raft ConfChange - AddNode. A replica of Region 2 is added on node D, so Region 2 temporarily has four replicas.]

slide-18
SLIDE 18

Multi-Raft - Scalability

[Diagram: Raft ConfChange - RemoveNode. One of the original Region 2 replicas is removed, leaving Region 2 with three replicas again, one of them on node D.]
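
The two configuration changes on these slides combine into "move one replica to a new node": add the new node first, then remove an old one. A minimal sketch of that membership bookkeeping follows. `ConfChange` and `RegionPeers` are hypothetical names, the choice of node C as the one removed is an assumption, and a real Raft implementation proposes each change through the log and applies it only after it commits.

```rust
use std::collections::BTreeSet;

/// The two membership changes named on the slides.
enum ConfChange {
    AddNode(&'static str),
    RemoveNode(&'static str),
}

/// The set of nodes holding a replica of one Region.
struct RegionPeers {
    nodes: BTreeSet<&'static str>,
}

impl RegionPeers {
    fn apply(&mut self, change: ConfChange) {
        match change {
            ConfChange::AddNode(node) => {
                self.nodes.insert(node);
            }
            ConfChange::RemoveNode(node) => {
                self.nodes.remove(node);
            }
        }
    }
}

fn main() {
    // Region 2 starts with replicas on nodes A, B, and C.
    let mut region2 = RegionPeers { nodes: ["A", "B", "C"].into_iter().collect() };

    // Step 1: add a replica on the new node D (four replicas for a while).
    region2.apply(ConfChange::AddNode("D"));
    assert_eq!(region2.nodes.len(), 4);

    // Step 2: remove the replica on C (back to three replicas).
    region2.apply(ConfChange::RemoveNode("C"));
    let expected: BTreeSet<&str> = ["A", "B", "D"].into_iter().collect();
    assert_eq!(region2.nodes, expected);
}
```

Adding before removing keeps the Region at three or more replicas throughout the move.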

slide-19
SLIDE 19

https://github.com/pingcap/raft-rs

slide-20
SLIDE 20

Transaction

slide-21
SLIDE 21

Transaction

    let mut txn = store.begin();
    let value1 = txn.get(region1_key);
    let value2 = txn.get(region2_key);
    // do something with the values
    txn.set(region1_key, new_value1);
    txn.set(region2_key, new_value2);
    txn.commit(); // or txn.rollback()

How do we keep data consistent across multiple Raft groups?

slide-22
SLIDE 22

Transaction

  1. Inspired by Google Percolator
  2. Optimized Two Phase Commit (2PC)
  3. Multiversion Concurrency Control (MVCC)
  4. Snapshot Isolation
  5. Optimistic Transaction
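
The points above can be illustrated with a heavily simplified, single-process sketch of a Percolator-style store: values are staged under the transaction's start timestamp, a lock marks the key during the first phase, and a write record published at the commit timestamp makes the value visible to later snapshots. Everything here is an assumption made for illustration (`Store`, `prewrite`, `commit`, and integer timestamps); it leaves out primary locks, lock resolution, read-side lock checks, and rollback, and it is not TiKV's implementation.

```rust
use std::collections::{BTreeMap, HashMap};

type Key = &'static str;
type Ts = u64;

#[derive(Default)]
struct Store {
    data: HashMap<(Key, Ts), String>, // (key, start_ts) -> staged value
    locks: HashMap<Key, Ts>,          // key -> start_ts of the locking txn
    writes: BTreeMap<(Key, Ts), Ts>,  // (key, commit_ts) -> start_ts
}

impl Store {
    /// Snapshot read (MVCC): newest version committed at or before `ts`.
    fn get(&self, key: Key, ts: Ts) -> Option<&String> {
        let (_, start_ts) = self.writes.range((key, 0)..=(key, ts)).next_back()?;
        self.data.get(&(key, *start_ts))
    }

    /// Phase 1: lock the key and stage the value; fail on any conflict.
    fn prewrite(&mut self, key: Key, value: &str, start_ts: Ts) -> Result<(), String> {
        if self.locks.contains_key(key) {
            return Err(format!("{} is locked", key));
        }
        // Optimistic check: someone committed after this txn took its snapshot.
        let newer_commit = self
            .writes
            .range((key, start_ts)..=(key, Ts::MAX))
            .next()
            .is_some();
        if newer_commit {
            return Err(format!("write conflict on {}", key));
        }
        self.locks.insert(key, start_ts);
        self.data.insert((key, start_ts), value.to_string());
        Ok(())
    }

    /// Phase 2: publish the staged value at `commit_ts` and release the lock.
    fn commit(&mut self, key: Key, start_ts: Ts, commit_ts: Ts) {
        if self.locks.get(key) == Some(&start_ts) {
            self.locks.remove(key);
            self.writes.insert((key, commit_ts), start_ts);
        }
    }
}

fn main() {
    let mut store = Store::default();

    // Txn 1: start_ts = 1, commit_ts = 2.
    store.prewrite("a", "1", 1).unwrap();
    store.commit("a", 1, 2);

    // Txn 2 (start_ts = 3) locks "a"; txn 3 (start_ts = 4) cannot prewrite it.
    store.prewrite("a", "2", 3).unwrap();
    assert!(store.prewrite("a", "3", 4).is_err());
    store.commit("a", 3, 5);

    // A snapshot at ts 4 still sees the old value; ts 5 sees the new one.
    assert_eq!(store.get("a", 4).map(String::as_str), Some("1"));
    assert_eq!(store.get("a", 5).map(String::as_str), Some("2"));

    // Txn 3 retries and now fails on the newer commit instead of the lock.
    assert!(store.prewrite("a", "3", 4).is_err());
}
```

The conflict is detected only at prewrite time, which is what makes the scheme optimistic: transactions read and buffer writes without taking locks, and pay for conflicts with a retry.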

slide-23
SLIDE 23
slide-24
SLIDE 24

gRPC

  • Mode
      ○ Unary
      ○ Client streaming
      ○ Server streaming
      ○ Duplex streaming

  • Using Futures to wrap the asynchronous C gRPC API

    let f = unary(service, method, request);
    let resp = f.wait();
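
The idea of wrapping a callback-driven asynchronous API in a handle with `wait()` can be sketched with the standard library alone. This is not how grpc-rs is implemented and none of these names come from it: `async_call` is a hypothetical stand-in for the asynchronous C API, and `CallFuture` is a simplified blocking handle, not a real `Future`.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Future-like handle: the result arrives later through a channel.
struct CallFuture<T> {
    rx: Receiver<T>,
}

impl<T> CallFuture<T> {
    /// Block the caller until the asynchronous call completes.
    fn wait(self) -> T {
        self.rx.recv().expect("call dropped without completing")
    }
}

/// Hypothetical stand-in for an asynchronous C API: it does the work on
/// another thread and invokes `on_done` when the response is ready.
fn async_call<F>(request: String, on_done: F)
where
    F: FnOnce(String) + Send + 'static,
{
    thread::spawn(move || on_done(format!("echo: {}", request)));
}

/// Start a unary call and return a handle to its eventual response.
fn unary(request: &str) -> CallFuture<String> {
    let (tx, rx) = channel();
    async_call(request.to_string(), move |resp| {
        // The receiver may already be gone if the caller lost interest.
        let _ = tx.send(resp);
    });
    CallFuture { rx }
}

fn main() {
    let f = unary("ping");
    let resp = f.wait();
    assert_eq!(resp, "echo: ping");
}
```

The caller gets a value back immediately and decides when to block, which is the property the slide's `unary(...)` followed by `f.wait()` relies on.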

slide-25
SLIDE 25

https://github.com/pingcap/grpc-rs

slide-26
SLIDE 26
slide-27
SLIDE 27

Prometheus

  • Type
      ○ Counter
      ○ Gauge
      ○ Histogram

    lazy_static! {
        static ref HTTP_COUNTER: Counter = register_counter!(
            "http_request_total",
            "Total number of HTTP request."
        ).unwrap();
    }

    HTTP_COUNTER.inc();
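
The three metric types differ in what their values are allowed to do. The sketch below models those semantics with the standard library only; the structs are illustrative stand-ins, not the rust-prometheus types, and they omit names, labels, registration, and export.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};

/// Counter: a value that only ever goes up (e.g. requests served).
#[derive(Default)]
struct Counter(AtomicU64);

impl Counter {
    fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }
    fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Gauge: a value that can rise and fall (e.g. open connections).
#[derive(Default)]
struct Gauge(AtomicI64);

impl Gauge {
    fn add(&self, delta: i64) {
        self.0.fetch_add(delta, Ordering::Relaxed);
    }
    fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Histogram: counts observations per bucket (e.g. request latency).
struct Histogram {
    upper_bounds: Vec<f64>, // ascending bucket upper bounds
    counts: Vec<u64>,       // one slot per bound, plus a final +Inf slot
    sum: f64,
}

impl Histogram {
    fn new(upper_bounds: Vec<f64>) -> Self {
        let counts = vec![0; upper_bounds.len() + 1];
        Histogram { upper_bounds, counts, sum: 0.0 }
    }

    fn observe(&mut self, value: f64) {
        let slot = self
            .upper_bounds
            .iter()
            .position(|bound| value <= *bound)
            .unwrap_or(self.upper_bounds.len());
        self.counts[slot] += 1;
        self.sum += value;
    }

    /// Prometheus exposes buckets cumulatively: each bucket counts every
    /// observation less than or equal to its upper bound.
    fn cumulative(&self) -> Vec<u64> {
        self.counts
            .iter()
            .scan(0u64, |running, count| {
                *running += *count;
                Some(*running)
            })
            .collect()
    }
}

fn main() {
    let requests = Counter::default();
    requests.inc();
    requests.inc();
    assert_eq!(requests.get(), 2);

    let connections = Gauge::default();
    connections.add(3);
    connections.add(-1);
    assert_eq!(connections.get(), 2);

    let mut latency = Histogram::new(vec![0.1, 1.0]);
    for seconds in [0.05, 0.5, 0.7, 3.0] {
        latency.observe(seconds);
    }
    assert_eq!(latency.cumulative(), vec![1, 3, 4]);
}
```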

slide-28
SLIDE 28

https://github.com/pingcap/rust-prometheus

slide-29
SLIDE 29

Testing

slide-30
SLIDE 30

Testing - Failure Injection

    // Inject a failure
    fn function_foo() {
        fail_point!("foo");
    }

    // Run and trigger the failure
    FAILPOINTS=foo=panic cargo run
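
One way to picture what a fail point does: a named hook looks up its own name in a configuration taken from the environment and misbehaves only when told to. The sketch below is a simplified stand-in written with the standard library, not the fail-rs implementation. `parse_failpoints` and the `fail_point` function are hypothetical, and only two actions are modeled (`panic`, and `return`, which is treated here as an early error).

```rust
use std::collections::HashMap;

/// Parse a spec such as "foo=panic;bar=return" into name -> action.
fn parse_failpoints(spec: &str) -> HashMap<String, String> {
    spec.split(';')
        .filter_map(|item| item.split_once('='))
        .map(|(name, action)| (name.trim().to_string(), action.trim().to_string()))
        .collect()
}

/// Trigger the configured action for `name`, if there is one.
fn fail_point(config: &HashMap<String, String>, name: &str) -> Result<(), String> {
    match config.get(name).map(String::as_str) {
        Some("panic") => panic!("failpoint {} triggered a panic", name),
        Some("return") => Err(format!("failpoint {} forced an early return", name)),
        _ => Ok(()), // not configured: the hook is a no-op
    }
}

/// Code under test with a hook placed where a failure should be injected.
fn function_foo(config: &HashMap<String, String>) -> Result<(), String> {
    fail_point(config, "foo")?;
    Ok(())
}

fn main() {
    // With FAILPOINTS unset, the hook does nothing and foo runs normally.
    let spec = std::env::var("FAILPOINTS").unwrap_or_default();
    let config = parse_failpoints(&spec);
    match function_foo(&config) {
        Ok(()) => println!("foo ran normally"),
        Err(e) => println!("foo failed: {}", e),
    }
}
```

Keeping the hook a no-op unless configured is what lets the same binary serve both normal runs and failure-injection runs.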

slide-31
SLIDE 31

https://github.com/pingcap/fail-rs

slide-32
SLIDE 32

Architecture

slide-33
SLIDE 33

Architecture

[Diagram: three TiKV nodes, each layering RocksDB, Raft, MVCC, and a transactional (Txn) API. A client connects over gRPC, and Prometheus is shown alongside the nodes.]

slide-34
SLIDE 34

https://github.com/pingcap/tikv

slide-35
SLIDE 35

Beyond TiKV

slide-36
SLIDE 36

A Distributed Relational Database

slide-37
SLIDE 37

TiDB

[Diagram: applications and MySQL drivers (e.g. JDBC) talk to TiDB over the MySQL protocol; TiDB talks to TiKV over RPC.]

slide-38
SLIDE 38

A Distributed Analytical Database

slide-39
SLIDE 39

TiSpark

[Diagram: a Spark cluster, in which the Spark driver dispatches jobs to workers, reads from TiKV over RPC.]

slide-40
SLIDE 40

Hybrid Transactional/Analytical Processing Database

slide-41
SLIDE 41

[Diagram: HTAP architecture. A TiDB cluster (several TiDB servers) and a Spark cluster (a Spark driver and workers using TiSpark) both sit on top of a TiKV cluster used for storage. A PD cluster of three PD nodes holds metadata and serves TSO and data-location requests.]

slide-42
SLIDE 42

Thank you!

https://github.com/pingcap/tidb
https://github.com/pingcap/tikv

We are hiring…

@China @Silicon Valley @Home