Ceph & RocksDB
Ilsoo Byun (변일수), Cloud Storage Team
[Slide: Placement Group — an object (myobject) in a pool (mypool) is mapped to one of the pool's PGs (PG#1, PG#2, PG#3) by hashing:]
hash(myobject) % (# of PGs) = 4 % 3 = 1 ← Target PG
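The mapping on the slide can be sketched as follows. This is a toy illustration: real Ceph hashes the object name with a Jenkins hash and a stable PG mask, not CRC32.

```python
import zlib

def object_to_pg(object_name: str, num_pgs: int) -> int:
    """Map an object name to a PG index: hash(name) % (# of PGs).
    zlib.crc32 is a deterministic stand-in for Ceph's real hash."""
    return zlib.crc32(object_name.encode()) % num_pgs

# As on the slide: with 3 PGs, every object lands on one of PG index 0..2,
# and the same name always maps to the same PG.
print(object_to_pg("myobject", 3))
```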
[Diagram: CRUSH — each PG in mypool (PG#1, PG#2, PG#3) is mapped to a set of OSDs (e.g. OSD#1, OSD#3, OSD#12)]
[Diagram: Recovery — when an OSD fails, the affected PGs are remapped to surviving OSDs]
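A deterministic PG-to-OSD selection can be sketched in the spirit of CRUSH's straw scoring. This toy picks the highest-scoring OSDs per PG; real CRUSH walks a device hierarchy (host/rack/root) under configurable placement rules, which this sketch ignores.

```python
import zlib

def pg_to_osds(pg_id: int, osds: list, replicas: int = 3) -> list:
    """Pick `replicas` OSDs for a PG: score every OSD by hash(pg, osd)
    and keep the top scorers. Deterministic, so every node computes the
    same mapping without a central lookup table."""
    score = lambda osd: zlib.crc32(f"{pg_id}:{osd}".encode())
    return sorted(osds, key=score, reverse=True)[:replicas]

# If an OSD dies, rerunning the same function over the surviving OSDs
# yields the recovered mapping; only PGs that used the dead OSD move.
print(pg_to_osds(1, list(range(1, 13))))
```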
[Diagram: OSD internals]
OSD: Peering, Heartbeat, …
ObjectStore: Replication, ???, …
ObjectStore implementations: FileStore, BlueStore
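The relationship above can be sketched as an interface with pluggable backends. Class and method names here are illustrative, not Ceph's actual C++ `ObjectStore` API; `MemStore` is a trivial stand-in for FileStore/BlueStore.

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """The OSD talks to its persistence layer through one interface;
    FileStore and BlueStore are interchangeable implementations."""
    @abstractmethod
    def write(self, obj: str, data: bytes) -> None: ...
    @abstractmethod
    def read(self, obj: str) -> bytes: ...

class MemStore(ObjectStore):
    """In-memory toy backend standing in for FileStore/BlueStore."""
    def __init__(self):
        self._objs = {}
    def write(self, obj, data):
        self._objs[obj] = data
    def read(self, obj):
        return self._objs[obj]

store: ObjectStore = MemStore()
store.write("myobject", b"hello")
print(store.read("myobject"))
```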
https://www.scan.co.uk/products/4tb-toshiba-mg04aca400e-enterprise-hard-drive-35-hdd-sata-iii-6gb-s-7200rpm-128mb-cache-oem
https://ceph.com/community/new-luminous-bluestore/
Consistency is enforced here!
[Diagram: BlueStore write path on SSD — a Request enters the kv_committing queue, is committed as a RocksDB sync transaction, then passes through the shard finisher queue (pipe, out_q) before write flush and Ack; the completion order must be guaranteed.]
[Diagram: RocksDB write path — a Request is written to the Memtable and the Transaction Log (WAL); the Memtable is flushed to an SSTFile.]
[Diagram: Group commit — Thread #1 calls JoinBatchGroup and becomes the leader; Threads #2 and #3 call JoinBatchGroup and AwaitState as followers. The leader runs PreprocessWrite → WriteToWAL (for the whole batch) → MarkLogsSynced → write to memtable → ExitAsBatchGroupLeader.]
[Diagram: Concurrent write to memtable — after WriteToWAL and MarkLogsSynced, the leader runs LaunchParallelFollower; each follower runs CompleteParallelWorker, writes to the memtable concurrently, and exits via ExitAsBatchGroupFollower.]
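The group-commit idea on the slide can be sketched with a condition variable: the first writer to join a batch becomes the leader, appends the whole batch to the log with a single (simulated) fsync, and wakes the followers. This is a minimal sketch, not RocksDB's actual `JoinBatchGroup` implementation; the sleep stands in for the window in which followers pile into the batch.

```python
import threading, time

class GroupCommitLog:
    def __init__(self):
        self._cv = threading.Condition()
        self._pending = []          # writers joined to the current batch
        self._log = []              # stand-in for the WAL file
        self._leader_active = False
        self._syncs = 0             # number of (simulated) fsyncs

    def write(self, record):
        with self._cv:
            self._pending.append(record)
            if self._leader_active:
                # JoinBatchGroup as follower: AwaitState until durable.
                while record not in self._log:
                    self._cv.wait()
                return
            # First writer in: become the batch leader.
            self._leader_active = True
        time.sleep(0.01)            # window in which followers can join
        with self._cv:
            batch, self._pending = self._pending, []
            self._log.extend(batch)  # WriteToWAL: one append for the batch,
            self._syncs += 1         # one fsync for the whole batch
            self._leader_active = False
            self._cv.notify_all()    # MarkLogsSynced; ExitAsBatchGroupLeader

log = GroupCommitLog()
writers = [threading.Thread(target=log.write, args=(f"rec{i}",))
           for i in range(8)]
for t in writers: t.start()
for t in writers: t.join()
print(len(log._log), log._syncs)  # 8 records; typically far fewer syncs
```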
[Chart: Shard Scalability — IOPS for 1 shard vs. 10 shards; y-axis 10,000–60,000 IOPS]
[Diagram: RocksDB WAL vs. disableWAL — with the WAL enabled, every PUT is appended to the WAL before the memtable; with disableWAL, PUTs bypass the log entirely.]
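The trade-off can be shown with a toy key-value store: a logged write survives a crash, an unlogged one does not. Real RocksDB exposes this per write as `WriteOptions.disableWAL`; the store below is purely illustrative.

```python
class ToyKV:
    def __init__(self):
        self.memtable = {}
        self.wal = []                      # stand-in for the log file

    def put(self, key, value, disable_wal=False):
        if not disable_wal:
            self.wal.append((key, value))  # logged first: crash-safe
        self.memtable[key] = value         # in-memory only otherwise

    def recover_after_crash(self):
        """Rebuild state from the WAL alone (memtable contents are lost)."""
        recovered = {}
        for k, v in self.wal:
            recovered[k] = v
        return recovered

db = ToyKV()
db.put("a", 1)
db.put("b", 2, disable_wal=True)   # faster, but not durable
print(db.recover_after_crash())    # only {"a": 1} survives the crash
```

Skipping the WAL removes a sync write from every PUT, which is why it shows up in the IOPS comparison, at the cost of losing recent writes on power failure.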
Ceph components: OSD, Mon, Mgr, RADOS, RadosGW, CephFS
[Diagram: RadosGW Put Object on RADOS — Prepare Index (index object) → Write Data (data object) → Complete Index]
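The three-step flow above can be sketched as a two-phase index update around the data write: a crash mid-write leaves only a "pending" marker in the index, never a completed entry pointing at missing data. Names and states here are illustrative, not RadosGW's actual on-disk format.

```python
class Bucket:
    def __init__(self):
        self.index = {}   # bucket index object: object name -> state
        self.data = {}    # data objects

    def put_object(self, name, payload):
        self.index[name] = "pending"   # 1. Prepare Index
        self.data[name] = payload      # 2. Write Data
        self.index[name] = "complete"  # 3. Complete Index

b = Bucket()
b.put_object("photo.jpg", b"...bytes...")
print(b.index["photo.jpg"])
```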
[Diagram: the index entries are stored as key-value (k/v) pairs in RocksDB]
[Diagram: the same BlueStore write path (Request → kv_committing → RocksDB sync transaction → shard finisher queue → write flush → Ack), now with sharded finisher queues enabled so that completion order can still be guaranteed:]
bstore_shard_finishers = true
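The idea behind sharded finisher queues can be sketched as follows: completions for the same PG must be acknowledged in submission order, so each PG is pinned to one queue by hashing, while different PGs spread across queues for parallelism. This mirrors the intent of `bstore_shard_finishers`, not Ceph's actual implementation.

```python
from collections import deque

class ShardedFinishers:
    def __init__(self, num_shards):
        self.shards = [deque() for _ in range(num_shards)]

    def enqueue(self, pg_id, completion):
        # The same PG always hashes to the same shard, so its completions
        # stay FIFO; different PGs land on different shards and can be
        # drained by separate finisher threads in parallel.
        self.shards[pg_id % len(self.shards)].append(completion)

    def drain(self, shard):
        out = []
        while self.shards[shard]:
            out.append(self.shards[shard].popleft())
        return out

fin = ShardedFinishers(4)
for seq in range(3):
    fin.enqueue(1, ("pg1", seq))
fin.enqueue(2, ("pg2", 0))
print(fin.drain(1))   # pg1 completions come back in submission order
```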