A Cloud-native Architecture for Replicated Data Services


Jul 23, 2023


SLIDE 1

A Cloud-native Architecture for Replicated Data Services

Hemant Saxena, Jeffery Pound
University of Waterloo, SAP Labs Waterloo

SLIDE 2

Outline

Problem overview
Solution overview

○ Kafka
○ Cassandra

Evaluation

SLIDE 3

Problem overview

➢ Cloud has become the de facto standard for deploying applications

➢ However, applications designed for on-premise infrastructure find it challenging to leverage Cloud storage efficiently, because:

○ Data replication on-premise provides fault tolerance (FT) and high availability (HA)
○ Whereas Cloud storage already uses replication to provide FT and HA
○ This makes the application's replication redundant, resulting in additional storage cost
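A small worked example of the redundancy described above. The application replication factor of 3 matches the one used later in the evaluation; the storage-level replication factor, the dataset size, and the function names are assumptions made only for illustration.

```python
def physical_copies(app_replicas: int, storage_replicas: int) -> int:
    """Copies of each byte that physically exist when an application keeping
    `app_replicas` copies runs on storage that keeps `storage_replicas` copies."""
    return app_replicas * storage_replicas


def provisioned_gb(dataset_gb: float, app_replicas: int) -> float:
    """Cloud storage the application pays for: one full copy per app replica."""
    return dataset_gb * app_replicas


k = 3               # application-level replication factor
r = 3               # storage-level replication factor (assumed)
dataset_gb = 100.0  # hypothetical dataset size

print(physical_copies(k, r))          # 9 physical copies of every byte
print(provisioned_gb(dataset_gb, k))  # 300.0 GB paid for with app-level replication
print(provisioned_gb(dataset_gb, 1))  # 100.0 GB if only one copy is stored
```

Under these assumptions the application pays for k times the storage it needs, because the storage service would already keep a single copy safe.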

SLIDE 4

Typical replicated application on-premise

[Diagram labels: client, replicated application, replica-set]

SLIDE 5

Typical replicated application on Cloud

○ Application-level replication (replica-set)
○ Storage-level replication
○ Resulting in redundant replicas
○ Introducing additional storage cost

[Diagram labels: client, replicated application, replica-set, storage service]

SLIDE 6

Problem overview

We ask the following research question...

How can we easily allow applications designed for on-premise infrastructure to leverage Cloud storage efficiently?

SLIDE 7

Outline

Problem overview
Solution overview

○ Kafka
○ Cassandra

Evaluation

SLIDE 8

Naïve solution

➢ Have one replica (i.e. no application-level replication)

➢ Solves the problem of redundant replication

➢ But it is prone to node failure, hence not highly available

[Diagram label: replica-set]

SLIDE 9

Contributions of this work

➢ We show how a well-known main-delta architecture can be used to leverage cloud storage efficiently

○ i.e. ensure no redundant replication
○ while maintaining the fault-tolerance and availability guarantees of the applications

➢ We show that incorporating main-delta architecture in existing on-premise applications is easy

○ by controlling how buffers are managed and flushed to storage
○ and it is compatible with the whole spectrum of replication strategies

SLIDE 10

Quick recap of main-delta architecture

➢ Originally designed for efficiently handling mixed read/update workloads

➢ Two parts

○ Static, read-only, read-optimized main
○ Small, write-optimized delta
○ Deltas are merged with the main at regular intervals
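The recap above can be sketched as a tiny key-value store. This is a minimal illustration, not the authors' implementation: the class name `MainDeltaStore` and its methods are hypothetical, and plain dictionaries stand in for the real read-optimized and write-optimized structures.

```python
class MainDeltaStore:
    """Toy main-delta store: writes go to the delta, reads check the delta
    first and then the main, and merge() folds the delta into a new main."""

    def __init__(self):
        self.main = {}   # read-optimized part; only replaced by merge()
        self.delta = {}  # small write-optimized buffer

    def put(self, key, value):
        # Updates never touch the main directly.
        self.delta[key] = value

    def get(self, key, default=None):
        # The delta holds the most recent value for a key, if any.
        if key in self.delta:
            return self.delta[key]
        return self.main.get(key, default)

    def merge(self):
        # Build a new main from the old main plus the delta, then reset the delta.
        self.main = {**self.main, **self.delta}
        self.delta = {}


store = MainDeltaStore()
store.put("a", 1)
store.merge()        # "a" now lives in the main
store.put("a", 2)    # newer value sits in the delta
print(store.get("a"), store.main["a"])  # 2 1
```

In a real system the merge would run at regular intervals, as the slide says; here it is called by hand to keep the sketch short.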

SLIDE 11

Solution overview

➢ Replicated local deltas, maintained by the application

➢ But a single shared main on Cloud storage (which is fault-tolerant)

[Diagram labels: replica-set; M M M]

SLIDE 12

Solution overview

➢ Replicated local deltas, maintained by the application

➢ But a single shared main on Cloud storage (which is fault-tolerant)

How to merge the deltas?

[Diagram labels: replica-set; M M M]

SLIDE 13

Merging Deltas to Main

➢ The details are in how the delta is merged to the main such that

○ No data is lost from any delta
○ And applications have the same guarantees as in an on-premise deployment

➢ Delta-merge strategy depends on the replication strategy

○ Single primary node means a single delta to merge
○ Multiple primary nodes mean multiple deltas to merge

SLIDE 14

Classification of replication strategies

▪ Write to primary, read from any
▪ Write to any, read from any (e.g. quorum)
▪ Write to primary, read from primary

[Diagram labels, one diagram per strategy: request-handler, replica-set]
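As a rough sketch, the three strategies can be tied back to the earlier point that a single primary means a single delta to merge and multiple primaries mean multiple deltas. The enum and function names are hypothetical, and treating every replica as a potential writer in the write-to-any case is an assumption.

```python
from enum import Enum


class Strategy(Enum):
    WRITE_PRIMARY_READ_ANY = "write to primary, read from any"
    WRITE_ANY_READ_ANY = "write to any, read from any (e.g. quorum)"
    WRITE_PRIMARY_READ_PRIMARY = "write to primary, read from primary"


def deltas_to_merge(strategy: Strategy, replica_count: int) -> int:
    """How many distinct deltas must reach the shared main."""
    if strategy is Strategy.WRITE_ANY_READ_ANY:
        # Any replica may accept writes, so each delta can differ (assumed upper bound).
        return replica_count
    # Only the primary accepts writes, so its delta alone is authoritative.
    return 1


for s in Strategy:
    print(s.value, "->", deltas_to_merge(s, replica_count=3))
```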

SLIDE 15

Case-study 1: Delta merge for single primary

➢ Idea: in-memory buffers as deltas, on-disk data as main.

➢ Only the primary will merge its delta to main. Other replicas will discard their deltas when they are full.

➢ In case of primary node failure, the new primary node takes over the responsibility of merging deltas.

[Diagram labels: replica-set; M M M]
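The single-primary scheme above might look roughly like the following. This is a simplified sketch, not Kafka code: `SharedMain`, `Replica`, and `write` are hypothetical names, a Python list stands in for the log on cloud storage, and the rule that followers discard only records already present in the main is an assumption added so that failover loses no data.

```python
class SharedMain:
    """Stand-in for the single main kept on fault-tolerant cloud storage."""

    def __init__(self):
        self.records = []


class Replica:
    def __init__(self, name, main, capacity=4):
        self.name = name
        self.main = main
        self.capacity = capacity
        self.delta = []          # in-memory buffer of (offset, record)
        self.is_primary = False

    def append(self, offset, record):
        self.delta.append((offset, record))
        if len(self.delta) >= self.capacity:
            self.flush()

    def flush(self):
        if self.is_primary:
            # Primary merges its delta, skipping anything the main already has.
            for offset, record in self.delta:
                if offset == len(self.main.records):
                    self.main.records.append(record)
            self.delta = []
        else:
            # Follower discards only what is already persisted (assumption).
            persisted = len(self.main.records)
            self.delta = [(o, r) for o, r in self.delta if o >= persisted]


def write(replicas, offset, record):
    """Replicate one record to every replica's local delta."""
    for replica in replicas:
        replica.append(offset, record)


main = SharedMain()
replicas = [Replica(n, main) for n in ("a", "b", "c")]
replicas[0].is_primary = True

for offset in range(6):
    write(replicas, offset, f"r{offset}")
before_failover = list(main.records)  # only the first full delta was merged

# Primary "a" fails; "b" becomes primary and merges what the main is missing.
failed = replicas.pop(0)
replicas[0].is_primary = True
replicas[0].flush()

print(before_failover)
print(main.records)
```

In this run the main holds four records before the failure and all six after the new primary merges its delta, while only one copy of the log is ever written to cloud storage.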

SLIDE 16

Case-study 2: Delta merge for quorum system

➢ The memtable and sstables can be easily leveraged as delta and main.

➢ Deciding which node merges the delta is tricky:

○ Each node can have a different set of updates

[Diagram labels: replica-set; M M M]

SLIDE 17

Case-study 2: Delta merge for quorum system

➢ Nodes flush their deltas to cloud storage

➢ A background compaction job combines the deltas and merges them into the main
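A minimal sketch of this flush-then-compact flow, under stated assumptions: the function names are hypothetical, each delta is a dictionary of key to (timestamp, value), and conflicts are resolved by keeping the newest timestamp. The slides do not spell out the reconciliation rule, so last-write-wins is assumed here.

```python
def flush_delta(cloud_deltas, memtable):
    """A node writes its memtable to shared cloud storage and clears it."""
    cloud_deltas.append(dict(memtable))
    memtable.clear()


def compact(main, cloud_deltas):
    """Background job: combine all flushed deltas and merge them into a new
    main, keeping the value with the newest timestamp for each key."""
    merged = dict(main)
    for delta in cloud_deltas:
        for key, (ts, value) in delta.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    cloud_deltas.clear()
    return merged


cloud_deltas = []
main = {}

# Each node saw a different subset of the updates (key -> (timestamp, value)).
node1 = {"x": (1, "a"), "y": (2, "b")}
node2 = {"x": (3, "c")}
node3 = {"y": (2, "b"), "z": (4, "d")}

for memtable in (node1, node2, node3):
    flush_delta(cloud_deltas, memtable)

main = compact(main, cloud_deltas)
print(main)
```

After compaction each key appears once in the main, even though "x" and "y" were each held by more than one node.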

SLIDE 18

Outline

Problem overview
Solution overview

○ Kafka
○ Cassandra

Evaluation

SLIDE 19

Evaluation

➢ We want to show that our cloud-native design can save storage cost while keeping performance the same

➢ Tested performance of our prototype on Kafka and Cassandra

○ Used real Cloud infrastructure: Amazon Web Services (AWS)
○ Tested different storage types: EBS and EFS

SLIDE 20

Evaluation

➢ Implementations:

○ md-kafka: main-delta architecture based Kafka implementation
○ kafka: vanilla Kafka

➢ 3x storage cost savings

○ Replication factor of 3
○ Savings by design

➢ Similar write throughput for block-based storage (EBS)

➢ Almost 2x throughput improvement for EFS storage, due to batching

SLIDE 21

Evaluation

➢ Implementations:

○ md-cassandra-efs: main-delta based Cassandra using EFS storage
○ cassandra-ebs: vanilla Cassandra using EBS
○ cassandra-efs: vanilla Cassandra using EFS

➢ Close to 2.8x storage cost savings

○ With a replication factor of 3

➢ Almost the same throughput for all 3 types of workloads

SLIDE 22

Conclusion

➢ Existing on-premise applications (with replication), when deployed on cloud, end up with redundant replication

➢ We proposed a main-delta based cloud-native architecture to solve this problem

○ Allowing for storage cost savings up to a factor of k (the application's replication factor)

➢ We show our approach is general enough to work with the complete spectrum of replication strategies

○ Simplest strategy: single primary (Kafka case study)
○ Complex strategy: quorum-based systems (Cassandra case study)

SLIDE 23

Thank you!

Contact for any follow-up questions: Hemant Saxena email: hemant.saxena@uwaterloo.ca