A Cloud-native Architecture for Replicated Data Services Hemant - - PowerPoint PPT Presentation

SLIDE 1

A Cloud-native Architecture for Replicated Data Services

Hemant Saxena, Jeffery Pound

University of Waterloo, SAP Labs Waterloo

SLIDE 2

Outline

  • Problem overview
  • Solution overview
    ○ Kafka
    ○ Cassandra
  • Evaluation

SLIDE 3

Problem overview

The Cloud has become the de facto standard for deploying applications

However, applications designed for on-premise infrastructure struggle to use Cloud storage efficiently, because:

  ○ On-premise, application-level data replication provides fault tolerance (FT) and high availability (HA)
  ○ Cloud storage, however, already uses replication to provide FT and HA
  ○ This makes the application's own replication redundant, resulting in additional storage cost

SLIDE 4

Typical replicated application on-premise

[Figure: client and replicated application (replica-set)]

SLIDE 5

Typical replicated application on Cloud

[Figure: client, replicated application (replica-set), and Cloud storage service]

  • Application-level replication (replica-set)
  • Storage-level replication
  • Resulting in redundant replicas
  • Introducing additional storage cost
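The redundancy can be made concrete with a back-of-the-envelope calculation. The following is a minimal sketch of a hypothetical cost model, not taken from the slides. `k` is the application's replication factor. `r` is the number of copies the Cloud storage service keeps internally; this is an assumed parameter whose real value depends on the service. Each of the k replicas writes its own data, so every logical byte ends up stored k * r times. A single shared copy on the same storage would already be stored r times.

```python
# Hypothetical storage-cost model (illustrative assumptions, not measurements):
#   k = the application's replication factor (replicas in its replica-set)
#   r = copies the Cloud storage service keeps internally (assumed value)

def copies_with_app_replication(k: int, r: int) -> int:
    # Each of the k replicas stores its own copy; storage replicates each one r times.
    return k * r

def copies_with_single_shared_copy(r: int) -> int:
    # One logical copy on storage; only the storage service replicates it.
    return r

def redundancy_factor(k: int, r: int) -> float:
    # Ratio of the two layouts; it equals k for any value of r.
    return copies_with_app_replication(k, r) / copies_with_single_shared_copy(r)

print(copies_with_app_replication(3, 3))    # 9
print(copies_with_single_shared_copy(3))    # 3
print(redundancy_factor(3, 3))              # 3.0
```

The ratio depends only on k, which is consistent with the "up to a factor of k" savings stated in the conclusion. Small local in-memory buffers are ignored in this model.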

SLIDE 6

Problem overview

We ask the following research question:

How can we easily allow applications designed for on-premise infrastructure to leverage Cloud storage efficiently?

SLIDE 7

Outline

  • Problem overview
  • Solution overview
    ○ Kafka
    ○ Cassandra
  • Evaluation

SLIDE 8

Naïve solution

  • Keep only one replica (i.e., no application-level replication)
  • This solves the problem of redundant replication
  • But it is prone to node failure, and hence not highly available

SLIDE 9

Contributions of this work

➢ We show how the well-known main-delta architecture can be used to leverage Cloud storage efficiently
    ○ i.e., ensuring no redundant replication
    ○ while maintaining the fault-tolerance and availability guarantees of the applications
➢ We show that incorporating the main-delta architecture into existing on-premise applications is easy
    ○ by controlling how buffers are managed and flushed to storage
    ○ and that it is compatible with the whole spectrum of replication strategies

SLIDE 10

Quick recap of the main-delta architecture

Originally designed to handle mixed read/update workloads efficiently

Two parts:
  ○ A static, read-only, read-optimized main
  ○ A small, write-optimized delta

Deltas are merged into the main at regular intervals
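The following is a minimal sketch of the recap, assuming a simple in-memory key-value workload. `MainDeltaStore`, `delta_limit`, and the size-based merge trigger are hypothetical illustrations. The slide only says deltas are merged "at regular intervals", so a merge after a fixed number of buffered keys stands in for that schedule here.

```python
from bisect import bisect_left
from typing import Any, Dict, Optional, Tuple

class MainDeltaStore:
    """Toy main-delta key-value store (illustrative sketch only).

    main:  immutable sorted tuple of (key, value) pairs, rebuilt only on merge
           (read-optimized: binary search over sorted keys)
    delta: small dict that absorbs all writes (write-optimized: O(1) inserts)
    """

    def __init__(self, delta_limit: int = 4) -> None:
        self.main: Tuple[Tuple[Any, Any], ...] = ()
        self._main_keys: Tuple[Any, ...] = ()
        self.delta: Dict[Any, Any] = {}
        self.delta_limit = delta_limit  # hypothetical merge trigger

    def put(self, key: Any, value: Any) -> None:
        self.delta[key] = value
        if len(self.delta) >= self.delta_limit:
            self.merge()

    def get(self, key: Any) -> Optional[Any]:
        # The delta holds the newest values, so it is consulted first.
        if key in self.delta:
            return self.delta[key]
        i = bisect_left(self._main_keys, key)
        if i < len(self._main_keys) and self._main_keys[i] == key:
            return self.main[i][1]
        return None

    def merge(self) -> None:
        # Build a brand-new main from the old main plus the delta; delta entries win.
        merged = dict(self.main)
        merged.update(self.delta)
        self.main = tuple(sorted(merged.items()))
        self._main_keys = tuple(k for k, _ in self.main)
        self.delta = {}
```

For example, with `delta_limit=2`, calling `put("a", 1)` and `put("b", 2)` triggers a merge. A following `put("a", 3)` stays in the delta and shadows the older value in the main. The main is replaced wholesale rather than updated in place, which is what keeps it static and read-optimized.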

SLIDE 11

Solution overview

  • Replicated local deltas, maintained by the application
  • But a single shared main on Cloud storage (which is already fault-tolerant)

SLIDE 12

Solution overview

How to merge the deltas?

SLIDE 13

Merging Deltas to Main

➢ The key detail is how the delta is merged into the main, such that
    ○ no data from any delta is lost
    ○ and applications keep the same guarantees as in an on-premise deployment
➢ The delta-merge strategy depends on the replication strategy
    ○ A single primary node means a single delta to merge
    ○ Multiple primary nodes mean multiple deltas to merge

SLIDE 14

Classification of replication strategies

  ▪ Write to primary, read from any
  ▪ Write to any, read from any (e.g., quorum)
  ▪ Write to primary, read from primary

[Figure: one diagram per strategy, each showing a request-handler and a replica-set]
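The classification connects to the delta-merge rule on the "Merging Deltas to Main" slide: a single primary means a single delta to merge, and multiple primaries mean multiple deltas. That rule can be sketched as a small lookup. The enum and the function `deltas_to_merge` are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import List

class ReplicationStrategy(Enum):
    # The three classes of replication strategies listed on the slide.
    WRITE_PRIMARY_READ_ANY = "write to primary, read from any"
    WRITE_ANY_READ_ANY = "write to any, read from any (e.g. quorum)"
    WRITE_PRIMARY_READ_PRIMARY = "write to primary, read from primary"

def deltas_to_merge(strategy: ReplicationStrategy, replicas: List[str], primary: str) -> List[str]:
    """Replicas whose deltas must reach the shared main so that no update is lost."""
    if strategy is ReplicationStrategy.WRITE_ANY_READ_ANY:
        # Any replica may have accepted a given write, so every delta must be merged.
        return list(replicas)
    # With a single primary, every write passes through it, so its delta is complete.
    return [primary]

print(deltas_to_merge(ReplicationStrategy.WRITE_ANY_READ_ANY, ["n1", "n2", "n3"], "n1"))      # ['n1', 'n2', 'n3']
print(deltas_to_merge(ReplicationStrategy.WRITE_PRIMARY_READ_ANY, ["n1", "n2", "n3"], "n1"))  # ['n1']
```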

SLIDE 15

Case-study 1: Delta merge for single primary

  • Idea: in-memory buffers act as deltas, and on-disk data acts as the main.
  • Only the primary merges its delta into the main. Other replicas discard their deltas when they are full.
  • If the primary node fails, the new primary node takes over the responsibility of merging deltas.
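The following is a toy simulation of Case-study 1. It assumes synchronous replication, so every replica buffers the same records in the same order. `SinglePrimaryGroup`, `buffer_size`, and `fail_primary` are hypothetical names. The Python list `main` stands in for on-disk data on fault-tolerant Cloud storage. The sketch follows the three rules on the slide: only the primary merges, followers discard full deltas, and after a failure the new primary takes over merging.

```python
from typing import List

class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.delta: List[object] = []  # in-memory buffer acting as the delta

class SinglePrimaryGroup:
    """Illustrative single-primary replica-set with one shared main."""

    def __init__(self, names: List[str], buffer_size: int = 3) -> None:
        self.replicas = [Replica(n) for n in names]
        self.primary = self.replicas[0]
        self.buffer_size = buffer_size  # hypothetical "delta is full" threshold
        self.main: List[object] = []    # stands in for the shared main on Cloud storage

    def append(self, record: object) -> None:
        # Application-level replication: every live replica buffers the record.
        for r in self.replicas:
            r.delta.append(record)
        if len(self.primary.delta) >= self.buffer_size:
            self._on_delta_full()

    def _on_delta_full(self) -> None:
        # Only the primary merges its delta into the shared main ...
        self.main.extend(self.primary.delta)
        # ... then every replica starts a fresh delta; for followers this
        # is simply discarding their (identical) full buffer.
        for r in self.replicas:
            r.delta.clear()

    def fail_primary(self) -> None:
        # Drop the failed primary; the next replica becomes primary and
        # continues merging, starting from its own still-buffered delta.
        self.replicas.remove(self.primary)
        self.primary = self.replicas[0]
```

Suppose the group has replicas `["a", "b", "c"]` and `buffer_size=3`. After appending records 0 to 3, the main holds `[0, 1, 2]` and every delta holds `[3]`. If the primary then fails, record 3 is not lost, because the new primary `b` merges it with its next flush.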

SLIDE 16

Case-study 2: Delta merge for quorum system

  • The memtable and SSTables can easily be leveraged as the delta and the main, respectively.
  • Deciding which node merges the delta is tricky:
    ○ Each node can have a different set of updates

SLIDE 17

Case-study 2: Delta merge for quorum system

  • Nodes flush their deltas to Cloud storage
  • A background compaction job combines the deltas and merges them into the main
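The following is a sketch of the background compaction for the quorum case. Each delta maps a key to a `(timestamp, value)` cell. Conflicts are resolved by keeping the highest timestamp (last-write-wins). This rule is an assumption: it mirrors how Cassandra reconciles replicas, and the slides do not state the combine rule. The function `compact` and the example deltas are hypothetical. The deltas are shaped as they might look after writes acknowledged by two of three replicas, so no single node holds every update.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)

def compact(main: Dict[str, Cell], flushed_deltas: List[Dict[str, Cell]]) -> Dict[str, Cell]:
    """Merge every flushed delta into a new main (the input main is not modified).

    Conflicts are resolved last-write-wins by timestamp (an assumption here).
    """
    merged = dict(main)
    for delta in flushed_deltas:
        for key, cell in delta.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    return merged

# Hypothetical deltas after quorum writes (2 of 3 replicas acknowledge each write):
node1 = {"x": (4, "d"), "y": (2, "b")}
node2 = {"x": (4, "d"), "z": (3, "c")}
node3 = {"y": (2, "b"), "z": (3, "c")}

new_main = compact({"x": (0, "old")}, [node1, node2, node3])
print(new_main)  # {'x': (4, 'd'), 'y': (2, 'b'), 'z': (3, 'c')}
```

Merging any single node's delta would miss one key, which is why every node flushes and a separate job combines the deltas. The shared main still ends up with exactly one copy of each update.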

SLIDE 18

Outline

  • Problem overview
  • Solution overview
    ○ Kafka
    ○ Cassandra
  • Evaluation

SLIDE 19

Evaluation

  • Goal: show that our cloud-native design saves storage cost while keeping performance the same
  • Tested the performance of our prototypes on Kafka and Cassandra
    ○ Used real Cloud infrastructure: Amazon Web Services (AWS)
    ○ Tested two storage types: EBS and EFS

SLIDE 20

Evaluation

  • Implementations:
    ○ md-kafka: Kafka implementation based on the main-delta architecture
    ○ kafka: vanilla Kafka
  • 3x storage cost savings
    ○ Replication factor of 3
    ○ Savings by design
  • Similar write throughput on block-based storage (EBS)
  • Almost 2x throughput improvement on EFS storage, due to batching

SLIDE 21

Evaluation

  • Implementations:
    ○ md-cassandra-efs: main-delta based Cassandra using EFS storage
    ○ cassandra-ebs: vanilla Cassandra using EBS
    ○ cassandra-efs: vanilla Cassandra using EFS
  • Close to 2.8x storage cost savings
    ○ With a replication factor of 3
  • Roughly similar throughput for all three types of workloads

SLIDE 22

Conclusion

➢ Existing on-premise applications (with replication) end up with redundant replication when deployed on the Cloud
➢ We proposed a main-delta based cloud-native architecture to solve this problem
    ○ Allowing storage cost savings of up to a factor of k (the application's replication factor)
➢ We show that our approach is general enough to work with the complete spectrum of replication strategies
    ○ Simplest strategy: single primary (Kafka case study)
    ○ Complex strategy: quorum-based systems (Cassandra case study)

SLIDE 23

Thank you!

Contact for follow-up questions: Hemant Saxena (email: hemant.saxena@uwaterloo.ca)
