BEYOND THE CLUSTER: WAN DATA REPLICATION WITH GRIDGAIN (YAKOV ZHDANOV)

SLIDE 1

BEYOND THE CLUSTER:

WAN DATA REPLICATION WITH GRIDGAIN

YAKOV ZHDANOV

SLIDE 2

WHO?

Yakov Zhdanov:

  • GridGain’s Product Development VP
  • With GridGain since 2010
  • Apache Ignite committer and PMC member
  • Passion for performance & scalability
  • Finding ways to make the product better
  • St. Petersburg, Russia

SLIDE 7

PLAN

1) Why replicate?
2) How do DBs solve this?
3) Replication: Monolith vs Distributed
4) GridGain DR overview – roles, features, process
5) Future plans – Sync/Async TX replication

SLIDE 8

WHY REPLICATE YOUR DATA?

SLIDE 13

WHY REPLICATE?

  • Data security
  • Failover
  • Data warehousing
  • Load balancing
  • Increasing system capacity

SLIDE 14

POSTGRESQL REPLICATION

  • PostgreSQL is an object-relational database management system (ORDBMS)
  • Pioneered many database features and concepts
  • High maturity level
  • Open source and widely used

SLIDE 22

POSTGRESQL REPLICATION

  • Shared disk storage
  • File system replication
  • Write-Ahead Log Shipping
  • Logical Replication
  • Trigger-Based Master-Standby Replication
  • Statement-Based Replication Middleware
  • Async Multimaster Replication
  • Sync Multimaster Replication

https://www.postgresql.org/docs/10/static/different-replication-solutions.html

SLIDE 23

REPLICATION IN DISTRIBUTED SYSTEMS

                              Monolith    Distributed
Data security                 +           ?
Failover                      +           +
Load balancing                +           ?/+
Increasing system capacity    +           ?
Data warehousing              +           ?

SLIDE 24

REPLICATION IN GRIDGAIN

  • Introduced in 2012/2013
  • Completely new feature
  • Required a lot of engineering effort
  • Required revisiting existing logic
  • Async KEY/VALUE mode available
  • Sync/Async TX replication under development

https://docs.gridgain.com/docs/data-center-replication

SLIDE 25

REPLICATION IN GRIDGAIN: ROLES

  • Sender cache
  • Sender hub
  • Receiver hub
  • Receiver cache

https://docs.gridgain.com/docs/data-center-replication
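
The four roles can be read as stages of a one-way pipeline. The sketch below is a toy in-memory model, not GridGain code. It assumes that a sender cache hands each update to a sender hub, which batches updates and ships them to a receiver hub, which applies them to a receiver cache. All class names, method names and the batch_size knob are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReceiverCache:
    """Cache in the remote data center that ends up holding replicated entries."""
    data: dict = field(default_factory=dict)

    def apply(self, batch):
        for key, value in batch:
            self.data[key] = value


@dataclass
class ReceiverHub:
    """Remote-side entry point: accepts batches and applies them to the receiver cache."""
    cache: ReceiverCache

    def receive(self, batch):
        self.cache.apply(batch)


@dataclass
class SenderHub:
    """Local-side exit point: buffers updates and ships them in batches."""
    remote: ReceiverHub
    batch_size: int = 2  # hypothetical tuning knob
    buffer: list = field(default_factory=list)

    def enqueue(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.remote.receive(self.buffer)
            self.buffer = []


@dataclass
class SenderCache:
    """Cache whose local updates are also handed to a sender hub."""
    hub: SenderHub
    data: dict = field(default_factory=dict)

    def put(self, key, value):
        self.data[key] = value
        self.hub.enqueue(key, value)


receiver_cache = ReceiverCache()
sender_cache = SenderCache(SenderHub(ReceiverHub(receiver_cache)))
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    sender_cache.put(k, v)
pending = list(sender_cache.hub.buffer)  # the last update still waits for a full batch
sender_cache.hub.flush()                 # force the partial batch out
```

In this model the application only touches the sender cache; the hubs are the only components that talk across data centers.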

SLIDE 26

REPLICATION IN GRIDGAIN: FEATURES

  • Complex topologies (up to 32 datacenters)
  • Failover
  • Pluggable conflict resolution
  • Filtering
  • Pause/Resume
  • Full state transfer

https://docs.gridgain.com/docs/data-center-replication
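
"Pluggable conflict resolution" and "filtering" can be illustrated with a minimal sketch in which the receiving side calls user-supplied functions before storing a replicated update. This is not GridGain's interface. Entry, newest_wins, apply_update, and the version and dc_id fields are hypothetical names chosen for illustration, and the "higher version wins, lower data center id breaks ties" policy is just one possible rule.

```python
from typing import Callable, NamedTuple, Optional


class Entry(NamedTuple):
    value: str
    version: int   # hypothetical monotonically increasing update counter
    dc_id: int     # id of the data center that produced the update


# A resolver gets the stored entry (or None) and the incoming one, and returns the winner.
Resolver = Callable[[Optional[Entry], Entry], Entry]


def newest_wins(old: Optional[Entry], new: Entry) -> Entry:
    """Keep the entry with the higher version; on a tie the lower data center id wins."""
    if old is None:
        return new
    if (new.version, -new.dc_id) > (old.version, -old.dc_id):
        return new
    return old


def apply_update(cache: dict, key: str, new: Entry,
                 resolver: Resolver = newest_wins,
                 accept: Callable[[str, Entry], bool] = lambda k, e: True) -> None:
    """Apply a replicated update, skipping filtered entries and resolving conflicts."""
    if not accept(key, new):   # filtering: drop entries the filter rejects
        return
    cache[key] = resolver(cache.get(key), new)


cache = {}
apply_update(cache, "k", Entry("from-dc1", version=5, dc_id=1))
apply_update(cache, "k", Entry("stale-from-dc2", version=4, dc_id=2))  # loses: older version
apply_update(cache, "tmp:x", Entry("scratch", 1, 1),
             accept=lambda k, e: not k.startswith("tmp:"))             # filtered out
```

Passing the resolver and the filter as plain functions is what makes the policy "pluggable" here: the apply path stays the same while the rule changes.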

SLIDE 27

REPLICATION IN GRIDGAIN: HOW IT WORKS

SLIDE 28

REPLICATION IN GRIDGAIN: COMPLEX TOPOLOGIES

SLIDE 29

REPLICATION IN GRIDGAIN: WHAT CAN GO WRONG?

Node failure in sending topology

SLIDE 31

REPLICATION IN GRIDGAIN: WHAT CAN GO WRONG?

Sender hub(s) failure

SLIDE 33

REPLICATION IN GRIDGAIN: WHAT CAN GO WRONG?

Receiver hub(s) failure

SLIDE 35

REPLICATION IN GRIDGAIN: WHAT CAN GO WRONG?

Node failure in receiving topology

SLIDE 37

REPLICATION IN GRIDGAIN: WHAT CAN BE BETTER?

Batching on per-node basis vs per-partition basis

Per-node batching

  • More efficient from a memory standpoint
  • Batches are collected quickly

Per-partition batching

  • No need for additional processing on the receiving side
  • Less contention – honors thread-per-partition model
  • Probably higher GC pressure
  • But still expected to perform better
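
The difference between the two strategies comes down to the grouping key. The sketch below is a simplified illustration, not GridGain's implementation. partition_of and node_of are hypothetical stand-ins for the cluster's affinity function and partition assignment, and the partition and node counts are made up.

```python
from collections import defaultdict
import zlib

NUM_PARTITIONS = 8   # hypothetical; chosen small for illustration
NUM_NODES = 2        # hypothetical number of nodes in the sending cluster


def partition_of(key: str) -> int:
    """Stand-in for an affinity function: stable hash of the key modulo partition count."""
    return zlib.crc32(key.encode()) % NUM_PARTITIONS


def node_of(partition: int) -> int:
    """Stand-in for partition-to-node assignment."""
    return partition % NUM_NODES


def batch_per_node(updates):
    """One batch per node: fewer, larger batches, but each mixes several partitions,
    so a receiver that works per partition would have to regroup it."""
    batches = defaultdict(list)
    for key, value in updates:
        batches[node_of(partition_of(key))].append((key, value))
    return dict(batches)


def batch_per_partition(updates):
    """One batch per partition: more, smaller batches, each of which can be handed
    to a single partition thread without regrouping."""
    batches = defaultdict(list)
    for key, value in updates:
        batches[partition_of(key)].append((key, value))
    return dict(batches)


updates = [(f"key-{i}", i) for i in range(100)]
by_node = batch_per_node(updates)
by_partition = batch_per_partition(updates)
print(len(by_node), "per-node batches;", len(by_partition), "per-partition batches")
```

More batch buffers are open at once in the per-partition variant, which is one way to read the "probably higher GC pressure" remark.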

SLIDE 38

REPLICATION IN GRIDGAIN: FUTURE PLANS

TX Replication

SLIDE 39

REPLICATION IN GRIDGAIN: FUTURE PLANS

TX Replication – over stretched cluster

SLIDE 40

REPLICATION IN GRIDGAIN: FUTURE PLANS

TX replication mechanism

SLIDE 41

REPLICATION IN GRIDGAIN: FUTURE PLANS

TX Replication – sync modes

  • Strict SYNC mode
    Main primary node (PN) → Sender → Receiver → Stand-In PN → Receiver → Sender → Main PN

  • Merciful SYNC mode
    Main PN → Sender → Receiver → Receiver's WAL → Sender → Main PN

  • ASYNC mode
    Main PN → Sender → Sender WAL → Main PN
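
The three modes differ in how far an update travels before the main primary node gets its acknowledgement. The sketch below encodes the hop sequences from this slide and counts how many hops cross between data centers. Two things are assumptions of the sketch, not statements from the talk: that Main PN, Sender and Sender WAL sit in one data center while everything else sits in the other, and the per-hop latency figures, which are made up.

```python
# Hop sequences copied from the slide: the path an update takes before
# the main primary node (PN) treats it as acknowledged.
PATHS = {
    "strict_sync": ["Main PN", "Sender", "Receiver", "Stand-In PN",
                    "Receiver", "Sender", "Main PN"],
    "merciful_sync": ["Main PN", "Sender", "Receiver", "Receiver's WAL",
                      "Sender", "Main PN"],
    "async": ["Main PN", "Sender", "Sender WAL", "Main PN"],
}

# Assumption for illustration: these components live in the sending data center,
# everything else lives in the receiving one.
LOCAL_SIDE = {"Main PN", "Sender", "Sender WAL"}


def wan_crossings(path):
    """Count hops whose two endpoints sit in different data centers."""
    return sum((a in LOCAL_SIDE) != (b in LOCAL_SIDE) for a, b in zip(path, path[1:]))


def ack_latency_ms(path, wan_ms=50.0, lan_ms=0.5):
    """Rough acknowledgement latency under made-up per-hop costs."""
    wan = wan_crossings(path)
    lan = len(path) - 1 - wan
    return wan * wan_ms + lan * lan_ms


for mode, path in PATHS.items():
    print(mode, "WAN crossings:", wan_crossings(path))
```

Under these assumptions both SYNC modes pay a full WAN round trip before acknowledging. Strict SYNC additionally waits for the stand-in primary node, Merciful SYNC only for the receiver's WAL, and ASYNC acknowledges as soon as the update reaches the sender's WAL.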

SLIDE 45

LESSONS LEARNED

  • Is replication able to solve your problem?
  • Pick the proper settings: sync/async, physical vs logical changes.
  • Be aware of internals – know what makes it work.
  • Make sure to test, tune and monitor.

SLIDE 46

CONTACTS

yzhdanov@gridgain.com
http://ignite.apache.org
dev@ignite.apache.org
user@ignite.apache.org

SLIDE 47

QUESTIONS?

ANY QUESTIONS?