Replication Distilled: Hazelcast Deep Dive
Ensar Basri Kahveci, Hazelcast



SLIDE 1

Replication Distilled: Hazelcast Deep Dive

Ensar Basri Kahveci, Hazelcast

SLIDE 2

Hazelcast

▪ The leading open source Java IMDG
▪ Distributed Java collections, concurrency primitives, ...
▪ Distributed computations, messaging, ...
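
As a quick illustration of the distributed collections above, here is a minimal sketch that starts an embedded member and uses a distributed map (assuming the Hazelcast 3.x API; the map name and entries are made up for the example):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class HelloHazelcast {
    public static void main(String[] args) {
        // Start an embedded member; it forms or joins a cluster automatically.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // A distributed map: entries are partitioned and replicated across members.
        IMap<String, String> capitals = hz.getMap("capitals");
        capitals.put("TR", "Ankara");
        System.out.println(capitals.get("TR"));

        hz.shutdown();
    }
}
```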

SLIDE 3

In-Memory Data Grids

▪ Distributed caching
▪ Keeping data in local JVM for fast access & processing
▪ Elasticity, availability, high throughput, and low latency
▪ Multiple copies of data to tolerate failures

SLIDE 4

Replication

▪ Putting a data set into multiple nodes
▪ Fault tolerance
▪ Latency
▪ Throughput

SLIDE 5

Challenges

▪ Where to perform reads & writes?
▪ How to keep replicas in sync?
▪ How to handle concurrent reads & writes?
▪ How to handle failures?

SLIDE 6

CAP Principle

▪ Pick two of C, A, and P
▪ CP versus AP

SLIDE 7

CP

SLIDE 8

AP

SLIDE 9

Consistency/Latency Trade-off

SLIDE 10

Consistency/Latency Trade-off

SLIDE 11

PACELC Principle

▪ If there is a network partition (P), we have to choose between availability and consistency (AC).
▪ Else (E), during normal operation, we can choose between latency and consistency (LC).

SLIDE 12

Let’s build the core replication protocol

SLIDE 13

Primary Copy

▪ Operations are sent to primary replicas.
▪ Strong consistency when the primary is reachable.

SLIDE 14

Partitioning (Sharding)

▪ Partitioning helps to scale primaries.
▪ A primary replica is elected for each partition.

SLIDE 15

Updating Replicas

SLIDE 16

Updating Replicas

partition id = hash(serialize(key)) % partition count
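
A toy Java sketch of the routing rule above; this is not Hazelcast's actual code (Hazelcast serializes the key to bytes and hashes them, and defaults to 271 partitions), just the shape of the computation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PartitionRouter {
    private final int partitionCount;

    PartitionRouter(int partitionCount) {
        this.partitionCount = partitionCount; // Hazelcast's default is 271
    }

    int partitionId(String key) {
        // Stand-in for serialize(key) + hash: real Hazelcast hashes the
        // serialized bytes with a stronger hash function.
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(bytes);
        // Mask the sign bit so the modulo result is never negative.
        return (hash & Integer.MAX_VALUE) % partitionCount;
    }
}
```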

SLIDE 17

Updating Replicas

SLIDE 18

Updating Replicas

SLIDE 19

Async Replication

▪ Each replica is updated separately.
▪ High throughput and availability
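
How many backup replicas exist, and whether they are updated synchronously or asynchronously, is configurable per data structure. A minimal sketch using the 3.x programmatic map configuration (the map name is illustrative):

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;

public class BackupConfigExample {
    public static void main(String[] args) {
        Config config = new Config();

        MapConfig mapConfig = new MapConfig("orders"); // example map name
        mapConfig.setBackupCount(1);      // one backup updated synchronously
        mapConfig.setAsyncBackupCount(1); // one backup updated asynchronously
        config.addMapConfig(mapConfig);

        Hazelcast.newHazelcastInstance(config);
    }
}
```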

SLIDE 20

Anti-Entropy

▪ Backup replicas can fall behind the primary.
▪ Out-of-sync backups are fixed with an active anti-entropy mechanism, sketched below.
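
A toy sketch of the anti-entropy idea, not Hazelcast's actual implementation: the primary periodically compares a per-partition replica version with each backup and re-replicates whatever the backup is missing:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: the primary tracks a version per partition, compares it with the
// backup's versions, and re-sends partition data when the backup is behind.
final class AntiEntropyTask implements Runnable {
    private final Map<Integer, Long> primaryVersions = new HashMap<>();
    private final Map<Integer, Long> backupVersions = new HashMap<>();

    @Override
    public void run() {
        for (Map.Entry<Integer, Long> entry : primaryVersions.entrySet()) {
            int partitionId = entry.getKey();
            long primaryVersion = entry.getValue();
            long backupVersion = backupVersions.getOrDefault(partitionId, -1L);
            if (backupVersion < primaryVersion) {
                syncPartition(partitionId, primaryVersion);
            }
        }
    }

    private void syncPartition(int partitionId, long version) {
        // Re-replicate the missing updates to the lagging backup,
        // then record that the backup caught up to this version.
        backupVersions.put(partitionId, version);
    }
}
```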

SLIDE 21

Replicas are not in sync

▪ The client reads a key from the current primary replica.

SLIDE 22

Network Partitioning

▪ The client reads the same key.

SLIDE 23

Split-Brain

▪ Strong consistency is lost.

SLIDE 24

Resolving the Divergence

▪ Merge policies: higher hits, latest update / access, ...
▪ Merging may cause lost updates.
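
Which merge policy runs when the clusters rejoin is configurable per data structure. A sketch with the Hazelcast 3.x map configuration; the policy class name below is one of the built-in 3.x policies, but verify the exact name and API against the version you run:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;

public class MergePolicyExample {
    public static void main(String[] args) {
        Config config = new Config();

        MapConfig mapConfig = new MapConfig("orders"); // example map name
        // After the split-brain heals, keep the entry with the latest update time.
        mapConfig.setMergePolicy("com.hazelcast.map.merge.LatestUpdateMapMergePolicy");
        config.addMapConfig(mapConfig);
    }
}
```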

SLIDE 25

Let’s classify this protocol with PACELC

SLIDE 26

Hazelcast is PA/EC

▪ Consistency is usually traded for availability and latency together.
▪ Hazelcast works in memory and is mostly used in a single computing cluster.
▪ The consistency/latency trade-off is minimal.
▪ PA/EC works fine for distributed caching.

SLIDE 27

Favoring Latency (PA/EL)

SLIDE 28

Scaling Reads

▪ Reads can be served locally from near caches and backup replicas.
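
Both options are per-map settings that trade some consistency for lower read latency (the PA/EL corner). A sketch with the 3.x programmatic configuration; the names are illustrative:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.NearCacheConfig;

public class LocalReadsExample {
    public static void main(String[] args) {
        Config config = new Config();

        MapConfig mapConfig = new MapConfig("orders"); // example map name
        // Allow gets to be served from a backup replica on this member,
        // which may return slightly stale values.
        mapConfig.setReadBackupData(true);
        // Cache recently read entries locally on the reading side.
        mapConfig.setNearCacheConfig(new NearCacheConfig("orders"));
        config.addMapConfig(mapConfig);
    }
}
```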

SLIDE 29

Favoring Consistency (PC/EC)

SLIDE 30

Failure Detectors

▪ Local failure detectors rely on timeouts.
▪ Operations are blocked after the cluster size falls below a threshold.
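
A sketch of the "block operations below a threshold" idea using the 3.x quorum (split-brain protection) configuration; the quorum name, map name, and threshold are illustrative:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.QuorumConfig;
import com.hazelcast.core.Hazelcast;

public class MinClusterSizeExample {
    public static void main(String[] args) {
        Config config = new Config();

        // Reject map operations while fewer than 3 members are observed.
        QuorumConfig quorum = new QuorumConfig("threeMembers", true, 3);
        config.addQuorumConfig(quorum);

        MapConfig mapConfig = new MapConfig("orders"); // example map name
        mapConfig.setQuorumName("threeMembers");
        config.addMapConfig(mapConfig);

        Hazelcast.newHazelcastInstance(config);
    }
}
```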

SLIDE 31

Failure Detectors

▪ It takes some time to detect an unresponsive node.
▪ Minimizes divergence and maintains the baseline consistency.

SLIDE 32

Isolated Failure Detectors

▪ Configure failure detectors independently for data structures
▪ Phi-Accrual Failure Detector

SLIDE 33

CP Data Structures

▪ IDGenerator
▪ Distributed implementations of java.util.concurrent.*
▪ PA/EC is not a perfect fit for CP data structures.

SLIDE 34

Flake IDs

▪ Local unique ID generation
▪ Nodes get a unique node ID during join.
▪ K-ordered IDs
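
A minimal usage sketch, assuming Hazelcast 3.10+ where FlakeIdGenerator is available; the generator name is arbitrary:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.flakeidgen.FlakeIdGenerator;

public class FlakeIdExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // IDs combine a timestamp, the member's node ID, and a sequence,
        // so they are roughly time-ordered (k-ordered) and generated
        // locally, without cluster-wide coordination per call.
        FlakeIdGenerator idGen = hz.getFlakeIdGenerator("order-ids");
        System.out.println(idGen.newId());

        hz.shutdown();
    }
}
```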

SLIDE 35

CRDTs

▪ CRDTs: Conflict-free Replicated Data Types
▪ Replicas are updated concurrently without coordination.
▪ Strong eventual consistency
▪ Counters, sets, maps, graphs, ...

SLIDE 36

PN-Counter

SLIDE 37

PN-Counter
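
A minimal usage sketch of the PN-Counter CRDT, assuming Hazelcast 3.10+; the counter name is arbitrary. Increments and decrements can be applied on any replica, and the replicas converge without coordination:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.crdt.pncounter.PNCounter;

public class PNCounterExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        PNCounter likes = hz.getPNCounter("likes");
        likes.incrementAndGet(); // applied to one replica
        likes.decrementAndGet(); // may be applied to another replica
        // Reads reflect at least the updates this client has already seen;
        // replicas converge to the same value eventually.
        System.out.println(likes.get());

        hz.shutdown();
    }
}
```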

SLIDE 38

Sync Replication

▪ Concurrency primitives require true CP behavior.
▪ Paxos, Raft, ZAB, VR
▪ Re-implementing Hazelcast concurrency primitives with Raft
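
This Raft-based work later shipped as Hazelcast's CP Subsystem (3.12+). A usage sketch assuming a Hazelcast 4.x cluster with the CP Subsystem enabled (it needs at least three CP members); the structure names are arbitrary:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.IAtomicLong;
import com.hazelcast.cp.lock.FencedLock;

public class CpPrimitivesExample {
    public static void main(String[] args) {
        // Assumes the member config enables the CP Subsystem
        // (at least 3 CP members) so these calls are Raft-replicated.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        IAtomicLong sequence = hz.getCPSubsystem().getAtomicLong("sequence");
        sequence.incrementAndGet(); // linearizable while a majority is reachable

        FencedLock lock = hz.getCPSubsystem().getLock("job-lock");
        lock.lock();
        try {
            // critical section
        } finally {
            lock.unlock();
        }

        hz.shutdown();
    }
}
```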

SLIDE 39

Recap

▪ http://bit.ly/hazelcast-replication-consistency
▪ http://bit.ly/hazelcast-network-partitions
▪ http://dbmsmusings.blogspot.com/2017/10/hazelcast-and-mythical-paec-system.html

SLIDE 40

Thanks!

You can find me at
▪ @metanet
▪ ebkahveci@gmail.com