DRBD 9: Linux Storage Replication
Lars Ellenberg, LINBIT HA Solutions GmbH, Vienna, Austria
What this talk is about
- What is replication
- Why block level replication
- Why replication
- What do we have to deal with
- How we are dealing with it now
- Where development is headed
Linux Storage Replication
Replication Basics · DRBD 8 Overview · DM-Replicator · DRBD 9 · Other Ideas
Standalone Servers
[Diagram: three independent nodes, each running its own important systems]
- No System Level Redundancy
- Vulnerable to Failures
Application Level Replication
[Diagram: Node 1 and Node 3 each run the application, which replicates its data to the peer]
- Special Purpose Solution
- Difficult to add to an application after the fact
Filesystem Level Replication
[Diagram: Node 1 and Node 3 each run a replicating filesystem that mirrors data to the peer]
- Special Filesystem
- Complex
- Replicate on dirty?
- ... on writeout?
- ... on close?
- What about metadata?
- Resilience?
Shared Storage/SAN
Shared Storage (SAN)
[Diagram: Nodes 1, 2 and 3 access shared data on a single SAN via FC or iSCSI]
- No Storage Redundancy
Shared Storage/SAN
Replication-capable SAN
[Diagram: Nodes 1, 2 and 3 access shared data on a SAN that replicates to a second SAN]
- Application agnostic
- Expensive Hardware
- Expensive License costs
Cluster
Block Level Replication
[Diagram: Node 1 and Node 2, with DRBD replicating the block device between them]
- Storage Redundancy
- Application Agnostic
- Generic
- Flexible
SAN Replacement Storage Cluster
[Diagram: a two-node DRBD storage cluster (Node 1, Node 2) exporting its volumes via iSCSI to the application nodes running the important systems]
- Storage Redundancy
- Application Agnostic
- Generic
- Flexible
Linux Storage Replication
Replication Basics · DRBD 8 Overview · DM-Replicator · DRBD 9 · Other Ideas
How it works: Normal operation
[Diagram: the application runs on the primary node; read I/O is served from the local data blocks, write I/O goes to the local disk and is replicated to the secondary node, which acknowledges it.]
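The synchronous write path above can be sketched in a few lines of Python (purely illustrative, not DRBD's actual in-kernel code): the write is only reported as completed to the application once the local write is done and the peer has acknowledged its copy.

    class Peer:
        """Stands in for the secondary node reached over the network."""
        def __init__(self):
            self.blocks = {}

        def replicate(self, block_nr, data):
            self.blocks[block_nr] = data      # the secondary writes to its disk
            return "ack"                      # ... and acknowledges

    class Primary:
        def __init__(self, peer):
            self.blocks = {}
            self.peer = peer

        def write(self, block_nr, data):
            self.blocks[block_nr] = data                  # local write
            assert self.peer.replicate(block_nr, data) == "ack"
            return "io-completion"                        # only now signal completion

        def read(self, block_nr):
            return self.blocks[block_nr]                  # reads are served locally

    primary = Primary(Peer())
    primary.write(7, b"hello")
    assert primary.read(7) == b"hello"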
How it works: Primary Node Failure
[Diagram: when the primary node fails, the former secondary takes over as primary; the application is started there and reads and writes the data blocks locally, while the old primary is shown as an offline node.]
How it works: Secondary Node Failure
[Diagram: when the secondary node fails, the application keeps reading and writing on the primary node from its local data blocks; the writes can no longer be replicated.]
How it works: Secondary Node Recovery
[Diagram: once the secondary node is back, the primary resynchronizes its data blocks to it; the secondary acknowledges the resync while the application continues reading and writing on the primary.]
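A simplified, assumed sketch of how such a resync can avoid copying the whole device: while the peer is away, changed blocks are marked in a bitmap, and on recovery only those blocks are shipped (class and variable names here are made up for illustration).

    class ResyncingPrimary:
        def __init__(self, nr_blocks):
            self.blocks = [b"\0"] * nr_blocks
            self.dirty = set()                # "out of sync" bitmap
            self.peer_connected = True

        def write(self, block_nr, data):
            self.blocks[block_nr] = data
            if not self.peer_connected:
                self.dirty.add(block_nr)      # remember what the peer missed

        def resync(self, peer_blocks):
            """Peer came back: ship only the blocks it missed."""
            for block_nr in sorted(self.dirty):
                peer_blocks[block_nr] = self.blocks[block_nr]
            self.dirty.clear()
            self.peer_connected = True

    p = ResyncingPrimary(8)
    p.peer_connected = False                  # secondary node failed
    p.write(2, b"new"); p.write(5, b"data")
    peer = [b"\0"] * 8
    p.resync(peer)                            # only blocks 2 and 5 travel over the wire
    assert peer[2] == b"new" and peer[5] == b"data"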
What if ...
- We want an additional replica for disaster recovery
- we can stack DRBD
- The latency to the remote site is too high
- stack DRBD for local redundancy, run the high-latency link in asynchronous mode, and add buffering and compression with DRBD Proxy (see the sketch below)
- The primary node/site fails during resync
- take a snapshot before becoming the sync target
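A hedged sketch of the idea behind the asynchronous, buffered and compressed long-distance link (the concept behind DRBD Proxy; the class and function names are invented): the local write returns immediately, a buffer absorbs bursts, and compressed batches are shipped when the link allows.

    import pickle
    import zlib
    from collections import deque

    class AsyncBufferedLink:
        """Buffers writes locally and ships them compressed over a slow link."""
        def __init__(self, remote):
            self.buffer = deque()
            self.remote = remote                    # dict standing in for the DR site

        def submit(self, block_nr, data):
            self.buffer.append((block_nr, data))    # returns immediately: asynchronous

        def drain(self, batch_size=8):
            """Ship one compressed batch whenever the WAN link has capacity."""
            batch = [self.buffer.popleft()
                     for _ in range(min(batch_size, len(self.buffer)))]
            wire = zlib.compress(pickle.dumps(batch))           # compress before sending
            for block_nr, data in pickle.loads(zlib.decompress(wire)):
                self.remote[block_nr] = data                    # applied at the DR site

    remote_site = {}
    link = AsyncBufferedLink(remote_site)
    link.submit(1, b"x" * 4096)                     # completes without waiting for the WAN
    link.submit(2, b"y" * 4096)
    while link.buffer:
        link.drain()
    assert remote_site[1] == b"x" * 4096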
It Works.
- Though it may be ugly.
- Can we do better?
Linux Storage Replication
Replication Basics · DRBD 8 Overview · DM-Replicator · DRBD 9 · Other Ideas
Generic Replication Framework
- Track Data changes
- Persistent (on Disk) Data Journal
- “global” write ordering over multiple volumes
- Fallback to bitmap based change tracking
- Multi-node.
- many “site links” feed from the journal
- Flexible Policy
- When to report completion to upper layers
- (when to) fall back to the bitmap (see the sketch below)
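A rough sketch, under assumptions, of the framework described in this list: writes to all volumes go through one ordered journal, and when the journal is full the framework degrades to per-volume bitmap tracking (all names are illustrative, not the dm-replicator API).

    class ReplicationLog:
        def __init__(self, journal_capacity):
            self.journal = []                 # ordered (seq, volume, block, data)
            self.bitmaps = {}                 # volume -> set of dirty blocks
            self.capacity = journal_capacity
            self.seq = 0

        def record(self, volume, block_nr, data):
            self.seq += 1
            if len(self.journal) < self.capacity:
                self.journal.append((self.seq, volume, block_nr, data))
            else:
                # journal full: degrade to bitmap mode (ordering is lost,
                # a resync of the marked blocks is needed later)
                self.bitmaps.setdefault(volume, set()).add(block_nr)

        def feed_site_link(self, last_seq_seen):
            """A per-site link pulls everything it has not yet replicated."""
            return [e for e in self.journal if e[0] > last_seq_seen]

    log = ReplicationLog(journal_capacity=1000)
    log.record("vol0", 10, b"a")
    log.record("vol1", 3, b"b")               # write ordering across volumes is kept
    assert [e[1] for e in log.feed_site_link(0)] == ["vol0", "vol1"]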
Current "default" reference implementation
- Only talks to “dumb” block devices
- “Software RAID1”, allowing some legs to lag behind
- No concept of “data generation”
- Cannot communicate metadata
- Not directly suitable for failover solutions
- Primary objective: cut down on “hardware” replication license costs; replicate SAN LUNs in software to disaster recovery sites.
DRBD 9
Replication Basics · DRBD 8 Overview · DM-Replicator · DRBD 9 · Other Ideas
Replicating smarter, asynchronous
- Detect and discard overwrites
- shipped batches must be atomic
- Compress
- Compress an XOR-diff of old and new data (see the sketch below)
- Side effects
- Can be undone
- Checkpointing of generic block data
- Point-in-time recovery
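The first two points can be illustrated with a small Python sketch (my own illustration, not DRBD 9 code): within one atomic batch, a later write to the same block discards the earlier one, and the batch is shipped as a compressed XOR-diff against the data the peer already has.

    import zlib

    def add_to_batch(batch, block_nr, data):
        batch[block_nr] = data        # later write to the same block discards the earlier one

    def xor_diff(old, new):
        return bytes(a ^ b for a, b in zip(old, new))

    batch = {}
    add_to_batch(batch, 4, b"AAAA")
    add_to_batch(batch, 4, b"AAAB")           # overwrite: only the last version ships
    assert batch[4] == b"AAAB"

    peer_copy = {4: b"AAAA"}                  # what the peer currently has
    diff = {nr: xor_diff(peer_copy.get(nr, bytes(len(d))), d) for nr, d in batch.items()}
    wire = zlib.compress(diff[4])             # XOR-diffs of similar data compress well

    # the receiving side applies the whole batch atomically:
    restored = xor_diff(peer_copy[4], zlib.decompress(wire))
    assert restored == b"AAAB"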
Replicating smarter, synchronous
- Identify a certain Data Set Version
- Start from scratch
- continuous stream of changes
- Data Generation Tags, dagtags (see the sketch below)
- which clone (node name)
- which volume (label)
- who modified it last (committer)
- modification date (position in the change stream)
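A minimal sketch of what such a tag could look like as a data structure, following the four fields above; the field names, types, and ordering rule are assumptions for illustration only.

    from dataclasses import dataclass

    @dataclass(frozen=True, order=True)
    class DagTag:
        position: int        # position in the change stream (sorts first)
        committer: str       # who modified the data last
        node: str            # which clone (node name)
        volume: str          # which volume (label)

    tag_a = DagTag(position=1041, committer="node-a", node="node-b", volume="vol0")
    tag_b = DagTag(position=987,  committer="node-a", node="node-c", volume="vol0")

    # the node whose tag is further along the change stream has the newer data:
    assert max(tag_a, tag_b) is tag_a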
Colorful Replication Stream
[Diagram: the primary node's changes shown as a colored stream, grouped into atomic batches, discarding overwrites]
Data Set Divergence
Advantages of the Data Generation Tag scheme
- On handshake, exchange dagtags
- Trivially see who has the best data, even on primary site failure with multiple secondaries possibly lagging behind (see the sketch below)
- Communicate dagtags with atomic (compressed, XOR-diff) batches
- allows for daisy chaining
- keep dagtag and batch payload
- Checkpointing: just store the dagtag.
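And a hedged sketch of the handshake itself, with each dagtag reduced to just its position in the change stream for brevity: every node announces its tag, the node furthest along the stream is the obvious sync source, and the others can see exactly how far behind they are.

    def handshake(local_name, local_tag, peer_tags):
        """peer_tags: {node_name: stream_position}. Returns the best sync source."""
        tags = dict(peer_tags)
        tags[local_name] = local_tag
        best = max(tags, key=tags.get)            # trivially visible: best data wins
        return best, {n: tags[best] - t for n, t in tags.items()}

    source, lag = handshake("secondary-1", 950,
                            {"secondary-2": 990, "secondary-3": 875})
    assert source == "secondary-2"                # even with the primary gone,
    assert lag["secondary-3"] == 115              # laggards know exactly what they miss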
DRBD 9
Replication Basics · DRBD 8 Overview · DM-Replicator · DRBD 9 · Other Ideas
Stretched cluster file systems?
- Multiple branch offices
- One cluster filesystem
- Latency would make it unusable
- But when
- keeping leases and
- inserting lock requests into the replication data stream,
- while having mostly self-contained access in the branch offices,
- it may feel like low latency most of the time, with occasional longer delays on access (see the sketch below)
- Tell me why I'm wrong :-)
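To make the idea concrete, here is a deliberately naive sketch (as speculative as the slide itself; all names are invented): a branch office accesses objects it holds a lease on without any network round trip, and only a lease transfer inserts a lock request into the shared replication stream.

    class BranchOffice:
        def __init__(self, name, stream):
            self.name = name
            self.leases = set()        # filesystem objects this site may touch freely
            self.stream = stream       # shared, ordered replication stream

        def access(self, obj):
            if obj in self.leases:
                return "local, low latency"          # mostly self-contained access
            # the lock request travels in-band with the replicated writes:
            self.stream.append(("lock-request", self.name, obj))
            self.leases.add(obj)                     # granted once the stream syncs
            return "waited for remote sites once"

    stream = []
    vienna = BranchOffice("vienna", stream)
    vienna.leases.add("/projects/plan.txt")
    assert vienna.access("/projects/plan.txt") == "local, low latency"
    assert vienna.access("/shared/budget.ods") == "waited for remote sites once"
    assert stream[0][0] == "lock-request"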
If you think you can help, or have comments: lars@linbit.com
http://www.linbit.com
http://www.drbd.org