SLIDE 1

"DRBD 9"

Linux Storage Replication
Lars Ellenberg, LINBIT HA Solutions GmbH, Vienna, Austria

SLIDE 2

What this talk is about

  • What is replication
  • Why block level replication
  • Why replication
  • What do we have to deal with
  • How we are dealing with it now
  • Where development is headed
SLIDE 3

Linux Storage Replication

  • Replication Basics
  • DRBD 8 Overview
  • DM-Replicator
  • DRBD 9
  • Other Ideas

SLIDE 5

Standalone Servers

[Diagram: three standalone servers (Node 1, Node 2, Node 3), each running important systems on its own]

  • No System Level Redundancy
  • Vulnerable to Failures

SLIDE 6

Application Level Replication

[Diagram: the application on Node 1 replicates its own data to Node 3]

  • Special-purpose solution
  • Difficult to add to an application after the fact

SLIDE 7

Filesystem Level Replication

[Diagram: a replicating filesystem spans Node 1 and Node 3, which run the important systems]

  • Special Filesystem
  • Complex
  • Replicate on dirty?
  • ... on writeout?
  • ... on close?
  • What about metadata?
  • Resilience?

SLIDE 8

Shared Storage/SAN

[Diagram: Node 1, Node 2, and Node 3 attach to shared data on one SAN via FC or iSCSI]

  • No Storage Redundancy

SLIDE 9

Shared Storage/SAN

Replication-capable SAN

[Diagram: the SAN replicates the shared data to a second SAN; nodes attach via FC or iSCSI]

  • Application agnostic
  • Expensive Hardware
  • Expensive License costs

SLIDE 10

Cluster

Block Level Replication

[Diagram: a two-node cluster (Node 1, Node 2) with DRBD replicating between their local disks]

  • Storage Redundancy
  • Application Agnostic
  • Generic
  • Flexible

SLIDE 11

Storage Cluster

SAN Replacement: Storage Cluster

[Diagram: a two-node DRBD storage cluster (Node 1, Node 2) exports its replicated data via iSCSI to the nodes running the important systems]

  • Storage Redundancy
  • Application Agnostic
  • Generic
  • Flexible

SLIDE 12

Linux Storage Replication

  • Replication Basics
  • DRBD 8 Overview
  • DM-Replicator
  • DRBD 9
  • Other Ideas

SLIDE 13

How it works: Normal operation

[Diagram: the Application issues read and write I/O on the Primary Node; reads are served from the local data blocks, while each write is replicated to the Secondary Node, which writes it and sends an acknowledgement back. A sketch of this write path follows below.]

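The write path on this slide can be restated in code. Below is a minimal user-space sketch of synchronous (protocol C style) replication, not DRBD's actual implementation; the function names and in-memory "disks" are illustrative inventions:

```c
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define NUM_BLOCKS 16

/* In-memory stand-ins for the two nodes' backing devices. */
static char primary_disk[NUM_BLOCKS][BLOCK_SIZE];
static char secondary_disk[NUM_BLOCKS][BLOCK_SIZE];

/* Stand-in for the network round trip: the peer writes the block
 * and returns its acknowledgement. */
static int replicate_to_secondary(int block, const char *data)
{
    memcpy(secondary_disk[block], data, BLOCK_SIZE);
    return 0; /* ack */
}

/* Synchronous replication: completion is reported upward only after
 * BOTH the local write and the peer's acknowledgement. */
static int replicated_write(int block, const char *data)
{
    memcpy(primary_disk[block], data, BLOCK_SIZE); /* local write */
    if (replicate_to_secondary(block, data) != 0)  /* wait for peer ack */
        return -1;
    return 0; /* now acknowledge to the application */
}

int main(void)
{
    char buf[BLOCK_SIZE] = "important data";
    if (replicated_write(3, buf) == 0)
        printf("both copies written: %s\n", secondary_disk[3]);
    return 0;
}
```
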
SLIDE 14

How it works: Primary Node Failure

[Diagram: the Primary Node fails; the former Secondary Node takes over as Primary, and the Application continues its read and write I/O against the data blocks there]

SLIDE 15

How it works: Secondary Node Failure

[Diagram: the Secondary Node is offline; the Application continues read and write I/O on the Primary Node's data blocks]

SLIDE 16

How it works: Secondary Node Recovery

[Diagram: the Primary Node resyncs the changed data blocks to the recovered Secondary Node, which acknowledges them; Application reads keep being served on the Primary. A sketch of the underlying dirty-block tracking follows below.]

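The failure/recovery pair on the last two slides implies change tracking while the peer is away. Here is a rough illustration (my sketch, not DRBD's code) of a dirty-block bitmap: writes made while the peer is offline set bits, and resync ships only the marked blocks:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 64

static uint64_t dirty_bitmap; /* one bit per block */
static bool peer_online;

static void write_block(int block)
{
    /* ... perform the local write ... */
    if (!peer_online)
        dirty_bitmap |= UINT64_C(1) << block; /* remember for resync */
    /* else: replicate immediately; the block never becomes dirty */
}

/* Once the peer returns, ship only the blocks that changed meanwhile. */
static void resync(void)
{
    for (int b = 0; b < NUM_BLOCKS; b++) {
        if (dirty_bitmap & (UINT64_C(1) << b)) {
            printf("resyncing block %d\n", b);
            dirty_bitmap &= ~(UINT64_C(1) << b); /* clean after peer ack */
        }
    }
}

int main(void)
{
    peer_online = false; /* secondary node failed */
    write_block(5);
    write_block(23);
    peer_online = true;  /* secondary node recovered */
    resync();            /* only blocks 5 and 23 travel */
    return 0;
}
```
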
SLIDE 17

What if ...

  • We want an additional replica for disaster recovery?
      • We can stack DRBD.
  • The latency to the remote site is too high?
      • Stack DRBD for local redundancy, run the high-latency link in asynchronous mode, and add buffering and compression with DRBD Proxy (sketched below).
  • The primary node/site fails during resync?
      • Snapshot before becoming sync target.

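To make the "asynchronous mode, add buffering" option concrete, here is a simplified sketch (invented names, nothing like DRBD Proxy's real code): a write completes to the application after the local write plus an enqueue, and a drain step later ships the queue across the slow link:

```c
#include <stdio.h>

#define QUEUE_DEPTH 128

/* Send buffer for the high-latency link (the part a proxy enlarges). */
static int queued_blocks[QUEUE_DEPTH];
static int queue_len;

/* Asynchronous replication: complete to the application after the
 * local write and enqueue; do NOT wait for the remote ack. */
static int async_write(int block)
{
    /* ... perform the local write ... */
    if (queue_len == QUEUE_DEPTH)
        return -1; /* buffer full: stall, or fall back to bitmap mode */
    queued_blocks[queue_len++] = block;
    return 0; /* acknowledged upward already */
}

/* Runs "later": ships queued writes across the slow link in order. */
static void drain_queue(void)
{
    for (int i = 0; i < queue_len; i++)
        printf("shipping block %d to remote site\n", queued_blocks[i]);
    queue_len = 0;
}

int main(void)
{
    async_write(7);  /* application sees only local latency */
    async_write(8);
    drain_queue();   /* remote site catches up asynchronously */
    return 0;
}
```
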
SLIDE 18

It Works.

  • Though it may be ugly.
  • Can we do better?
SLIDE 19

Linux Storage Replication

  • Replication Basics
  • DRBD 8 Overview
  • DM-Replicator
  • DRBD 9
  • Other Ideas

SLIDE 20

Generic Replication Framework

  • Track data changes
      • Persistent (on-disk) data journal
      • "Global" write ordering over multiple volumes
      • Fallback to bitmap-based change tracking (sketched below)
  • Multi-node
      • Many "site links" feed from the journal
  • Flexible policy
      • When to report completion to upper layers
      • (When to) fall back to bitmap

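The journal-plus-bitmap-fallback policy from the list above can be sketched as follows (an illustration of the idea, not dm-replicator code): an ordered, fixed-size journal records changes; on overflow, tracking degrades to a bitmap that never grows but loses ordering and payload history:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define JOURNAL_SLOTS 4
#define NUM_BLOCKS 64

static int journal[JOURNAL_SLOTS]; /* ordered list of changed blocks */
static int journal_len;
static uint64_t bitmap;            /* unordered fallback, one bit per block */
static bool bitmap_mode;

static void record_change(int block)
{
    if (!bitmap_mode && journal_len < JOURNAL_SLOTS) {
        journal[journal_len++] = block; /* keeps write ordering */
        return;
    }
    if (!bitmap_mode) {
        /* Journal full: fall back. Fold the journal into the bitmap;
         * ordering and payload history are lost from here on. */
        bitmap_mode = true;
        for (int i = 0; i < journal_len; i++)
            bitmap |= UINT64_C(1) << journal[i];
        journal_len = 0;
    }
    bitmap |= UINT64_C(1) << block;
}

int main(void)
{
    for (int b = 0; b < 6; b++) /* six changes overflow a 4-slot journal */
        record_change(b);
    printf("bitmap mode: %s\n", bitmap_mode ? "yes" : "no");
    return 0;
}
```
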
SLIDE 21

Current "default" reference implementation

  • Only talks to "dumb" block devices
  • "Software RAID1", allowing some legs to lag behind
  • No concept of "data generation"
      • Cannot communicate metadata
      • Not directly suitable for failover solutions
  • Primary objective: cut down on "hardware" replication license costs by replicating SAN LUNs in software to disaster recovery sites

SLIDE 22

DRBD 9

  • Replication Basics
  • DRBD 8 Overview
  • DM-Replicator
  • DRBD 9
  • Other Ideas

SLIDE 23

Replicating smarter, asynchronous

  • Detect and discard overwrites (sketched below)
      • Shipped batches must be atomic
  • Compress
      • Compress XOR-diff
  • Side effects
      • Can be undone
      • Checkpointing of generic block data
      • Point-in-time recovery

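"Detect and discard overwrites" within one atomic batch might look like this (an illustrative sketch only): a later write to a block already present in the open batch replaces the earlier payload, so only the newest version is shipped; because the peer applies the batch atomically, the discarded version is never observable there:

```c
#include <stdio.h>
#include <string.h>

#define BATCH_SLOTS 32
#define BLOCK_SIZE 16 /* tiny blocks, just for the demo */

struct batch_entry {
    int block;
    char data[BLOCK_SIZE];
};

static struct batch_entry batch[BATCH_SLOTS];
static int batch_len;

/* Add a write to the open batch; an overwrite of a block already in
 * the batch replaces the old payload instead of appending. */
static void batch_write(int block, const char *data)
{
    for (int i = 0; i < batch_len; i++) {
        if (batch[i].block == block) {
            memcpy(batch[i].data, data, BLOCK_SIZE); /* discard old */
            return;
        }
    }
    batch[batch_len].block = block;
    memcpy(batch[batch_len].data, data, BLOCK_SIZE);
    batch_len++;
}

int main(void)
{
    char v1[BLOCK_SIZE] = "version 1";
    char v2[BLOCK_SIZE] = "version 2";
    batch_write(9, v1);
    batch_write(9, v2); /* overwrite: "version 1" never ships */
    printf("batch holds %d entries; block 9 = %s\n",
           batch_len, batch[0].data);
    return 0;
}
```
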
SLIDE 24

Replicating smarter, synchronous

  • Identify a certain Data Set Version
      • Start from scratch
      • Continuous stream of changes
  • Data Generation Tags, "dagtags" (see the struct sketch below)
      • Which clone (node name)
      • Which volume (label)
      • Who modified it last (committer)
      • Modification date (position in the change stream)

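The four components above map naturally onto a struct. A hypothetical C rendering (the field names are mine; the talk only specifies what each component identifies):

```c
#include <stdint.h>

/* A Data Generation Tag (dagtag) as described on the slide:
 * it pins down one version of one volume's data set. */
struct dagtag {
    char     node[32];      /* which clone (node name) */
    char     volume[32];    /* which volume (label) */
    char     committer[32]; /* who modified it last */
    uint64_t position;      /* "modification date": position in the
                             * change stream, not wall-clock time */
};
```
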
SLIDE 25

Colorful Replication Stream

[Diagram: the Primary Node's changes flow into atomic batches that discard overwrites; lagging replicas show data set divergence]

SLIDE 26

Advantages of the Data Generation Tag scheme

  • On handshake, exchange dagtags
      • Trivially see who has the best data, even on primary site failure with multiple secondaries possibly lagging behind (sketched below)
  • Communicate dagtags with the atomic (compressed, XOR-diff) batches
      • Allows for daisy chaining
      • Keep dagtag and batch payload
  • Checkpointing: just store the dagtag.

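The "who has the best data" point is easy to make concrete. In this sketch (invented names, the dagtag reduced to the fields needed here), surviving nodes exchange tags for the same volume and the largest change-stream position wins:

```c
#include <stdint.h>
#include <stdio.h>

struct dagtag {
    char     node[32];
    uint64_t position; /* position in the change stream */
};

/* On handshake each node sends its dagtag; for the same volume,
 * the tag furthest along the change stream marks the best data. */
static const struct dagtag *best_data(const struct dagtag *tags, int n)
{
    const struct dagtag *best = &tags[0];
    for (int i = 1; i < n; i++)
        if (tags[i].position > best->position)
            best = &tags[i];
    return best;
}

int main(void)
{
    /* primary failed; two secondaries lag behind by different amounts */
    struct dagtag survivors[] = {
        { "secondary-a", 1041 },
        { "secondary-b", 1337 },
    };
    printf("sync from %s\n", best_data(survivors, 2)->node);
    return 0;
}
```
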
SLIDE 27

DRBD 9

  • Replication Basics
  • DRBD 8 Overview
  • DM-Replicator
  • DRBD 9
  • Other Ideas

SLIDE 28

Stretched cluster file systems?

  • Multiple branch offices
  • One cluster filesystem
  • Latency would make it unusable
  • But when
      • keeping leases, and
      • inserting lock requests into the replication data stream,
      • while having mostly self-contained access in the branch offices,
    it may feel like low latency most of the time, with occasional longer delays on access (a sketch follows below).
  • Tell me why I'm wrong :-)

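A sketch of how I read the lease idea (entirely illustrative, and as speculative as the slide itself): holding the lease enables the local fast path; otherwise a lock request travels through the replication stream and costs one WAN round trip before the lease migrates:

```c
#include <stdio.h>

#define LOCAL_SITE 0

/* Which site currently holds the lease for each object. */
static int lease_holder[16];

static void access_object(int obj)
{
    if (lease_holder[obj] == LOCAL_SITE) {
        /* Mostly self-contained access: feels like low latency. */
        printf("object %d: local fast path\n", obj);
        return;
    }
    /* Occasional longer delay: send a lock request through the
     * replication data stream and wait until the lease migrates. */
    printf("object %d: lock request over the WAN...\n", obj);
    lease_holder[obj] = LOCAL_SITE; /* lease granted after round trip */
}

int main(void)
{
    lease_holder[7] = 1;  /* object 7's lease lives at another site */
    access_object(3);     /* fast */
    access_object(7);     /* slow once */
    access_object(7);     /* fast again: lease is now local */
    return 0;
}
```
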
SLIDE 29

Comments? lars@linbit.com
http://www.linbit.com
http://www.drbd.org

If you think you can help, we are hiring!