WalB: A Fast and Low Latency Backup System for Block Devices



SLIDE 1

WalB: A Fast and Low Latency Backup System for Block Devices

Open Source Summit Japan 2017
Kota Uchida
June 1, 2017

SLIDE 2

About me

▌Kota Uchida
▌SRE team at Cybozu, Inc.
▌A WalB developer

SLIDE 3

About Cybozu

▌A large cloud service vendor in Japan.
▌Largest market share in the field of collaborative software.
▌We serve web applications on our own cloud platform.
  • kintone: a low-code business app platform
  • and more

SLIDE 4

Customer companies: 19,000+
Accesses / day: 190 million
Write IOs / day: 24.5 TiB

SLIDE 5

Service Level Objective

▌24/7 nonstop service
▌99.99% availability (about 4 min of downtime / month)
▌Daily backup (retention period is 14 days)
▌Disaster recovery: copy data to a remote site once a day
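The "4 min / month" figure follows from the availability target. A minimal sketch of the arithmetic, assuming a 30-day month (the slide does not say which month length was used, and the function name is only illustrative):

```python
def allowed_downtime_minutes(availability: float, days: int = 30) -> float:
    """Return the downtime budget in minutes over a period of `days` days."""
    total_minutes = days * 24 * 60          # 43,200 minutes in a 30-day month
    return total_minutes * (1.0 - availability)


budget = allowed_downtime_minutes(0.9999)
print(f"{budget:.2f} min per 30-day month")
```

Under that assumption the budget comes to a little over four minutes, which is why a backup that degrades service for hours cannot run during business hours.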

SLIDE 6

Architecture of our platform

[Figure: platform architecture diagram. Labels in the diagram: L7LB, Application Server, Database Server, Blob Server, Storage Server with dm-snap (two of them), RAID 1, Backup Server, Remote Site, and four "Diff" labels. One region is marked "The scope of this talk".]

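SLIDE 7

Snapshot Management with dm-snap

[Figure: logical and physical structure of a dm-snap volume. The physical structure consists of an original volume area, a snapshot area, and mapping info. Blocks A and B receive writes A’ and B’. Each write is shown in two steps: (1) CoW, then (2) the write itself (e.g. "Write B’"). The logical structure shows a latest image and a snapshot image.]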
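The copy-on-write steps on this slide can be illustrated with a toy in-memory model. This is only a sketch: the class and method names are hypothetical, blocks are plain Python values, and it does not reflect dm-snap's actual on-disk format.

```python
class CowSnapshotVolume:
    """Toy model of one origin volume with a single CoW snapshot."""

    def __init__(self, blocks):
        self.origin = list(blocks)   # original volume area (always the latest image)
        self.snapshot_area = {}      # mapping info: block index -> preserved old data

    def write(self, index, data):
        # (1) CoW: on the first write to a block since the snapshot,
        #     save the old contents into the snapshot area.
        if index not in self.snapshot_area:
            self.snapshot_area[index] = self.origin[index]
        # (2) Write the new data to the original volume area.
        self.origin[index] = data

    def read_latest(self, index):
        return self.origin[index]

    def read_snapshot(self, index):
        # Unmodified blocks are still read from the original volume area.
        return self.snapshot_area.get(index, self.origin[index])


vol = CowSnapshotVolume(["A", "B"])
vol.write(0, "A'")
vol.write(1, "B'")
latest = [vol.read_latest(i) for i in range(2)]
snapshot = [vol.read_snapshot(i) for i in range(2)]
```

In this model the first write to each block costs an extra copy, which is the kind of overhead the talk later attributes to CoW.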

SLIDE 8

Backup using dm-snap

(1) Full-scan an old snapshot (Snapshot0: A, B)
(2) Full-scan a new snapshot (Snapshot1: A’, B’)
(3) Generate a diff image by comparing the two snapshots (A’, B’)
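The three steps above amount to a block-by-block comparison of two complete images. A minimal sketch, assuming each snapshot is available as a list of equally sized blocks (the function name is hypothetical):

```python
def diff_by_full_scan(old_snapshot, new_snapshot):
    """Compare two snapshots block by block and return the changed blocks.

    Both snapshots are read in full, so the cost grows with the volume
    size, not with the number of changed blocks.
    """
    if len(old_snapshot) != len(new_snapshot):
        raise ValueError("snapshots must cover the same number of blocks")
    return {
        index: new_block
        for index, (old_block, new_block) in enumerate(zip(old_snapshot, new_snapshot))
        if old_block != new_block
    }


snapshot0 = ["A", "B", "C"]
snapshot1 = ["A'", "B'", "C"]
diff_image = diff_by_full_scan(snapshot0, snapshot1)
```

Here `diff_image` holds only blocks 0 and 1, yet every block of both snapshots had to be read to find them.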

SLIDE 9

Full-scan at night

[Figure: time-of-day chart (axis in o’clock) marking "Daytime" and "Backup processing time".]

SLIDE 10

UX degradation during a full-scan

[Figure: chart with a period labeled "Full-scanning".]

SLIDE 11

We have no more “nights”

▌Until now: A full scan is allowed only when the access rate is low, i.e., at night.
▌From now on: We have to handle accesses from multiple time zones.
▌We must be able to back up at any time without UX degradation.

SLIDE 12

New Solution

▌We need a new solution with:
  • No IO spikes
  • Short backup time
▌We compared dm-thin with WalB

SLIDE 13

What is dm-thin?

▌dm-thin provides thin-provisioning volume management to
  • share the same data among volumes
  • reduce disk usage using snapshots
▌Included in the mainline Linux kernel

SLIDE 14

Snapshot Management with dm-thin

[Figure: before a snapshot is taken. Physical structure: block A, referenced by the latest tree. Logical structure: the latest image contains A.]

SLIDE 15

Snapshot Management with dm-thin

[Figure: after a snapshot is taken. Physical structure: block A, with a snapshot tree and a latest tree. Logical structure: the snapshot and the latest image both contain A.]

SLIDE 16

Snapshot Management with dm-thin

[Figure: writing A’ after the snapshot. Steps labeled in the figure: (1) CoW, (2) Write, and (2) Update. Physical structure: blocks A and A’, with a snapshot tree and a latest tree. Logical structure: a snapshot and a latest image.]

SLIDE 17

Backup using dm-thin

[Figure: Snapshot0 (A, B) and Snapshot1 (A’, B’), shown in both the logical and the physical structure.]

Generate a diff image using dm-thin metadata.
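Generating a diff from metadata means comparing the block mappings of two snapshots and reading only the blocks whose mapping changed. The sketch below is a toy model under stated assumptions: each snapshot tree is flattened into a dict from logical block to physical block, and all names are hypothetical. It is not dm-thin's real metadata format.

```python
def changed_logical_blocks(old_map, new_map):
    """Return logical blocks whose physical mapping differs between snapshots."""
    return sorted(
        logical for logical, physical in new_map.items()
        if old_map.get(logical) != physical
    )


def diff_from_metadata(old_map, new_map, physical_store):
    """Build a diff image by reading only the changed blocks of the new snapshot."""
    return {
        logical: physical_store[new_map[logical]]
        for logical in changed_logical_blocks(old_map, new_map)
    }


# Physical blocks 0..4; blocks shared between snapshots map to the same physical block.
physical_store = {0: "A", 1: "B", 2: "C", 3: "A'", 4: "B'"}
snapshot0_map = {0: 0, 1: 1, 2: 2}
snapshot1_map = {0: 3, 1: 4, 2: 2}   # logical block 2 is unchanged and still shared
diff_image = diff_from_metadata(snapshot0_map, snapshot1_map, physical_store)
```

Unlike a full scan, the unchanged block is never read; the work is in walking the mappings.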

SLIDE 18

What is WalB?

▌A real-time and incremental backup system
  • developed at Cybozu Labs
▌Can back up block devices without IO spikes

[Figure: comparison chart with the labels "dm-snap", "full scanning", "WalB", and "no spikes".]

SLIDE 19

Special Block Devices for WalB

[Figure: any application (file system, DBMS, etc.) issues reads and writes to a WalB device. Beneath the WalB device are a data device (linear mapped) and a log device (ring buffer).]

SLIDE 20

Write IO Logging and Backup with WalB

[Figure: initial state. The data device holds blocks A and B. The log device is drawn as slots 1 to 4 along a time axis labeled "Time series of write I/Os".]

SLIDE 21

Write IO Logging and Backup with WalB

[Figure: after "Write A’". The data device holds A’ and B, and the write A’ is recorded in the log device. Caption: "Scan the log device and generate a diff image".]

SLIDE 22

Write IO Logging and Backup with WalB

[Figure: after "Write A’" and "Write B’". Both writes are recorded in the log device along the time axis. Caption: "Scan the log device and generate a diff image".]
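The logging scheme on these slides can be sketched as a toy model: every write is appended to a log and applied to the data device, and a backup only scans the log entries added since the previous scan. The names are hypothetical, a Python list stands in for the ring buffer, and the order in which the log and the data device are updated is an assumption of the sketch, not something the slides state.

```python
class WalbLikeDevice:
    """Toy model of a device that logs every write for incremental backup."""

    def __init__(self, blocks):
        self.data = list(blocks)   # data device (block i lives at position i)
        self.log = []              # log device; a list stands in for the ring buffer
        self.scanned = 0           # number of log entries already turned into diffs

    def write(self, index, data):
        self.log.append((index, data))   # record the write in time order
        self.data[index] = data          # apply it to the data device

    def read(self, index):
        return self.data[index]          # reads never touch the log

    def extract_diff(self):
        """Scan log entries added since the last call and build a diff image."""
        diff = {}
        for index, data in self.log[self.scanned:]:
            diff[index] = data           # a later write to a block supersedes an earlier one
        self.scanned = len(self.log)
        return diff


dev = WalbLikeDevice(["A", "B"])
dev.write(0, "A'")
first_diff = dev.extract_diff()
dev.write(1, "B'")
second_diff = dev.extract_diff()
```

Each call to `extract_diff` touches only the new log entries, so the backup cost follows the amount of written data instead of the volume size, and no snapshot or full scan is needed.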

SLIDE 23

Performance test

▌Compared dm-snap, dm-thin, and WalB
▌Executed a workload during a backup
  • The workload and the backup affect each other
▌Measured the following metrics:
  • Latencies of the workload
  • Backup time

SLIDE 24

Environment & Settings

▌Test environment:
  • CPU: 2.40 GHz x 12 cores
  • MEM: 192 GiB
  • HDD: 4 TB HDD, RAID 6 (8D2P)
  • NIC: 10 Gbps x 2
  • Kernel: 4.11 (latest upstream)
▌Test settings:
  • 100 GiB volumes
  • Workload: 4 KiB random writes over a 5 GiB range
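To make the workload setting concrete, the sketch below generates the kind of write offsets it describes: 4 KiB-aligned positions chosen at random inside a 5 GiB range. The slides do not name the benchmark tool or say where the range sits within the 100 GiB volume, so the starting offset of 0, the seed, and the function name are all assumptions.

```python
import random

BLOCK_SIZE = 4 * 1024            # 4 KiB per write
RANGE_SIZE = 5 * 1024 ** 3       # writes stay within a 5 GiB range
RANGE_START = 0                  # assumption: the range starts at the beginning of the volume


def random_write_offsets(count, seed=0):
    """Return `count` random 4 KiB-aligned byte offsets inside the write range."""
    rng = random.Random(seed)
    blocks_in_range = RANGE_SIZE // BLOCK_SIZE
    return [RANGE_START + rng.randrange(blocks_in_range) * BLOCK_SIZE
            for _ in range(count)]


offsets = random_write_offsets(1000)
```

With these settings at most 5 GiB of the 100 GiB volume can change, so the remaining 95 GiB stays untouched during the test.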

SLIDE 25

Measuring the Backup Time (dm-snap, dm-thin)

▌dm-snap: take a snapshot and scan the full image
▌dm-thin: get the structure of the snapshot trees, find the modified blocks, and read those blocks

[Figure: a volume split into a 5 GiB region receiving 4 KiB random writes and a 95 GiB unchanged region. dm-snap: scan the full image. dm-thin: scan changed chunks (tree traversal).]

SLIDE 26

Measuring the Backup Time (WalB)

▌WalB: scan logs from a log device and send them to a backup server continuously

[Figure: a WalB device split into a 5 GiB region receiving 4 KiB random writes and a 95 GiB unchanged region. Write IO logs go to the log device; WalB scans the logs, and diffs are sent over the network to the backup server.]

SLIDE 27

Write I/O latency

[Figure: write I/O latency plots for dm-thin, dm-snap, WalB, and no-backup. Annotations on the plots: "IO spikes due to CoW, worse than dm-snap!", "Small overhead", and "large due to CoW".]

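SLIDE 28

Backup time

[Figure: backup time chart. Values shown: 1146, 2260, and 1.2 (units and series labels are not preserved in this transcript). Annotations: "slower than dm-snap" and "so fast!".]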

SLIDE 29

Conclusion

▌dm-snap & dm-thin
  • High I/O latency during a backup
  • Long backup time
▌WalB
  • Stable and low I/O latency (no spikes)
  • Short backup time

WalB satisfies our requirements for production use.

SLIDE 30

Try WalB!

▌Project page
  • https://walb-linux.github.io/
▌Tutorial
  • https://github.com/walb-linux/walb-tools/tree/master/misc/vagrant/
  • Vagrantfile for Ubuntu 16.04 and CentOS 7

SLIDE 31

Q&A

email: kota-uchida@cybozu.co.jp
twitter: @uchan_nos