Performance Improvement of Btrfs Miao Xie - - PowerPoint PPT Presentation

SLIDE 1

Performance Improvement of Btrfs

Miao Xie <miaox@cn.fujitsu.com> Li Zefan <lizf@cn.fujitsu.com>

SLIDE 2

Agenda

• Comparison between Btrfs and Ext3/4
• Issue analysis (cases we have investigated)
  • Small file sequential read
  • Large file random write (Direct I/O and fsync)
  • File creation/deletion
• Future work

SLIDE 3

Comparison between Btrfs and Ext3/4

• Performance test environment
  • Hardware
    • CPU: Xeon(TM) X5260 3.33 GHz x 2 (4 cores)
    • Memory: 4 GB
    • Disk: 20 GB
  • Software
    • OS: RHEL6 (x86_64)
    • Kernel: 2.6.38
    • Glibc: 2.12
    • Btrfs-progs: 0.9
    • Sysbench: 0.4.12
SLIDE 4

Comparison between Btrfs and Ext3/4

• 73 cases in total
  • 72 file I/O cases, combining the following conditions:
    • Small file / Large file
    • Write / Read
    • Random / Sequential
    • Sync / Async / Direct I/O
    • Single-thread / Multi-thread
    • Different block sizes (1 KB, 4 KB, 32 KB) *
  • File creation/deletion
    • Measure the speed of empty file creation/deletion

* Block size (bs): the number of bytes read or written per request.
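The slides do not show the exact command lines used, so the following is only a sketch of how one group of these cases could be driven with the sysbench 0.4.x fileio test. The helper names, the total file size and the mapping of "Write (fsync)" and "General write" onto `--file-fsync-freq` are assumptions made for illustration.

```python
from itertools import product


def sysbench_fileio_cmd(test_mode, block_size, threads, direct=False,
                        fsync_freq=0, total_size="1G", action="run"):
    """Return a sysbench 0.4.x fileio command line as an argv list."""
    argv = [
        "sysbench",
        "--test=fileio",
        f"--file-total-size={total_size}",  # hypothetical; size not given in the slides
        f"--file-test-mode={test_mode}",    # seqrd, seqwr, rndrd or rndwr
        f"--file-block-size={block_size}",  # bytes per request (bs)
        f"--num-threads={threads}",
        f"--file-fsync-freq={fsync_freq}",  # 100 = fsync every 100 requests, 0 = never
    ]
    if direct:
        argv.append("--file-extra-flags=direct")  # open the files with O_DIRECT
    argv.append(action)  # prepare, run or cleanup
    return argv


def large_file_random_write_matrix():
    """Enumerate 3 write modes x 2 block sizes x 2 thread counts (assumed layout)."""
    write_modes = {
        "direct": dict(direct=True),
        "fsync": dict(fsync_freq=100),
        "general": dict(),
    }
    return [
        sysbench_fileio_cmd("rndwr", bs, threads, **opts)
        for opts, bs, threads in product(write_modes.values(), (4096, 32768), (1, 8))
    ]


if __name__ == "__main__":
    for argv in large_file_random_write_matrix():
        print(" ".join(argv))
```

Each command would normally be preceded by a `prepare` run with the same file options and followed by `cleanup`.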

SLIDE 5

Small file random read performance

[Figure: bar chart of IO speed (Kb/s) for EXT3, EXT4 and BTRFS. Groups: Direct I/O and General Read, each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]

SLIDE 6

Small file random write performance

Write (fsync): write data into the file and call fsync every 100 requests.

[Figure: bar chart of IO speed (Kb/s) for EXT3, EXT4 and BTRFS. Groups: Direct I/O and Write (fsync), each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]

SLIDE 7

Small file sequential read performance

[Figure: bar chart of IO speed (Mb/s) for EXT3, EXT4 and BTRFS. Groups: Direct I/O and General Read, each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]

SLIDE 8

Small file sequential write performance

Write (fsync): write data into the file and call fsync every 100 requests.

[Figure: bar chart of IO speed (Kb/s) for EXT3, EXT4 and BTRFS. Groups: Direct I/O and Write (fsync), each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]

SLIDE 9

Large file random read performance

[Figure: bar chart of IO speed (Mb/s) for EXT3, EXT4 and BTRFS. Groups: Direct I/O and General Read, each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

SLIDE 10

Large file sequential read performance

[Figure: bar chart of IO speed (Mb/s) for EXT3, EXT4 and BTRFS. Groups: Direct I/O and General Read, each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

SLIDE 11

Large file random write performance (1/2)

[Figure: bar chart of IO speed (Mb/s) for EXT3, EXT4 and BTRFS. Groups: Direct I/O and Write (fsync), each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

SLIDE 12

Large file random write performance (2/2)

[Figure: bar chart of IO speed (Mb/s) for EXT3, EXT4 and BTRFS. Group: General Write, with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

SLIDE 13

Large file sequential write performance (1/2)

[Figure: bar chart of IO speed (Mb/s) for EXT3, EXT4 and BTRFS. Group: Direct I/O, with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

SLIDE 14

Large file sequential write performance (2/2)

[Figure: bar chart of IO speed (Mb/s) for EXT3, EXT4 and BTRFS. Groups: Write (fsync) and General Write, each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

SLIDE 15

File creation/deletion performance

• Create and delete a large number of empty files to measure the speed of file creation and deletion.

[Figure: bar chart of file creation and deletion speed (files/sec) for Ext3, Ext4 and Btrfs.]

SLIDE 16

Comparison between Btrfs and Ext3/4

• Btrfs performs poorly (more than 20% slower than Ext3/4) in the following cases:
  • Small file random read (non-inline files)
  • Small file sequential read
  • Small file random/sequential write
  • Large file random write (Direct I/O and fsync)
  • Large file random write (general write, bs = 4 KB)
  • File creation and deletion
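The 20% criterion can be written as a small check. This is a minimal sketch, assuming the reference is the faster of Ext3 and Ext4 (the slides do not say which baseline is used). The function name and the sample numbers are hypothetical.

```python
def is_poor(btrfs_speed, ext3_speed, ext4_speed, threshold=0.20):
    """Return True if Btrfs is more than `threshold` slower than the reference."""
    reference = max(ext3_speed, ext4_speed)  # assumption: compare against the faster one
    return btrfs_speed < (1.0 - threshold) * reference


if __name__ == "__main__":
    # Hypothetical speeds in the same unit; not measurements from the slides.
    print(is_poor(70.0, 100.0, 90.0))
    print(is_poor(85.0, 100.0, 90.0))
```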

SLIDE 17

Agenda

• Comparison between Btrfs and Ext3/4
• Issue analysis (cases we have investigated)
  • Small file sequential read
  • Large file random write (Direct I/O and fsync)
  • File creation/deletion
• Future work

SLIDE 18

Small file sequential read

• Reasons
  • Metadata fragmentation -> file extent read latency -> delayed file data reads
    • Btrfs must read the file extent item before reading the file data (whether or not the small file is inlined), but because the metadata is fragmented the disk has to reposition its read offset frequently, and readahead cannot work well. So …

[Diagram: the fs/file tree and the layout of its nodes on disk.]
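A toy model can make the cost of fragmentation concrete: sum the distance the disk head travels when the leaves are visited in key order. The block addresses below are invented for illustration and the model ignores rotational latency and caching.

```python
def head_travel(block_addresses):
    """Total distance moved when visiting the blocks in the given order."""
    return sum(abs(b - a) for a, b in zip(block_addresses, block_addresses[1:]))


# Hypothetical addresses of six metadata leaves, listed in key order.
FRAGMENTED = [10, 900, 45, 700, 120, 530]
CONTIGUOUS = [10, 11, 12, 13, 14, 15]

if __name__ == "__main__":
    print("fragmented:", head_travel(FRAGMENTED))
    print("contiguous:", head_travel(CONTIGUOUS))
```

In this model the contiguous layout also lets a simple sequential readahead fetch the next leaves, which the scattered layout defeats.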

SLIDE 19

Small file sequential read

• Reason verification
  • Run the small file sequential read test after defragmentation

[Figure: bar chart of IO speed (Mb/s), No Defrag vs. After Defrag. Groups: Direct I/O and General Read, each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]

SLIDE 20

Small file sequential read

• Solution
  • Pre-allocation for the B+ tree: introduce a free space cluster for each node in the tree, so that contiguous free space can be allocated from the parent node's cluster and sibling leaves are stored close together.

(The patch for this solution is still under test and has not been posted yet.)

[Diagram: the fs/file tree with one free space cluster per node, and the clusters on disk.]
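The patch itself is not shown in the slides, so the following is only a sketch of the idea under simple assumptions: each parent node owns a fixed-size cluster of consecutive blocks and hands them out to its leaves in order. The class names and cluster sizes are hypothetical.

```python
class Cluster:
    """A contiguous range of free blocks reserved for one parent node."""

    def __init__(self, start, size):
        self.next_free = start
        self.end = start + size

    def alloc(self):
        if self.next_free >= self.end:
            raise MemoryError("cluster exhausted")
        block = self.next_free
        self.next_free += 1
        return block


class ParentNode:
    """A tree node whose child leaves are allocated from its own cluster."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.leaves = []

    def add_leaf(self):
        self.leaves.append(self.cluster.alloc())
        return self.leaves[-1]


if __name__ == "__main__":
    a = ParentNode(Cluster(start=100, size=8))
    b = ParentNode(Cluster(start=200, size=8))
    for _ in range(3):  # interleaved allocations from two parents
        a.add_leaf()
        b.add_leaf()
    print(a.leaves, b.leaves)
```

Even though the two parents allocate alternately, the leaves of each parent end up in consecutive blocks.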

SLIDE 21

Small file sequential read

• Improvement result

[Figure: bar chart of IO speed (Mb/s) for EXT3, EXT4, BTRFS and BTRFS + Patch. Groups: Direct I/O and General Read, each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]

• Further improvement
  • Introduce automatic defragmentation for metadata
  • Apply the new metadata readahead API written by Arne

SLIDE 22

Agenda

• Comparison between Btrfs and Ext3/4
• Issue analysis (cases we have investigated)
  • Small file sequential read
  • Large file random write (Direct I/O and fsync)
  • File creation/deletion
• Future work

SLIDE 23

Large file random write (Direct I/O and fsync)

• Background: what is tree logging? Tree logging is a special write-ahead log of dirty metadata.
  • Purpose: reduce the number of metadata write requests issued when fsync and O_SYNC writes happen.
  • Implementation: copy the changed items into a special tree (the log tree, one per fs/file tree), then write that tree to disk. After a crash, Btrfs recovers the fs/file tree from the log tree.
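The mechanism can be illustrated with a toy model in which dictionaries stand in for the trees. This is a sketch under that assumption, not the kernel implementation; the class and method names are hypothetical.

```python
class ToyFs:
    """Toy write-ahead log: fsync writes only changed items to a log tree."""

    def __init__(self):
        self.disk_tree = {}  # committed fs/file tree (survives a crash)
        self.disk_log = {}   # log tree written by fsync (survives a crash)
        self.mem_tree = {}   # in-memory state (lost on a crash)
        self.dirty = set()   # keys changed since the last fsync or commit

    def update(self, key, value):
        self.mem_tree[key] = value
        self.dirty.add(key)

    def fsync(self):
        """Copy only the changed items into the log tree; return how many."""
        written = len(self.dirty)
        for key in self.dirty:
            self.disk_log[key] = self.mem_tree[key]
        self.dirty.clear()
        return written

    def commit(self):
        """Full transaction commit: write everything and drop the log."""
        self.disk_tree = dict(self.mem_tree)
        self.disk_log.clear()
        self.dirty.clear()

    def crash_and_recover(self):
        """Lose in-memory state, then replay the log onto the committed tree."""
        self.mem_tree = dict(self.disk_tree)
        self.mem_tree.update(self.disk_log)
        self.dirty.clear()
```

An update that was fsynced survives a crash through the log, while an update that was neither fsynced nor committed is lost.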

SLIDE 24

Large file random write (Direct I/O and fsync)

• Reasons
  • Lots of unchanged metadata is logged (e.g. checksums, file extents)

[Diagram: the application changes one extent of a file (Extent 1 … Extent N) and the corresponding checksum in the csum tree changes, but all the csum data of the file (Extent1 Csum … ExtentN Csum) is logged to the log tree and written to disk.]

SLIDE 25

Large file random write (Direct I/O and fsync)

• Reason verification
  • Run the large file random write test with tree logging disabled (mount with -o notreelog)

[Figure: bar chart of IO speed (Mb/s), BTRFS vs. BTRFS (no treelog). Groups: Direct I/O and Write (fsync), each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

SLIDE 26

Large file random write (Direct I/O and fsync)

• Solution
  • Don't log unchanged metadata: introduce a sub-transaction ID to filter out the unchanged metadata (v2.6.41)

[Diagram: the application changes one extent of a file (Extent 1 … Extent N) and the corresponding checksum in the csum tree changes; all the csum data of the file (Extent1 Csum … ExtentN Csum) is logged to the log tree and written to disk.]

SLIDE 28

Large file random write (Direct I/O and fsync)

• Solution
  • Don't log unchanged metadata: introduce a sub-transaction ID to filter out the unchanged metadata (v2.6.41)

[Diagram: the application changes one extent of a file (Extent 1 … Extent N) and the corresponding checksum in the csum tree changes; only the changed csum data of the file is logged to the log tree and written to disk.]
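The filtering idea can be sketched as follows, assuming every item records the sub-transaction in which it last changed and the log remembers the last sub-transaction it has already written. The field and function names are hypothetical and do not come from the patch.

```python
from dataclasses import dataclass


@dataclass
class CsumItem:
    extent: int       # which extent of the file this checksum covers
    csum: int         # checksum value
    sub_transid: int  # sub-transaction in which the item last changed


def select_for_log(items, last_logged_sub_transid):
    """Keep only the items changed after the last log write."""
    return [item for item in items if item.sub_transid > last_logged_sub_transid]


if __name__ == "__main__":
    items = [
        CsumItem(extent=1, csum=0xA1, sub_transid=1),
        CsumItem(extent=2, csum=0xB2, sub_transid=1),
        CsumItem(extent=3, csum=0xC3, sub_transid=3),  # the only extent rewritten
        CsumItem(extent=4, csum=0xD4, sub_transid=1),
    ]
    print([item.extent for item in select_for_log(items, last_logged_sub_transid=2)])
```

With N extents and one random write per fsync, this reduces the logged checksum items from N to the handful that actually changed.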

SLIDE 29

Large file random write (Direct I/O and fsync)

• Improvement result

[Figure: bar chart of IO speed (Mb/s) for EXT3, EXT4, BTRFS and BTRFS + Patch. Groups: Direct I/O and Write (fsync), each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

SLIDE 30

Agenda

• Comparison between Btrfs and Ext3/4
• Issue analysis (cases we have investigated)
  • Small file sequential read
  • Large file random write (Direct I/O and fsync)
  • File creation/deletion
• Future work

SLIDE 31

File creation/deletion

• Reasons
  • Btrfs does more metadata insertion and deletion.
  • When updating an inode, Btrfs must search the B+ tree to find where the inode is stored (time complexity O(log n), whereas Ext3/4 is O(1)).
  • Searching nodes/leaves in the rb-tree takes a lot of time.

Metadata items touched per operation (the Ext4 column is marked "Not Sure" in the original):

File creation
  • Btrfs: inode, name back reference, ACL, directory item, directory name index
  • Ext4: inode, ACL, directory entry

File deletion
  • Btrfs: inode, inode back reference, ACL, directory item, directory name index, logged directory item, logged directory name index
  • Ext4: inode, ACL, directory entry
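The O(1) versus O(log n) contrast can be illustrated with a short sketch. In Ext3/4 the position of an inode in the inode tables follows directly from its number, while a tree search has to descend one level per factor of the fanout. The fanout and item counts below are hypothetical, and `btree_height` is a simplified model of a balanced tree rather than Btrfs code.

```python
def ext_inode_location(ino, inodes_per_group):
    """Ext3/4: block group and index within its inode table, by arithmetic only."""
    group = (ino - 1) // inodes_per_group
    index = (ino - 1) % inodes_per_group
    return group, index


def btree_height(n_items, fanout):
    """Levels a search descends in a balanced tree holding n_items keys."""
    height = 1
    capacity = fanout
    while capacity < n_items:
        capacity *= fanout
        height += 1
    return height


if __name__ == "__main__":
    print(ext_inode_location(8193, inodes_per_group=8192))
    for n in (100, 10_000, 1_000_000):
        print(n, btree_height(n, fanout=100))
```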

SLIDE 32

File creation/deletion

• Solution
  • Batch operation: insert/delete a batch of directory name indexes at a time (v2.6.40)
  • Delayed operation: delay updating the inode information in the B+ tree (v2.6.40)
  • Use a radix tree instead of the rb-tree (v2.6.37)
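The first two ideas can be sketched with a toy tree that counts searches. The model assumes that a batch of key-sorted neighbouring items can be inserted after a single search and that only the latest inode state needs to reach the tree. All names are hypothetical and the code does not mirror the kernel data structures.

```python
class CountingTree:
    """Toy stand-in for the B+ tree that counts how often it is searched."""

    def __init__(self):
        self.items = {}
        self.searches = 0

    def insert(self, key, value):
        self.searches += 1
        self.items[key] = value

    def insert_batch(self, pairs):
        """Insert key-sorted neighbours after one search (toy assumption)."""
        self.searches += 1
        self.items.update(pairs)


class DelayedNode:
    """Queues directory index insertions and inode updates until flush()."""

    def __init__(self, tree):
        self.tree = tree
        self.pending_indexes = []
        self.pending_inode = None

    def add_dir_index(self, index, name):
        self.pending_indexes.append((index, name))

    def update_inode(self, inode):
        self.pending_inode = inode  # later updates overwrite earlier ones

    def flush(self):
        if self.pending_indexes:
            self.tree.insert_batch(sorted(self.pending_indexes))
            self.pending_indexes = []
        if self.pending_inode is not None:
            self.tree.insert("inode", self.pending_inode)
            self.pending_inode = None
```

Creating 100 files through `DelayedNode` and flushing once costs two searches in this model, compared with 200 if every directory index and inode update went to the tree immediately.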

SLIDE 33

File creation/deletion

• Improvement result
  • Create and delete a large number of empty files to measure the speed of file creation and deletion.

[Figure: bar chart of file creation and deletion speed (files/sec) for Ext3, Ext4, Btrfs and Btrfs + Patch.]

SLIDE 34

Agenda

• Comparison between Btrfs and Ext3/4
• Issue analysis (cases we have investigated)
  • Small file sequential read
  • Large file random write (Direct I/O and fsync)
  • File creation/deletion
• Future work

SLIDE 35

Future work

• Further improve small file sequential read performance
• Improve small file random read performance (non-inline files)
• Improve small file random/sequential write performance
• Run other benchmarks and improve the bad cases

SLIDE 36

Q/A