File Systems Fated for Senescence? Nonsense, Says Science!

SLIDE 1

File Systems Fated for Senescence? Nonsense, Says Science!

Alex Conway🃠, Ainesh Bakshi🃠, Yizheng Jiao♢, Yang Zhan♢, Michael A. Bender♠, William Jannen♠, Rob Johnson♠, Bradley C. Kuszmaul♡, Donald E. Porter♢, Jun Yuan♣ and Martin Farach-Colton🃠

🃠Rutgers University, ♢The University of North Carolina at Chapel Hill, ♠Stony Brook University, ♡Oracle Corporation and Massachusetts Institute of Technology, ♣Farmingdale State College of SUNY

SLIDE 3

File Systems Fated for Senescence? Nonsense, Says Science; The Essence of Semperjuvenescence is Coalescence!

  • senescence: old age
  • semperjuvenescence: being young forever
  • coalescence: merging together

SLIDE 4

File System Aging

Aging is fragmentation over time.

[Figure: performance vs. time]

SLIDE 5

In this talk

Do file systems age? What can we do about it?

SLIDE 10

Is aging a problem?

Chris Hoffman at howtogeek.com says:

“Linux’s ext2, ext3, and ext4 file systems… [are] designed to avoid fragmentation in normal use.”

“If you do have problems with fragmentation on Linux, you probably need a larger hard disk.”

“Modern Linux filesystems keep fragmentation at a minimum… Therefore it is not necessary to worry about fragmentation in a Linux system.”

Nope

SLIDE 12

Is aging a problem?

Aging happens in real filesystems

  • Smith and Seltzer (’97)

Benchmarks should incorporate aging

  • Zhu, Chen and Chiueh (’05)
  • Agrawal, A. Arpaci-Dusseau and R. Arpaci-Dusseau (’09)

Yep

SLIDE 13

Is aging a problem?

Yep Nope

SLIDE 14

Let’s do some science!

SLIDE 16

Inducing Aging

We use three different workloads:

  • Developer workload
  • Server workload
  • Synthetic workloads

See the paper

SLIDE 28

Simulating a Developer

get coffee → git pull → git pull → make → get coffee → git pull → add awesome features → get coffee → git pull → fix bugs → ...

We can simulate a developer by replaying Git histories.

SLIDE 30

Simulating a Developer

  • Use the Linux kernel repo from github.com
  • Do 100 git pulls, then measure performance
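The replay loop described above can be sketched in Python. This is a minimal illustration, not the authors' scripts (those are on betrfs.org): it assumes a full local clone on the file system under test (`work_dir`, hypothetical), approximates each "git pull" by checking out the next commit of a first-parent history, and calls a caller-supplied `measure` function after every 100 updates. The names `list_commits`, `replay_schedule`, and `replay` are hypothetical.

```python
import subprocess
from typing import Callable, Iterator, List, Sequence, Tuple


def list_commits(repo_dir: str) -> List[str]:
    """Return the first-parent commit hashes of HEAD, oldest first."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--first-parent", "--reverse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def replay_schedule(commits: Sequence[str],
                    pulls_per_round: int = 100) -> Iterator[Tuple[str, bool]]:
    """Yield (commit, measure_now): measure after every pulls_per_round updates."""
    for i, commit in enumerate(commits, start=1):
        yield commit, i % pulls_per_round == 0


def replay(work_dir: str, commits: Sequence[str],
           measure: Callable[[int], None], pulls_per_round: int = 100) -> None:
    """Advance the working copy one commit at a time, measuring periodically."""
    done = 0
    for commit, measure_now in replay_schedule(commits, pulls_per_round):
        # Each update creates, rewrites, and deletes files: this churn is what
        # ages the file system holding work_dir (checkout approximates a pull).
        subprocess.run(["git", "-C", work_dir, "checkout", "--quiet", commit],
                       check=True)
        done += 1
        if measure_now:
            measure(done)
```

A closer match to the talk's workload would fetch each commit from a remote instead of checking it out from objects that are already local.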


Measuring Aging

time grep -r random_string /path/to/filesystem

dir file1 file2 file3 file4 Interfile Fragmentation Intrafile Fragmentation

Then normalize per gigabyte read
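The grep test can be approximated in Python: read every byte under the tree (as `grep -r` must when the search string never matches), time the scan, and divide by the gibibytes read. This is a sketch, not the authors' harness. It assumes the caller has already made the cache cold (for example by unmounting and remounting the file system under test), and it visits entries in whatever order the file system returns them, which we assume is close to what `grep -r` does.

```python
import os
import time
from typing import Tuple

GIB = 1024 ** 3


def read_tree(root: str, chunk: int = 1 << 20) -> int:
    """Read every regular file under root in directory order; return bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only real file data is scanned.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                while True:
                    buf = f.read(chunk)
                    if not buf:
                        break
                    total += len(buf)
    return total


def seconds_per_gib(seconds: float, nbytes: int) -> float:
    """Normalize a scan time by the amount of data read."""
    return seconds / (nbytes / GIB)


def measure_scan(root: str) -> Tuple[float, int]:
    """Time a full scan of root; making the cache cold is the caller's job."""
    start = time.perf_counter()
    nbytes = read_tree(root)
    elapsed = time.perf_counter() - start
    return elapsed, nbytes
```

For example, `elapsed, nbytes = measure_scan("/mnt/test")` followed by `seconds_per_gib(elapsed, nbytes)`, where `/mnt/test` is a hypothetical mount point.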

SLIDE 42

Do modern file systems age?

SLIDE 46

Git Workload on ext4 on HDD

[Figure: time in seconds/GiB (lower is better) vs. git pulls performed; annotations: 2x slowdown, 4x slowdown, 14.3x, 15 minutes to grep 1.2GiB]

Our setup: cold cache, 3.4 GHz quad-core CPU, 4 GiB RAM, 20 GiB partition on a 7200 RPM SATA HDD
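As a sanity check on the normalized units, taking the annotated "15 minutes to grep 1.2GiB" as 900 s (an approximation):

```latex
\frac{900\ \mathrm{s}}{1.2\ \mathrm{GiB}} = 750\ \mathrm{s/GiB},
\qquad
\frac{1.2 \times 1024\ \mathrm{MiB}}{900\ \mathrm{s}} \approx 1.37\ \mathrm{MiB/s}
```

That lands near the top of the plot's 800 s/GiB scale and corresponds to an effective read bandwidth of under 1.5 MiB/s.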

SLIDE 48

How can we be sure this slowdown is due to aging?

“I’m not old. My directory structure is different!”

SLIDE 49

File System Rejuvenation

Idea: copy the same logical state to a new file system

  • After each 100 pulls
  • Compare grep cost
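A minimal sketch of the rejuvenation comparison, assuming two mount points: the aged file system (`aged_root`) and a freshly created, empty one (`fresh_root`); both names and the helper functions are hypothetical. The copy holds the same files as the aged tree but is written in a single pass, so we assume it is laid out like a newly populated file system. Both scans should start from a cold cache.

```python
import os
import shutil
import time
from typing import Callable


def rejuvenate(aged_root: str, fresh_root: str, name: str = "copy") -> str:
    """Copy the logical state of aged_root into a fresh file system."""
    dest = os.path.join(fresh_root, name)
    # One-pass copy: same files and directories, new on-disk layout.
    shutil.copytree(aged_root, dest, symlinks=True)
    return dest


def timed(scan: Callable[[str], int], root: str) -> float:
    """Wall-clock seconds taken by scan(root)."""
    start = time.perf_counter()
    scan(root)
    return time.perf_counter() - start


def aging_factor(aged_seconds: float, unaged_seconds: float) -> float:
    """How many times slower the aged file system is than its copy."""
    return aged_seconds / unaged_seconds
```

Any full-read scan works as `scan`, for example one that reads every file and returns the byte count.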
SLIDE 51

Aging ext4 with Git on HDD

[Figure: time in seconds/GiB (lower is better) vs. git pulls performed, aged vs. unaged; annotation: 8.8x]

Smaller average file size makes the unaged file system 60% slower.

SLIDE 52

Is this specific to ext4?

SLIDE 53

Aging other file systems with Git on HDD

[Figure: four panels (Btrfs, F2FS, ZFS, XFS), time in seconds/GiB (lower is better) vs. git pulls performed; annotations: 20.6x, 22.4x, 2.2x, 11.8x; note: weird unaged behavior on XFS]

SLIDE 54

Will SSDs save us?

SLIDE 55

Git Workload on XFS on SSD

[Figure: time in seconds/GiB (lower is better) vs. git pulls performed, aged vs. unaged; annotation: 1.9x]

SLIDE 58

Git Workload on SSD

[Figure: four panels (Btrfs, ext4, ZFS, F2FS), time in seconds/GiB (lower is better) vs. git pulls performed; annotations: 2.2x, 1.5x]

ZFS and ext4 slow down with smaller average file size.

Told ya!

SLIDE 59

Aging is real

Btrfs, ext4, F2FS, XFS, ZFS all age

  • Up to 22x on HDD
  • Up to 2x on SSD

Git lets us replay a real development history

  • Induce aging by simulating years of use
  • Takes between 5 hours and 2 days
  • Download these scripts from betrfs.org

SLIDE 60

How can we prevent aging?

SLIDE 63

Design goals to address fragmentation

  • Intrafile fragmentation: avoid breaking large files into small fragments
  • Interfile fragmentation: cluster logically related small files

What do we mean by small?

SLIDE 66

I/O Size vs Effective Bandwidth

[Figure: Read Length vs Bandwidth; bandwidth in MiB/sec (log scale, 0.1 to 1000; higher is better) vs. sequential read length (4 KiB to 256 MiB), SSD and HDD curves]
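One common way to read this curve (our assumption, not a model from the slides): each read of length L pays a fixed positioning cost t plus a transfer at the device's peak rate B, so effective bandwidth is L / (t + L/B). The sketch plugs in hypothetical HDD numbers (10 ms per seek, 150 MiB/s peak) to show why small reads waste most of a disk's bandwidth while multi-MiB reads recover most of it.

```python
KIB = 1024
MIB = 1024 * KIB


def effective_bandwidth(read_len: int, seek_s: float, peak_bps: float) -> float:
    """Bytes/second when each read of read_len bytes costs one seek plus transfer."""
    return read_len / (seek_s + read_len / peak_bps)


def efficiency(read_len: int, seek_s: float, peak_bps: float) -> float:
    """Fraction of peak bandwidth achieved at this read length."""
    return effective_bandwidth(read_len, seek_s, peak_bps) / peak_bps


if __name__ == "__main__":
    SEEK = 0.010       # hypothetical 10 ms average positioning time
    PEAK = 150 * MIB   # hypothetical 150 MiB/s sequential bandwidth
    for size in (4 * KIB, 64 * KIB, 1 * MIB, 4 * MIB, 16 * MIB):
        print(f"{size // KIB:>6} KiB: {efficiency(size, SEEK, PEAK):6.1%} of peak")
```

With these assumed numbers, 4 KiB reads reach about 0.26% of peak and 4 MiB reads about 73%. Under the same model, a device with a much smaller per-I/O cost, such as an SSD, approaches its peak at smaller read lengths.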

SLIDE 67

Design goals to address fragmentation

  • Intrafile fragmentation: avoid breaking large files into small fragments
  • Interfile fragmentation: cluster logically related small files

Prediction: 4MiB chunks will substantially reduce aging
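To make the two design goals measurable, here is a toy model, an illustrative assumption rather than the paper's metric: each file is a list of on-disk block numbers in logical order, and files are visited in the order a recursive scan reads them. A break inside a file counts as intrafile fragmentation; a jump from the last block of one file to a non-adjacent first block of the next counts as interfile fragmentation. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Layout = Dict[str, List[int]]  # file path -> block numbers in logical order


def intrafile_breaks(blocks: List[int]) -> int:
    """Times a file's next logical block is not the next physical block."""
    return sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


def interfile_jumps(layout: Layout, scan_order: List[str]) -> int:
    """Non-adjacent transitions between consecutive files in scan order."""
    jumps = 0
    prev_last: Optional[int] = None
    for path in scan_order:
        blocks = layout[path]
        if not blocks:
            continue  # empty files occupy no blocks
        if prev_last is not None and blocks[0] != prev_last + 1:
            jumps += 1
        prev_last = blocks[-1]
    return jumps


def fragmentation(layout: Layout, scan_order: List[str]) -> Tuple[int, int]:
    """Return (intrafile breaks, interfile jumps) for one scan."""
    intra = sum(intrafile_breaks(layout[p]) for p in scan_order)
    return intra, interfile_jumps(layout, scan_order)
```

For instance, `{"dir/file1": [10, 11, 40], "dir/file2": [90, 91], "dir/file3": [12]}` scanned in that order gives `(1, 2)`, while laying the same files out back to back on blocks 0 to 5 gives `(0, 0)`. Each counted break is roughly one extra seek during a scan, so what matters is how much contiguous data lies between breaks.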

SLIDE 68

Testing this with Btrfs

SLIDE 69

Btrfs: Larger leaves = less aging?

[Diagram: a B-tree of keys holding metadata and small files, with large files (Big File, Bigger File, Large File) stored outside the tree]

Metadata and small files are stored in a B-tree. Large files get written elsewhere.

SLIDE 70

Btrfs Leaf Size Performance

[Figure: time in seconds/GiB (lower is better) vs. git pulls performed, for leaf sizes 4k, 8k, 16k, 32k, 64k]

Bigger leaves do mean less aging! Btrfs allows the leaf size to be configured between 4KiB and 64KiB.

SLIDE 71

Cost of large leaves

Why don’t B-trees usually have big leaves? Because making small changes to big leaves causes a lot of writing.
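A back-of-the-envelope model of that cost, assuming for illustration only that every small update rewrites the whole leaf holding it and that no updates are coalesced: writing an r-byte change into an L-byte leaf costs L bytes, a write amplification of L/r. The 100-byte update size and the function names are hypothetical.

```python
def leaf_bytes_written(num_updates: int, leaf_size: int) -> int:
    """Unbatched model: every small update rewrites its entire leaf."""
    return num_updates * leaf_size


def write_amplification(update_size: int, leaf_size: int) -> float:
    """Bytes written to disk per byte of logical change."""
    return leaf_size / update_size


if __name__ == "__main__":
    UPDATE = 100  # hypothetical size of one small metadata change, in bytes
    for leaf_kib in (4, 8, 16, 32, 64):  # the leaf sizes Btrfs allows
        leaf = leaf_kib * 1024
        print(f"{leaf_kib:>2} KiB leaves: {write_amplification(UPDATE, leaf):.0f}x")
```

Under this model, 4 KiB leaves amplify a 100-byte change about 41x and 64 KiB leaves about 655x. Real B-trees write less when several updates hit the same dirty leaf before it is flushed, but the cost still grows with leaf size.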

SLIDE 72

Btrfs Leaf Size Writing

[Figure: blocks written in thousands (lower is better) vs. git pulls performed, for leaf sizes 4k, 8k, 16k, 32k, 64k]

Bigger leaves do mean more writing. Btrfs allows the leaf size to be configured between 4KiB and 64KiB.

SLIDE 74

B-Tree Performance Tradeoff

  • Large leaves: less aging, more writing
  • Small leaves: more aging, less writing

This tradeoff is inherent to B-trees.

SLIDE 75

Other File System Types

Must other types of file systems age?

  • Update-in-place
  • Log-structured
  • Write-optimized (BεtrFS)

See the paper

SLIDE 78

BεtrFS

BεtrFS packs small, logically related data into a Bε-tree with 4MiB nodes. Bε-trees batch updates, which allows leaves to be big without increasing the amount of writing.
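The batching idea can be illustrated with a deliberately simplified model: a single buffer in front of the leaves, a one-level caricature of a Bε-tree rather than BεtrFS's actual implementation. Updates enter the buffer as small messages; when the buffer fills, each leaf touched by the batch is rewritten once. The record size, message size, buffer capacity, and function name are hypothetical.

```python
import random
from typing import List


def betree_bytes_written(updates: List[int], keys_per_leaf: int, leaf_size: int,
                         msg_size: int, buffer_capacity: int) -> int:
    """One-level batching model: buffer update messages, then rewrite each
    leaf touched by a full batch exactly once."""
    written = 0
    pending: List[int] = []

    def flush() -> int:
        touched_leaves = {key // keys_per_leaf for key in pending}
        pending.clear()
        return len(touched_leaves) * leaf_size

    for key in updates:
        written += msg_size  # count the message's own bytes on entering the buffer
        pending.append(key)
        if len(pending) == buffer_capacity:
            written += flush()
    if pending:
        written += flush()  # flush the final partial batch
    return written


if __name__ == "__main__":
    MIB = 1024 * 1024
    LEAF = 4 * MIB           # 4MiB nodes, as on the slide
    KEYS_PER_LEAF = 4096     # hypothetical: about 1 KiB records per leaf
    rng = random.Random(0)
    updates = [rng.randrange(64 * KEYS_PER_LEAF) for _ in range(10_000)]
    unbatched = len(updates) * LEAF  # every update rewrites its 4MiB leaf
    batched = betree_bytes_written(updates, KEYS_PER_LEAF, LEAF,
                                   msg_size=100, buffer_capacity=1000)
    print(f"unbatched: {unbatched // MIB} MiB, batched: {batched // MIB} MiB")
```

Because one flush spreads a 4MiB leaf rewrite over every buffered message destined for that leaf, large leaves no longer imply proportionally more writing. A real Bε-tree keeps buffers in every internal node and flushes them level by level toward the leaves.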

SLIDE 79

Git on BεtrFS on HDD

[Figure: time in seconds/GiB (lower is better) vs. git pulls performed; aged and unaged curves for BεtrFS, XFS, ext4, F2FS, ZFS, and Btrfs]

SLIDE 81

Git on BεtrFS on HDD

[Figure, rescaled y-axis (up to 80 seconds/GiB): time in seconds/GiB (lower is better) vs. git pulls performed; aged and unaged curves for BεtrFS, Btrfs, F2FS, ext4, and ZFS]

SLIDE 82

And SSDs?

SLIDE 83

Git on BεtrFS on SSD

[Figure: time in seconds/GiB (lower is better) vs. git pulls performed; aged and unaged curves for BεtrFS, Btrfs, ZFS, ext4, XFS, and F2FS]

SLIDE 84

How to prevent aging

  • Batch updates to avoid too much writing
  • Rewrite to keep related data in large blocks

SLIDE 85

Conclusion

  • Aging is avoidable
  • It’s easy to age file systems quickly and substantially

SLIDE 86

Thank you!

Alex Conway
alexander.conway@rutgers.edu
betrfs.org