File System Aging
Featuring slides modified from a talk by
Martín Farach-Colton Rutgers University
This Class
Aging
Two papers: Smith and Seltzer; Conway et al.
How do people feel about the readings?
Aging
Outline
DAM model: How theorists think about external memory algorithms
Goal: Minimize # of I/Os
block size B, memory size M, data size N.
[Figure: Disk and RAM (size M), with data moved between them in blocks of size B]
[Aggarwal+Vitter ’88]
Short answer: Yes (2-competitive)
Long answer: No (can’t tune parameters)
Affine model:
incremental bandwidth cost of subsequent blocks
DAM costs.
Goal: Minimize cost of I/Os
size M, data size N.
Cost of an I/O of k blocks: 1 + αk
Takeaway: the affine model captures the size of I/Os as well as the speed of the device itself.
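The difference between the two cost models can be sketched in a few lines of Python. This is only an illustration, assuming an I/O of k blocks costs k in the DAM model (one unit per block) and 1 + αk in the affine model. The value of α and the I/O sizes are made-up numbers, not measurements.

```python
def dam_cost(io_sizes):
    """DAM model: every block transferred costs one unit, wherever it lives."""
    return sum(io_sizes)


def affine_cost(io_sizes, alpha):
    """Affine model: each I/O pays a setup cost of 1 plus alpha per block."""
    return sum(1 + alpha * k for k in io_sizes)


ALPHA = 0.125  # hypothetical value, chosen only for illustration

sequential = [64]      # one I/O that reads 64 contiguous blocks
fragmented = [1] * 64  # the same 64 blocks read as 64 separate I/Os

for name, ios in (("sequential", sequential), ("fragmented", fragmented)):
    print(name, "DAM:", dam_cost(ios), "affine:", affine_cost(ios, ALPHA))
```

The DAM model charges both layouts the same, while the affine model charges the fragmented layout far more. That gap is what makes fragmentation visible to the model.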
The goal of our model is to predict performance. We can verify “things” using a benchmark
same well-specified workload on each system
A and B, and either:
system and are better able to present an input to our model
To be useful, we need to run representative benchmarks under representative conditions
What is the representative state of a file system?
Is the state of a file system a path or a point?
isolation, we need to know where we started and how we got there.
Theory: many file systems will age.
Two open questions:
system?
Chris Hoffman at howtogeek.com says: “Linux’s ext2, ext3, and ext4 file systems… [are] designed to avoid fragmentation in normal use.” “If you do have problems with fragmentation on Linux, you probably need a larger hard disk.” “Modern Linux filesystems keep fragmentation at a minimum…Therefore it is not necessary to worry about fragmentation in a Linux system.”
So: as of 1997, file systems aged. Then file systems got better, and sys admins say they don’t age. What’s the actual story?
Year: X+~4 years
Density: doubles in each dimension every 4 years or so
Hard disks gradually increase α
Measurements from one decade have a sell-by date … unless you solve the problem algorithmically
Assumption
Conclusion
So, for people who think that file systems don’t age, are you sure that modern file systems keep fragmentation to under 1%?
File System Types
Logging: F2FS
B-tree: Btrfs
Bε-tree: BεtrFS
Heuristic-based update-in-place: FFS, ext4, …
😴 🤕 🤔 🤕
Should age Should age Should age Shouldn’t age
Keith Smith started grad school in ’92
He and Seltzer found that:
We’d like a history of file system changes
Let’s model a very simple case: Developers
get coffee git pull make get coffee git pull add awesome features get coffee git pull fix bugs . . .
git pull make git pull make
Do 100 git pulls
Measure performance
Use the Linux kernel repo from github.com
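The pull-then-measure procedure could be driven by a loop like the one below. This is a sketch, not the authors' harness: `checkout` and `measure` are hypothetical hooks the caller supplies (for example, one that advances a working tree to the next commit and one that times a scan), and the interval of 100 comes from the slide.

```python
def age_and_measure(commits, checkout, measure, interval=100):
    """Replay `commits` in order, measuring after every `interval` pulls.

    `checkout(commit)` and `measure()` are hypothetical hooks supplied by the
    caller; this function only fixes the order in which they run.
    Returns a list of (pulls_performed, measurement) pairs.
    """
    results = [(0, measure())]  # baseline before any aging
    for pulls, commit in enumerate(commits, start=1):
        checkout(commit)
        if pulls % interval == 0:
            results.append((pulls, measure()))
    return results


# Stand-in hooks so the sketch runs without a repository or a disk:
# "checking out" just records the commit, "measuring" counts what was applied.
applied = []
history = [f"commit-{i}" for i in range(300)]
points = age_and_measure(history, applied.append, lambda: len(applied))
print(points)
```

Passing the hooks in keeps the schedule separate from the mechanics, so the same loop can drive any file system under test.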
Like timing a preorder traversal of tree… Should measure fragmentation
time grep -r random_string /path/to/filesystem
[Figure: a directory (dir) containing file1, file2, file3, file4]
Metadata Fragmentation, Intrafile Fragmentation, Interfile Fragmentation
Then normalize per gigabyte read
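A rough Python equivalent of the timed recursive scan, normalized per GiB read, might look like this. The function names are made up for illustration, and the sketch does not clear the page cache, so a cold-cache measurement would need that handled separately.

```python
import os
import tempfile
import time


def timed_scan(root):
    """Read every regular file under `root`; return (seconds, bytes_read)."""
    total_bytes = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):  # top-down walk
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            with open(path, "rb") as f:
                while chunk := f.read(1 << 20):  # read in 1 MiB chunks
                    total_bytes += len(chunk)
    return time.perf_counter() - start, total_bytes


def seconds_per_gib(seconds, bytes_read):
    """Normalize elapsed time by the amount of data read, in GiB."""
    return seconds / (bytes_read / 2**30)


# Tiny demo on a throwaway directory (not a real benchmark).
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "dir"))
    for i in range(4):
        with open(os.path.join(demo_root, "dir", f"file{i + 1}"), "wb") as f:
            f.write(b"x" * 1000)
    demo_seconds, demo_bytes = timed_scan(demo_root)

print(demo_bytes, "bytes read;", seconds_per_gib(demo_seconds, demo_bytes), "s/GiB")
```

Dividing by bytes read keeps measurements comparable as the repository grows or shrinks between pulls.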
[Figure: time in seconds / GiB (lower is better) vs. git pulls performed; annotations: 2x slowdown, 4x slowdown, 14.3x]
Our Setup: Cold Cache, 3.4 GHz Quad Core, 4GiB RAM, 20 GiB HDD partition - SATA 7200 RPM
15 minutes to grep 1.2GiB
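For scale, the "15 minutes to grep 1.2GiB" figure can be converted into the chart's unit. The arithmetic below uses only those two numbers from the slide.

```python
seconds = 15 * 60  # 15 minutes
gib_read = 1.2     # GiB of data scanned

secs_per_gib = seconds / gib_read        # the chart's y-axis unit
mib_per_sec = gib_read * 1024 / seconds  # the same rate as throughput

print(f"{secs_per_gib:.0f} s/GiB, about {mib_per_sec:.2f} MiB/s")
```

That works out to about 750 seconds per GiB, or roughly 1.4 MiB/s.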
Smaller files, shallower tree, …
Idea: copy same logical state to new partition
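The copy idea could be sketched as follows: copy the aged tree into a directory on a freshly created file system, scan both, and compare. `make_unaged_copy` and `slowdown` are hypothetical helper names, and the demo uses a temporary directory only as a stand-in for a new partition.

```python
import os
import shutil
import tempfile


def make_unaged_copy(aged_root, fresh_root):
    """Copy the logical contents of `aged_root` into `fresh_root`.

    `fresh_root` must not exist yet; it is assumed to live on a newly
    created file system, so the copy has the same contents but no history.
    """
    shutil.copytree(aged_root, fresh_root, symlinks=True)


def slowdown(aged_secs_per_gib, unaged_secs_per_gib):
    """How many times slower the aged scan is than the unaged one."""
    return aged_secs_per_gib / unaged_secs_per_gib


# Demo: a temporary directory stands in for the fresh partition.
with tempfile.TemporaryDirectory() as tmp:
    aged = os.path.join(tmp, "aged")
    os.makedirs(os.path.join(aged, "dir"))
    with open(os.path.join(aged, "dir", "file1"), "w") as f:
        f.write("hello")
    fresh = os.path.join(tmp, "fresh")
    make_unaged_copy(aged, fresh)
    with open(os.path.join(fresh, "dir", "file1")) as f:
        copied = f.read()

print(copied)
```

Because both trees hold the same files, any gap between the two scan times can be attributed to on-disk layout rather than to the data itself.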
[Figure: time in seconds / GiB (lower is better) vs. git pulls performed, aged vs. unaged; annotation: 8.8x]
Maybe it’s full disks? Nope: 20GiB partition, 1.2 GiB data
Smaller average file size makes the unaged 60% slower
[Figure: four panels, one each for Btrfs, F2FS, ZFS, and XFS; lower is better. Annotations: 20.6x, 22.4x, 2.2x, 11.8x, and "weird unaged behavior on XFS"]
[Figure: time in seconds / GiB (lower is better) vs. git pulls performed, aged vs. unaged; y-axis ticks 10 to 30; annotation: 1.9x]
Other file systems give similar results (~2x slowdown)
[Figure: time in seconds / GiB (lower is better) vs. git pulls performed, aged and unaged, for BetrFS, Btrfs, ext4, F2FS, XFS, and ZFS; y-axis ticks 200 to 800]
[Figure: same axes with y-axis ticks 20 to 80; curves labeled BetrFS, Btrfs, ext4, F2FS, ZFS]
[Figure: same axes with y-axis ticks 10 to 30; curves labeled BetrFS and Btrfs]
🃠Rutgers University, ♢The University of North Carolina at Chapel Hill, ♠Stony Brook University, ♡Oracle Corporation and Massachusetts Institute of Technology, ♣Farmingdale State College of SUNY
Alex Conway🃠, Ainesh Bakshi🃠, Yizheng Jiao♢, Yang Zhan♢, Michael A. Bender♠, William Jannen♠, Rob Johnson♠, Bradley C. Kuszmaul♡, Donald E. Porter♢, Jun Yuan♣, Martín Farach-Colton🃠