

SLIDE 1

Deciding when to forget in the Elephant file system

Doug Santry, Mike Feeley, Norm Hutchinson, Alistair Veitch*, Ross Carton, and Jacob Ofir

University of British Columbia
Hewlett-Packard Laboratories*

SLIDE 2

SOSP 99, University of British Columbia

Protecting file system data

- System and media failure
  - Focus of file-system research for many years
- User and application failure
  - No protection
  - Delete and write cause data loss
  - Artifact of limited storage capacity

SLIDE 3

Storage is no longer limiting

- Disk capacity trends
  - 25–35 GB now
  - Increasing by 60% per year
  - 250–350 GB in 5 years
- Disks are now:
  - Big enough to keep some old versions
  - Not big enough to keep everything
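
The five-year projection follows from compounding the 60% annual growth rate. A minimal sketch of that arithmetic; the function name is illustrative and not from the talk:

```python
def projected_capacity_gb(current_gb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth at a fixed annual rate."""
    return current_gb * (1.0 + annual_growth) ** years


# Figures from the slide: 25 to 35 GB today, 60% growth per year, 5 years out.
low = projected_capacity_gb(25, 0.60, 5)
high = projected_capacity_gb(35, 0.60, 5)
print(f"{low:.0f} to {high:.0f} GB")
```

This gives roughly 262 to 367 GB, in line with the slide's rounded 250–350 GB.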

SLIDE 4

Protecting data with big disks

- Key idea
  - Retain important old versions of files
  - System, not user, controls storage reclamation
- Key issues
  - Is versioning at granularity of file or file system?
  - How long are old versions retained?
  - How can users control retention safely?

SLIDE 5

Previous work

- File-system grain
  - Copy-on-write checkpoint of entire file system
  - Performed periodically
  - E.g., Plan-9, WAFL, AFS
- File grain
  - Copy-on-write of individual files
  - Performed continuously
  - E.g., Cedar, VMS
    - Retained last few versions
    - No protection from delete

SLIDE 6

Elephant overview

- Delete and write
  - Do not cause data loss immediately
- Storage reclamation
  - File-grain retention policies specified by users
  - Policies implemented by system cleaner
- User interface
  - Rollback to any point in the past
    - {open,cd,…} filename@yesterday:12:00
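
One way to picture the `filename@time` rollback interface is as two steps: split the time tag off the pathname, then pick the version that was current at that time. The sketch below is an illustration only. The helper names and the sorted list of version creation times are assumptions, not the prototype's data structures, and parsing of tags such as `yesterday:12:00` is deliberately left out because the talk does not give the grammar.

```python
from bisect import bisect_right
from datetime import datetime


def split_timed_path(path):
    """Split 'name@time' at the last '@'; return (path, None) when there is no tag."""
    name, sep, tag = path.rpartition("@")
    if not sep:
        return path, None
    return name, tag


def version_at(version_times, when):
    """Index of the version current at `when`: the latest one created at or
    before that time. Returns None if the file did not exist yet.
    `version_times` must be sorted oldest first."""
    i = bisect_right(version_times, when)
    return i - 1 if i > 0 else None


# Hypothetical history: three versions of one file.
times = [
    datetime(1999, 12, 10, 9, 0),
    datetime(1999, 12, 11, 15, 30),
    datetime(1999, 12, 12, 8, 0),
]
name, tag = split_timed_path("report.txt@yesterday:12:00")
index = version_at(times, datetime(1999, 12, 11, 12, 0))
```

Here `index` is 0: at noon on the 11th the second version had not been written yet, so the first version is the one a rollback would see.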

SLIDE 7

Talk outline

- Principles and retention policies
- Prototype implementation
  - Meta data
  - File and name histories
- Evaluation
  - Workload analysis
  - User experience

SLIDE 8

Protection depends on file type

- Read only
- System managed
  - Derived
  - Cached
  - Temporary
- User managed

SLIDE 9

Principles

- Near-term reversibility
  - Of every operation on valuable data
  - For a limited period of time
- Long-term history
  - Of selected files
  - Including only selected landmark versions

SLIDE 10

File-grain retention policies

- Keep One
  - Update data in place and immediate delete
- Keep All
  - Retain all versions
- Keep Safe
  - Retain all versions for second-chance interval
- Keep Landmarks
  - Retain only landmark versions
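
The four policies can be read as rules a cleaner applies to a file's version list. The sketch below is a simplified model under stated assumptions, not the prototype's cleaner: the `Version` record and `versions_to_keep` function are hypothetical, times are plain numbers, and Keep Landmarks is assumed to keep the current version alongside the landmarks. For Keep Safe, a version counts as inside the second-chance interval if it was still the current version at some point during that interval.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    KEEP_ONE = "Keep One"
    KEEP_ALL = "Keep All"
    KEEP_SAFE = "Keep Safe"
    KEEP_LANDMARKS = "Keep Landmarks"


@dataclass(frozen=True)
class Version:
    created: float           # creation time (hypothetical unit, e.g. hours)
    landmark: bool = False   # set by the user or by a landmark heuristic


def versions_to_keep(versions, policy, now, second_chance):
    """Return the versions a cleaner would retain. `versions` is sorted oldest first."""
    if not versions:
        return []
    current = versions[-1]
    if policy is Policy.KEEP_ALL:
        return list(versions)
    if policy is Policy.KEEP_ONE:
        return [current]
    if policy is Policy.KEEP_SAFE:
        cutoff = now - second_chance
        # An old version stays if its successor was written inside the interval,
        # because it was still current at some point after the cutoff.
        kept = [v for v, nxt in zip(versions, versions[1:]) if nxt.created >= cutoff]
        return kept + [current]
    # KEEP_LANDMARKS: landmarks plus the current version (assumption).
    return [v for v in versions[:-1] if v.landmark] + [current]


history = [Version(0), Version(50, landmark=True), Version(90), Version(95)]
safe = versions_to_keep(history, Policy.KEEP_SAFE, now=100, second_chance=20)
marks = versions_to_keep(history, Policy.KEEP_LANDMARKS, now=100, second_chance=20)
```

With these inputs, Keep Safe retains the versions created at 50, 90, and 95, while Keep Landmarks retains only those created at 50 and 95.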

SLIDE 11

Potential-landmark heuristic

- Key observations
  - Files are modified in barrages
  - Ability to differentiate edits degrades with time
- Strategy
  - Designate lead edit of barrage as landmark
  - Barrage "granularity" increases with time

[Figure: timeline of edits and the potential landmarks selected from them]
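
The barrage idea can be sketched as gap-based grouping in which the gap that separates two barrages grows with the age of the edits. This is an illustration under assumptions, not the heuristic used by Elephant: the linear threshold `base_gap + growth * age` and both parameter names are invented for the example.

```python
def group_barrages(edit_times, now, base_gap, growth):
    """Group edit timestamps into barrages.

    Two consecutive edits share a barrage when the gap between them is at most
    base_gap + growth * age, where age is how long ago the later edit happened.
    Older edits therefore merge into coarser barrages.
    """
    barrages = []
    for t in sorted(edit_times):
        if barrages:
            allowed_gap = base_gap + growth * (now - t)
            if t - barrages[-1][-1] <= allowed_gap:
                barrages[-1].append(t)
                continue
        barrages.append([t])
    return barrages


edits = [0, 1, 2, 50, 51, 90, 97, 99]   # hypothetical edit times, in hours
recent_view = group_barrages(edits, now=100, base_gap=2, growth=0.1)
later_view = group_barrages(edits, now=200, base_gap=2, growth=0.1)
```

At `now=100` the edits at 90 and 97 fall into separate barrages; viewed from `now=200` they merge into one with 99, which is the coarsening the slide describes. One edit per barrage would then be designated the potential landmark.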

SLIDE 12

History discontinuities

- Deleted versions
  - Discontinuity in file's history
  - System can report all discontinuities to user
- Grouping files
  - User groups related files
  - A landmark of any file is landmark for group

SLIDE 13

User implemented policies

- New policies
  - Written as user-level programs
  - Registered with kernel
  - Used in the same way as standard policies
- Cleaning
  - System cleaner execs user-policy program
  - Runs with privileges of file's owner

SLIDE 14

Elephant prototype

- Implementation
  - New VFS in FreeBSD 2.2.8
- Interface
  - Add time to any pathname "file@time"
  - Set process's default time
  - Set file's policy or group files
  - Make version a landmark
  - Read a file's history
  - Tools including: tls, tgrep, tdiff, and tview

SLIDE 15

Versioning meta data

- Inode history
  - Inode log contains file's copy-on-write inodes
  - Inode added to log on first write after open
  - Non-versioned files stored by standard inode
- Name history
  - Directory lists name creation and deletion time
  - Name retained until all file versions are deleted
  - Old names periodically moved to history inode
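
Recording a creation and deletion time with each directory entry is what makes a name lookup "as of" a past time possible. A minimal sketch of that idea, assuming a flat list of entries; the `NameEntry` record and `lookup` function are hypothetical and do not reflect the on-disk layout of the prototype.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameEntry:
    name: str
    inode: int
    created: float                    # time the name was created
    deleted: Optional[float] = None   # None while the name still exists


def lookup(directory, name, when):
    """Return the inode that `name` referred to at time `when`, or None."""
    for entry in directory:
        alive = entry.created <= when and (entry.deleted is None or when < entry.deleted)
        if entry.name == name and alive:
            return entry.inode
    return None


# Hypothetical history: at time 20 an editor renames notes.txt to a backup
# name and writes a fresh file under the original name.
directory = [
    NameEntry("notes.txt", inode=7, created=10, deleted=20),
    NameEntry("notes.txt~", inode=7, created=20),
    NameEntry("notes.txt", inode=9, created=20),
]
before = lookup(directory, "notes.txt", when=15)
after = lookup(directory, "notes.txt", when=25)
```

Here `before` is inode 7 and `after` is inode 9: the same name resolves to different files depending on the time supplied.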

SLIDE 16

Two views of history

- File (inode) history
  - All versions of a file independent of its name
  - Rename not reflected in file history
- Name history
  - Name can refer to different files at different times
  - Some applications rely on name history
    - Modify file by first renaming to backup (e.g., emacs)
- Elephant provides both views of history

SLIDE 17

Workload analysis

- Measured system
  - Workgroup server at HP Labs
  - Supporting 12 active researchers
  - Used for development, document prep., etc.
  - 15 GB, 360,000 files, 27,000 directories
- Analysis
  - File-type distribution
  - Write-traffic distribution

SLIDE 18

File-type taxonomy

- Source
  - C, C++, perl, shell scripts
- Documents
  - text, HTML, word processor, mail
- Derived
  - object, library, exec, postscript, PDF
- Archive
  - tar, compressed, data
- Temporary
  - *.tmp, web-browser caches

SLIDE 19

Allocating policies by file type

- Keep One
  - Derived
  - Temporary
- Keep Safe
  - Archive
- Keep Landmarks
  - Source
  - Documents
  - Other
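
The allocation above amounts to a two-step lookup: classify a file by type, then map the type to a policy. The sketch below shows one way to express that. The type-to-policy table comes from the slide, but the extension table is a small hypothetical sample and does not reproduce the classifier used in the study.

```python
import os

# Hypothetical sample of extensions per file type (not the study's full list).
TYPE_BY_EXTENSION = {
    ".c": "source", ".cc": "source", ".pl": "source", ".sh": "source",
    ".txt": "documents", ".html": "documents",
    ".o": "derived", ".a": "derived", ".ps": "derived", ".pdf": "derived",
    ".tar": "archive", ".gz": "archive",
    ".tmp": "temporary",
}

# Type-to-policy allocation as listed on the slide.
POLICY_BY_TYPE = {
    "derived": "Keep One",
    "temporary": "Keep One",
    "archive": "Keep Safe",
    "source": "Keep Landmarks",
    "documents": "Keep Landmarks",
    "other": "Keep Landmarks",
}


def policy_for(path):
    """Pick a retention policy from the file's extension; unknown types are 'other'."""
    extension = os.path.splitext(path)[1].lower()
    file_type = TYPE_BY_EXTENSION.get(extension, "other")
    return POLICY_BY_TYPE[file_type]


for example in ("main.c", "build/main.o", "backup.tar", "README"):
    print(example, "->", policy_for(example))
```

Defaulting unknown types to Keep Landmarks mirrors the slide, which places "Other" under the most protective of the three policies used.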

SLIDE 20

Storage by policy

[Figure: storage by policy. Series: Files (%), Bytes (%). Legend: Keep Landmarks, Keep Safe, Keep One. Data labels: 33.6, 56.3, 3.9, 28.5, 62.4, 15.2]

SLIDE 21

Write traffic

- Trace
  - Same HP-Labs workgroup server
  - Collected Aug 29 – Oct 8, used Sep 27 – Oct 1
  - Records all open, close, read, and write
  - Includes file name
- Summary
  - 112 MB / day written on average
  - 15 GB of total storage, 12 active users

SLIDE 22

Storage growth by policy

[Figure: storage growth by policy. Series: Files (%), Bytes (%), Writes (% of bytes). Legend: Keep Landmarks, Keep Safe, Keep One. Data labels: 33.6, 56.3, 98.7, 3.9, 62.4, 15.2, 28.5, 0.6, 0.7]

SLIDE 23

Importance of file-grain retention

[Figure: 30-day history (GB). File-system checkpoint: 3.4; Elephant: 0.042]
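
These two figures can be roughly cross-checked against the trace summary. The sketch below rests on several assumptions that the slides do not state: decimal units (1 GB = 1000 MB), a checkpoint scheme that retains every byte written over the 30 days, and the 98.7% write share on the preceding chart belonging to Keep One, so that the remaining 0.6% and 0.7% go to the policies that retain versions.

```python
MB_PER_GB = 1000           # assumption: decimal units

daily_writes_mb = 112      # average write traffic from the trace summary
days = 30

# If every written byte is retained (assumed behaviour of whole-file-system checkpoints):
checkpoint_gb = daily_writes_mb * days / MB_PER_GB

# Assumed share of write bytes that lands in files whose versions are retained.
versioned_share = (0.6 + 0.7) / 100
elephant_upper_gb = checkpoint_gb * versioned_share

# Ratio of the two figures reported on the slide.
reported_ratio = 3.4 / 0.042

print(f"checkpoint estimate: {checkpoint_gb:.2f} GB")
print(f"Elephant estimate:   {elephant_upper_gb:.3f} GB")
print(f"reported ratio:      {reported_ratio:.0f}x")
```

The estimates come out near 3.36 GB and 0.044 GB, close to the reported 3.4 GB and 0.042 GB, and the reported figures differ by a factor of about 81.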

SLIDE 24

NFS shadowing

- Problem
  - Would you trust your data to a research FS?
- Solution
  - Elephant prototype can shadow an NFS server
    - Snoops network for NFS packets
    - Updates shadow Elephant file system
  - Users
    - Create and update files via NFS
    - Read current and historic versions via Elephant

SLIDE 25

Conclusions

- Protecting data from users and applications
  - Files require different degrees of protection
    - Reversibility: all versions for limited period
    - History: landmark versions forever
  - Important versions are small fraction of disk
- Elephant
  - File-grain retention policies specified by users
  - Retains all important older versions
  - Rollback file, directory, or fs to any point in past