
File System Design for an NFS File Server Appliance

Dave Hitz, James Lau, and Michael Malcolm

Technical Report TR3002, NetApp, 2002
http://www.netapp.com/tech_library/3002.html

(At WPI: http://www.wpi.edu/Academics/CCC/Help/Unix/snapshots.html)

Introduction

  • An appliance is a device designed to perform a specific function
  • Networking trend has been to use appliances instead of general-purpose computers. Examples:
    – routers from Cisco and Avici
    – network terminals
    – network printers
  • A new type of network appliance is an NFS file server

Introduction: NFS

  • NFS file server appliance file systems have different requirements than those for a general-purpose file system
    – NFS access patterns are different than local file access patterns
  • Network Appliance Corporation uses the Write Anywhere File Layout (WAFL)

Introduction: WAFL

  • WAFL has 4 requirements:
    – Fast NFS service
    – Support large file systems (10s of GB) that can grow (can add disks)
    – Provide high-performance writes and support RAID
    – Restart quickly, even after an unclean shutdown
  • NFS and RAID both strain write performance:
    – NFS server must respond that data is written
    – RAID must write parity bits also

Outline

  • Introduction (done)
  • Snapshots: User Level (next)
  • WAFL Implementation
  • Snapshots: System Level
  • Performance
  • Conclusions

Introduction to Snapshots

  • WAFL's claim to fame
  • WAFL creates and deletes snapshots automatically at preset times
    – Up to 255 at once
  • Copy-on-write to avoid duplicating blocks in the active file system
  • Uses:
    – Users can recover files
    – Sys admins can create backups from the running system
    – Restart quickly after an unclean shutdown


User Access to Snapshots

  • Suppose we accidentally removed a file named "todo":

spike% ls -lut .snapshot/*/todo
-rw-r--r-- 1 hitz 52880 Oct 15 00:00 .snapshot/nightly.0/todo
-rw-r--r-- 1 hitz 52880 Oct 14 19:00 .snapshot/hourly.0/todo
-rw-r--r-- 1 hitz 52829 Oct 14 15:00 .snapshot/hourly.1/todo
-rw-r--r-- 1 hitz 55059 Oct 10 00:00 .snapshot/nightly.4/todo
-rw-r--r-- 1 hitz 55059 Oct  9 00:00 .snapshot/nightly.5/todo

  • Can then recover the most recent version:

spike% cp .snapshot/hourly.0/todo todo

  • Note: snapshot directories (.snapshot) are hidden in that they don't show up with ls

Snapshot Administration

  • The WAFL server provides commands for sys admins to create and delete snapshots, but this is typically done automatically
  • At WPI, snapshots of /home:
    – Hourly at 7:00 AM, 10:00, 1:00, 4:00, 7:00, 10:00, and 1:00 AM
    – Nightly snapshot at midnight every day
    – Weekly snapshot made on Sunday at midnight every week
  • Thus, always have 7 hourly, 7 nightly, and 2 weekly snapshots:

claypool 32 ccc3=>> pwd
/home/claypool/.snapshot
claypool 33 ccc3=>> ls
hourly.0/  hourly.3/  hourly.6/  nightly.2/  nightly.5/  weekly.1/
hourly.1/  hourly.4/  nightly.0/  nightly.3/  nightly.6/
hourly.2/  hourly.5/  nightly.1/  nightly.4/  weekly.0/

Outline

  • Introduction (done)
  • Snapshots: User Level (done)
  • WAFL Implementation (next)
  • Snapshots: System Level
  • Performance
  • Conclusions

WAFL File Descriptors

  • Inode-based system with 4 KB blocks
  • Inode has 16 pointers
  • For files smaller than 64 KB:
    – Each pointer points to a data block
  • For files larger than 64 KB:
    – Each pointer points to an indirect block
  • For really large files:
    – Each pointer points to a doubly-indirect block
  • For very small files, data is kept in the inode itself instead of pointers
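The size thresholds above follow from the 16 pointers and the 4 KB block size. A minimal sketch of the arithmetic, assuming 4-byte block pointers (so one 4 KB indirect block holds 1024 pointers — a figure the slides do not state):

```python
# Sketch of WAFL inode size limits: 16 pointers per inode, 4 KB blocks.
# PTRS_PER_BLOCK assumes 4-byte block pointers; the slides give only
# the 16-pointer and 4 KB figures.

BLOCK_SIZE = 4 * 1024              # 4 KB data blocks
NUM_POINTERS = 16                  # pointers in an inode
PTRS_PER_BLOCK = BLOCK_SIZE // 4   # assumed 4-byte pointers -> 1024

def max_file_size(indirection):
    """Max file size when every inode pointer is at the given depth.

    indirection = 0: pointers reference data blocks directly.
    indirection = 1: single-indirect blocks; 2: doubly-indirect blocks.
    """
    return NUM_POINTERS * (PTRS_PER_BLOCK ** indirection) * BLOCK_SIZE

print(max_file_size(0))   # 65536 bytes = 64 KB, the direct-pointer limit
print(max_file_size(1))   # 64 MB with single-indirect blocks
print(max_file_size(2))   # 64 GB with doubly-indirect blocks
```

The 64 KB direct-pointer limit matches the slide's threshold, and two levels of indirection comfortably cover the "10s of GB" requirement.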

WAFL Meta-Data

  • WAFL stores meta-data in files
    – Inode file – stores inodes
    – Block-map file – identifies free blocks
    – Inode-map file – identifies free inodes

Zoom of WAFL Meta-Data (Tree of Blocks)

  • Root inode must be in a fixed location
  • Other blocks can be written anywhere

Snapshots (1 of 2)

  • Copy the root inode only
  • Over time, the snapshot references more and more data blocks that are no longer used by the active file system
  • Rate of file change determines how many snapshots you want to store

Snapshots (2 of 2)

  • When a disk block is modified, must modify indirect pointers as well
  • Batch writes to improve I/O performance
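The copy-root-inode idea can be sketched as a toy model: a snapshot duplicates only the top-level set of pointers, and copy-on-write gives the active file system a fresh block on modification while the snapshot keeps the old one. Class and field names here are illustrative, not WAFL's actual structures:

```python
# Toy model of copy-on-write snapshots: a snapshot copies only the
# "root" (a dict of block pointers); modifying a block allocates a new
# one for the active tree while the snapshot keeps the old block.

class Block:
    def __init__(self, data):
        self.data = data

class FileSystem:
    def __init__(self):
        self.active = {"a": Block("old-a"), "b": Block("old-b")}
        self.snapshots = []

    def snapshot(self):
        # Copy the root (the pointer table), not the blocks themselves.
        self.snapshots.append(dict(self.active))

    def write(self, name, data):
        # Copy-on-write: never overwrite a block a snapshot may reference.
        self.active[name] = Block(data)

fs = FileSystem()
fs.snapshot()
fs.write("a", "new-a")
print(fs.active["a"].data)                     # new-a
print(fs.snapshots[0]["a"].data)               # old-a: snapshot sees old block
print(fs.active["b"] is fs.snapshots[0]["b"])  # True: unchanged block shared
```

Note that the unmodified block is shared between the snapshot and the active file system, which is why creating a snapshot costs almost nothing up front and grows only as files change.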

Consistency Points (1 of 2)

  • To avoid consistency checks after an unclean shutdown, WAFL creates a special snapshot called a consistency point every few seconds
    – Not accessible via NFS
  • Batched operations are written at each consistency point
  • In between consistency points, data is only written to RAM

Consistency Points (2 of 2)

  • WAFL use of NVRAM
    – NFS requests are logged to NVRAM
      • NVRAM has batteries to avoid losing data during power-off
    – Upon unclean shutdown, re-apply the logged NFS requests to the last consistency point
    – Upon clean shutdown, create a consistency point and turn off NVRAM
  • Note: a typical FS uses NVRAM as a write cache instead
    – Uses more NVRAM space (WAFL logs are smaller)
      • Ex: "rename" needs 32 KB of cached blocks; WAFL needs 150 bytes of log
      • Ex: an 8 KB write needs 3 blocks (data, inode, indirect pointer); WAFL needs 1 block (data) plus 120 bytes for the log
    – Slower response time than WAFL
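The log-and-replay scheme above can be sketched in a few lines: requests are appended to an NVRAM log as small records and applied to RAM, and recovery replays the log against the last on-disk consistency point. The record format and function names are illustrative; real WAFL log records are compact binary entries (the 150-byte and 120-byte figures above), not JSON:

```python
# Sketch of request logging and replay: each NFS request is appended to
# an NVRAM log as a small record; after an unclean shutdown, the log is
# replayed against the last consistency point on disk.

import json

nvram_log = []  # stands in for battery-backed NVRAM

def handle_request(fs_state, op, **args):
    nvram_log.append(json.dumps({"op": op, **args}))  # small log record
    apply_op(fs_state, op, args)                      # update RAM only

def apply_op(fs_state, op, args):
    if op == "write":
        fs_state[args["path"]] = args["data"]
    elif op == "rename":
        fs_state[args["new"]] = fs_state.pop(args["old"])

def recover(consistency_point):
    # Re-apply logged requests to the last on-disk consistency point.
    fs_state = dict(consistency_point)
    for rec in nvram_log:
        r = json.loads(rec)
        apply_op(fs_state, r.pop("op"), r)
    return fs_state

cp = {"todo": "v1"}                  # last consistency point on disk
state = dict(cp)                     # in-RAM file system state
handle_request(state, "write", path="todo", data="v2")
handle_request(state, "rename", old="todo", new="done")
assert recover(cp) == state          # replay reproduces the in-RAM state
```

The design point is that a log record describes the operation, not the blocks it dirties, which is why it is so much smaller than caching the written blocks themselves.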

Write Allocation

  • Write times dominate NFS performance
    – Read caches at the client are large
    – So 5x as many write operations as reads at the server
  • WAFL batches write requests
  • WAFL allows write anywhere, enabling inode next to data
    – Typical FS has inode information and free blocks at fixed locations
  • WAFL allows writes in any order since it uses consistency points
    – Typical FS writes in a fixed order to allow fsck to work

Outline

  • Introduction (done)
  • Snapshots: User Level (done)
  • WAFL Implementation (done)
  • Snapshots: System Level (next)
  • Performance
  • Conclusions

The Block-Map File

  • Typical FS uses one bit for each block: 1 is allocated and 0 is free
    – Ineffective for WAFL, since other snapshots may still point to the block
  • WAFL uses 32 bits for each block
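A minimal sketch of why one bit per block is not enough, assuming bit 0 tracks the active file system and each snapshot gets one of the remaining bits (a plausible assignment; the slides say only "32 bits per block"). A block is free only when every bit is clear:

```python
# Sketch of 32-bit block-map entries: one usage bit per file system
# version. Assumed layout: bit 0 = active file system, other bits =
# snapshots. A block is free only when no version references it.

ACTIVE_BIT = 0

def set_used(entry, bit):
    return entry | (1 << bit)

def clear_used(entry, bit):
    return entry & ~(1 << bit)

def is_free(entry):
    return entry == 0   # free only if no file-system version uses it

entry = 0
entry = set_used(entry, ACTIVE_BIT)    # active FS allocates the block
entry = set_used(entry, 3)             # snapshot 3 also references it
entry = clear_used(entry, ACTIVE_BIT)  # active FS deletes the file
print(is_free(entry))                  # False: snapshot 3 still holds it
entry = clear_used(entry, 3)           # snapshot 3 is deleted
print(is_free(entry))                  # True: block can be reused
```

With a single allocated/free bit, deleting the file would have freed the block while snapshot 3 still pointed at it; the per-version bits prevent exactly that.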

Creating Snapshots

  • Could suspend NFS, create the snapshot, resume NFS
    – But this can take up to 1 second
  • Challenge: avoid locking out NFS requests
  • WAFL marks all dirty cache data as IN_SNAPSHOT
    – NFS requests can read system data and modify data not IN_SNAPSHOT
    – Data not IN_SNAPSHOT is not flushed to disk
  • Must flush IN_SNAPSHOT data as quickly as possible

Flushing IN_SNAPSHOT Data

  • Flush inode data first
    – WAFL keeps two caches for inode data, so it can copy the system one to the inode data file, unblocking most NFS requests (requires no I/O since the inode file is flushed later)
  • Update the block-map file
    – Copy the active bit to the snapshot bit
  • Write all IN_SNAPSHOT data
    – Restart any blocked requests
  • Duplicate the root inode and turn off the IN_SNAPSHOT bit
  • All done in less than 1 second, with the first step in 100s of ms
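The sequence above can be summarized as a short sketch: mark dirty data, copy the active bit to the snapshot bit in the block map, flush the marked data, then duplicate the root inode. The data structures and function names are illustrative, not WAFL's implementation:

```python
# Sketch of the snapshot-creation sequence: mark dirty cache data as
# IN_SNAPSHOT, update the block map, flush, then duplicate the root.

def create_snapshot(cache, block_map, root_inode, snapshot_bit):
    # 1. Mark all currently dirty cache data as IN_SNAPSHOT.
    in_snapshot = {blk for blk, dirty in cache.items() if dirty}

    # 2. Block-map file: copy the active bit (bit 0) to the snapshot bit.
    for blk, entry in block_map.items():
        if entry & 1:
            block_map[blk] = entry | (1 << snapshot_bit)

    # 3. Flush IN_SNAPSHOT data (new writes go to unmarked blocks).
    for blk in in_snapshot:
        cache[blk] = False            # stands in for "written to disk"

    # 4. Duplicate the root inode: the copy now names the snapshot.
    return dict(root_inode)

cache = {"b1": True, "b2": False}     # dirty flags for cached blocks
block_map = {"b1": 0b1, "b2": 0b1, "b3": 0b0}
root = {"ptr": "b1"}
snap_root = create_snapshot(cache, block_map, root, snapshot_bit=1)
print(snap_root == root, snap_root is not root)  # True True
print(block_map["b1"])                           # 3: active + snapshot bits
```

Steps 1 and 2 are the cheap in-memory work that lets NFS service continue; only step 3 involves disk I/O, which is why most requests are unblocked early.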

Outline

  • Introduction (done)
  • Snapshots: User Level (done)
  • WAFL Implementation (done)
  • Snapshots: System Level (done)
  • Performance (next)
  • Conclusions

Performance (1 of 2)

  • Compare against NFS systems
  • Best is SPEC NFS
    – LADDIS: Legato, Auspex, Digital, Data General, Interphase and Sun
  • Measure response times versus throughput
  • (Me: System specifications?!)

Performance (2 of 2)

(Typically, care about the knee in the curve)


NFS vs. New File Systems

[Figure: response time (msec/op) versus generated load (ops/sec) for 10 MPFS clients, 5 MPFS clients & 5 NFS clients, and 10 NFS clients]

  • Remove NFS server as bottleneck
  • Clients write directly to device

Conclusion

  • NetApp works and is stable
  • Consistency points are simple, reducing bugs in code
  • Easier to develop stable code for a network appliance than for a general system
    – Few NFS client implementations and a limited set of operations, so can test thoroughly