
CS4513 Distributed Computer Systems

File Systems

Motivation

  • Processes store, retrieve information
  • Process capacity restricted to vmem size
  • When process terminates, memory lost
  • Multiple processes share information
  • Requirements:
      – large
      – persistent
      – concurrent access

Solution? File System!

File Systems

  • Abstraction to disk (convenience)
      – “The only thing friendly about a disk is that it has persistent storage.”
      – Devices may be different: tape, IDE/SCSI, NFS
  • Users
      – don’t care about detail
      – care about interface
  • OS
      – cares about implementation (efficiency)

File System Concepts

  • Files - store the data
  • Directories - organize files
  • Partitions - separate collections of directories (also called “volumes”)
      – all directory information kept in partition
      – mount file system to access
  • Protection - allow/restrict access for files, directories, partitions

Outline

  • Files  ←
      – User’s point of view
      – Example “Under the Hood”
      – File system implementation
  • Directories
  • Disk space management
  • Misc

Files: The User’s Point of View

  • Naming: how do I refer to it?
      – blah, BLAH, Blah
      – file.c, file.com
  • Structure: what’s inside?
      – Sequence of bytes (most modern OSes)
      – Records - some internal structure
      – Tree - organized records


Files: The User’s Point of View

  • Type:
      – ascii - human readable
      – binary - computer only readable
      – “magic number” or extension (executable, c-file …)
  • Access Method:
      – sequential (for character files, an abstraction of I/O of serial device such as a modem)
      – random (for block files, an abstraction of I/O to block device such as a disk)
  • Attributes:
      – time, protection, owner, hidden, lock, size ...

File Operations

  • Create
  • Delete
  • Truncate
  • Open
  • Read
  • Write
  • Append
  • Seek - for random access
  • Get attributes
  • Set attributes

Example: Unix open()

int open(const char *path, int flags [, mode_t mode])

  • path is name of file
  • flags is bitmap to set switch
      – O_RDONLY, O_WRONLY…
      – O_CREAT then use mode for perms
  • on success, returns index (the file descriptor)
  • on failure, returns -1
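The open() call above can be exercised with a short sketch. The helper names read_start and write_new are hypothetical, chosen only for illustration; the system calls themselves (open, read, write, close) are the standard POSIX ones.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Read up to n bytes from the start of the file at path into buf.
 * Returns the number of bytes read, or -1 if open() or read() fails.
 */
ssize_t read_start(const char *path, char *buf, size_t n)
{
    int fd = open(path, O_RDONLY);   /* on success: a small integer index */
    if (fd == -1)
        return -1;                   /* on failure: open() returned -1 */

    ssize_t got = read(fd, buf, n);
    close(fd);                       /* give the index back to the system */
    return got;
}

/*
 * Create (or truncate) path and write n bytes from buf.
 * Because O_CREAT is set, open() uses the third argument for permissions.
 * Returns the number of bytes written, or -1 on failure.
 */
ssize_t write_new(const char *path, const char *buf, size_t n)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    ssize_t put = write(fd, buf, n);
    close(fd);
    return put;
}
```

Calling write_new("blah", "hello", 5) and then read_start("blah", buf, 5) should round-trip the five bytes; reading a path that does not exist returns -1.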

Unix open() - Under the Hood

int fid = open("blah", flags);
read(fid, …);

[Figure: user space vs. system space. Per-process table (index): stdin, stdout, stderr, ... → file structure (attributes) → file descriptor (where blocks are), per device.]

Example: Windows CreateFile()

  • Returns file object handle:

HANDLE CreateFile (
    lpFileName,       // name of file
    dwDesiredAccess,  // read-write
    dwShareMode,      // shared or not
    lpSecurity,       // permissions
    ...
)

  • File objects used for all: files, directories, disk drives, ports, pipes, sockets and console

File System Implementation

[Figure labels, in memory: Process Control Block; Open File Pointer Array; Open File Table; File Descriptor Table (in-memory copy, one per device); (per process). On disk: file sys info, file descriptors, directories, data. Arrow: copy fd to mem.]

Next up: file descriptors!


File System Implementation

  • Which blocks with which file?
  • File descriptor implementations:
      – Contiguous
      – Linked List
      – Linked List with Index
      – I-nodes

Contiguous Allocation

  • Store file as contiguous block
      – ex: w/ 1K block, 50K file has 50 consecutive blocks

File A: start 0, length 2
File B: start 14, length 3

  • Good:
      – Easy: remember location with 1 number
      – Fast: read entire file in 1 operation (length)
  • Bad:
      – Static: need to know file size at creation
          • or tough to grow!
      – Fragmentation: remember why we had paging?
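Because a contiguous file is described by just a start block and a length, finding the block for any byte offset is one division and one addition. A minimal sketch, with the hypothetical function name contiguous_block:

```c
/*
 * Contiguous allocation: a file is described by (start, length) in blocks.
 * Map a byte offset within the file to the physical block that holds it.
 * Returns -1 if the offset lies outside the file's blocks.
 */
long contiguous_block(long start, long length, long offset, long block_size)
{
    if (offset < 0 || block_size <= 0)
        return -1;

    long file_block = offset / block_size;   /* which block of the file */
    if (file_block >= length)
        return -1;                            /* past the last block */

    return start + file_block;
}
```

For File B (start 14, length 3) with 1K blocks, offset 2048 falls in physical block 16, and offset 3072 is out of range.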

Linked List Allocation

  • Keep a linked list with disk blocks
  • Good:
      – Easy: remember 1 number (location)
      – Efficient: no space lost in fragmentation
  • Bad:
      – Slow: random access bad

[Figure: one file's blocks 0, 1, 2 are in physical blocks 4 → 7 → 2 → null; another file's blocks 0, 1 are in physical blocks 6 → 3 → null.]

Linked List Allocation with Index

  • Table in memory
      – faster random access
      – can be large!
          • 1k blocks, 500MB disk
          • = 2MB!
      – Windows:
          • MS-DOS FAT
          • Win98 VFAT

Physical Block   Next
      0
      1
      2          null
      3          null
      4          7
      5
      6          3
      7          2
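With the index held in memory, random access means following table entries instead of reading disk blocks. The sketch below encodes the eight-entry example table; treating the blank entries (0, 1, 5) as free blocks is an assumption, as are the names fat_example, fat_lookup and fat_table_bytes and the 4-byte entry size used to check the 2MB figure.

```c
#define FAT_EOF  (-1)   /* "null" in the table: last block of a file */
#define FAT_FREE (-2)   /* assumed marker for blocks in no chain */

/* The example table: entry i names the block that follows block i. */
static const int fat_example[8] = {
    FAT_FREE, FAT_FREE, FAT_EOF, FAT_EOF, 7, FAT_FREE, 3, 2
};

/*
 * Return the physical block holding file block n of the file whose first
 * block is `first`, or -1 if the chain ends before reaching block n.
 */
int fat_lookup(const int *table, int first, int n)
{
    int block = first;
    while (n > 0 && block >= 0) {
        block = table[block];   /* follow the chain in memory, no disk I/O */
        n--;
    }
    return block >= 0 ? block : -1;
}

/* Size of the whole table in bytes: one entry per disk block. */
long long fat_table_bytes(long long disk_bytes, long long block_size,
                          long long entry_bytes)
{
    return disk_bytes / block_size * entry_bytes;
}
```

Starting from block 4, file blocks 0, 1, 2 resolve to physical blocks 4, 7, 2. For a 500MB disk with 1K blocks and 4-byte entries, fat_table_bytes gives 2,048,000 bytes, which is roughly the 2MB quoted.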

I-nodes

  • Fast for small files
  • Can hold big files
  • Size?
      – 4 kbyte block

[Figure: an i-node holds the attributes and disk block addresses, then a single indirect block, a double indirect block, and (triple indirect block).]
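The "Size?" question can be worked through numerically. This sketch assumes 4-byte block addresses and 12 direct pointers, neither of which is stated here (they are typical of classic Unix layouts); the function name inode_max_bytes is hypothetical.

```c
#include <stdint.h>

/*
 * Largest file an i-node can address, in bytes, given the block size,
 * the size of one block address, and the number of direct pointers.
 * One single, one double, and one triple indirect block are assumed.
 */
uint64_t inode_max_bytes(uint64_t block_size, uint64_t addr_size,
                         uint64_t direct)
{
    uint64_t per_block = block_size / addr_size;  /* addresses per indirect block */
    uint64_t blocks = direct
                    + per_block                           /* single indirect */
                    + per_block * per_block               /* double indirect */
                    + per_block * per_block * per_block;  /* triple indirect */
    return blocks * block_size;
}
```

With 4 kbyte blocks under these assumptions, each indirect block holds 1024 addresses: the direct pointers cover 48 KB, the single indirect block 4 MB, the double 4 GB, and the triple 4 TB, for a total of 4,402,345,721,856 bytes.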

Outline

  • Files (done)
  • Directories  ←
  • Disk space management
  • Misc

Directories

  • Just like files, only have special bit set so you cannot modify them (what?)
      – data in directory is information / links to files
      – modify through system call
      – (See ls.c sample)
  • Organized for:
      – efficiency - locating file quickly
      – convenience - user patterns
          • groups (.c, .exe), same names
  • Tree structure directory the most flexible
      – aliases allow files to appear at more than one location (more on this later)

Directories

  • Before reading file, must be opened
  • Directory entry provides information to get blocks
      – disk location (block, address)
      – i-node number
  • Map ascii name to the file descriptor

Simple Directory

  • No hierarchy (all “root”)
  • Entry
      – name
      – block count
      – block numbers

name | block count | block numbers

(What are the drawbacks?)

Hierarchical Directory (MS-DOS)

  • Tree
  • Entry:
      – name
      – type (extension)
      – time
      – date
      – block number (w/ FAT)

name | type | attrib | time | date | block | size

Hierarchical Directory (Unix)

  • Tree
  • Entry:
      – name
      – inode number (try “ls -i” or “ls -iad .”)
  • example:

/usr/bob/mbox

inode | name

Unix Directory Example

Root directory:
    1   .
    1   ..
    4   bin
    7   dev
   14   lib
    9   etc
    6   usr
    8   tmp
Looking up usr gives I-node 6

I-node 6: relevant data (/usr) is in block 132. Block 132:
    6   .
    1   ..
   26   bob
   17   jeff
   14   sue
   51   sam
   29   mark
Looking up bob gives I-node 26

I-node 26: /usr/bob is in block 406. Block 406:
   26   .
    6   ..
   12   grants
   81   books
   60   mbox
   17   Linux
Aha! I-node 60 has contents of mbox
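The lookup of /usr/bob/mbox can be traced in code. The sketch below is an in-memory model only: it copies the three directories from the example into arrays and walks the path one component at a time. All type and function names (dirent_sim, dir_sim, dir_for, resolve) are hypothetical, and a real file system would read each directory's blocks from disk through its i-node.

```c
#include <string.h>

struct dirent_sim { int inode; const char *name; };   /* one directory entry */
struct dir_sim    { int inode; int n; const struct dirent_sim *ents; };

/* Directory contents copied from the example. */
static const struct dirent_sim root_ents[] = {
    {1, "."}, {1, ".."}, {4, "bin"}, {7, "dev"},
    {14, "lib"}, {9, "etc"}, {6, "usr"}, {8, "tmp"} };
static const struct dirent_sim usr_ents[] = {
    {6, "."}, {1, ".."}, {26, "bob"}, {17, "jeff"},
    {14, "sue"}, {51, "sam"}, {29, "mark"} };
static const struct dirent_sim bob_ents[] = {
    {26, "."}, {6, ".."}, {12, "grants"}, {81, "books"},
    {60, "mbox"}, {17, "Linux"} };

/* Only these three directories are modelled. */
static const struct dir_sim dirs[] = {
    {1, 8, root_ents}, {6, 7, usr_ents}, {26, 6, bob_ents} };

/* Find the directory stored under an i-node, or NULL if not modelled. */
static const struct dir_sim *dir_for(int inode)
{
    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++)
        if (dirs[i].inode == inode)
            return &dirs[i];
    return NULL;
}

/* Resolve an absolute path to an i-node number, or -1 if not found. */
int resolve(const char *path)
{
    char buf[256];
    int inode = 1;                          /* start at the root directory */

    if (path[0] != '/' || strlen(path) >= sizeof buf)
        return -1;
    strcpy(buf, path);

    for (char *name = strtok(buf, "/"); name; name = strtok(NULL, "/")) {
        const struct dir_sim *d = dir_for(inode);
        int next = -1;
        if (!d)
            return -1;                      /* not a modelled directory */
        for (int i = 0; i < d->n; i++)      /* linear scan of the entries */
            if (strcmp(d->ents[i].name, name) == 0)
                next = d->ents[i].inode;
        if (next < 0)
            return -1;                      /* no such name here */
        inode = next;
    }
    return inode;
}
```

resolve("/usr/bob/mbox") visits i-nodes 1, 6 and 26 and returns 60, matching the walk above.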


Storing Files

  • What if …
      a) Directory entry contains disk blocks?
      b) Directory entry points to file descriptor?
      c) Have new type of file “link”?

[Figure: directory tree in which file B appears under more than one directory through an “alias”. (Not really a tree - Directed Acyclic Graph)]

Issues

  • a) Directory entry contains disk blocks?
      – contents (blocks) may change
  • b) Directory entry points to file descriptor?
      – if removed, refers to non-existent file
      – must keep count, remove only if 0
      – hard link
      – Similar if delete file in use (show example)
  • c) Have new type of file “link”?
      – contains alternate name for file
      – overhead, must parse tree second time
      – soft link
      – often have max link count in case loop (show example)
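On a POSIX system the difference between cases (b) and (c) is visible through stat() and lstat(). A small sketch, with hypothetical helper names link_count and is_soft_link:

```c
#define _POSIX_C_SOURCE 200809L   /* expose lstat() and S_ISLNK */
#include <sys/stat.h>

/*
 * Number of directory entries (hard links) that refer to the file at
 * path, following soft links. Returns -1 if the file cannot be reached.
 */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;     /* the count kept in the file descriptor */
}

/*
 * 1 if path is itself a soft (symbolic) link, 0 if it is not,
 * -1 if path does not exist.
 */
int is_soft_link(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0)    /* lstat() does not follow the link */
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

After link("a", "b") the count for either name is 2, and removing "a" leaves "b" usable with a count of 1. After symlink("a", "c") and removal of "a", link_count("c") returns -1, because the soft link only stores a name that no longer resolves.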

Outline

  • Files (done)
  • Directories (done)
  • Disk space management  ←
  • Misc

Disk Space Management

  • n bytes
      – contiguous
      – blocks
  • Similarities with memory management
      – contiguous is like variable-sized partitions
          • but moving on disk very slow!
          • so use blocks
      – blocks are like paging
          • how to choose block size?
  • (Note, disk block size typically 512 bytes, but file system logical block size chosen when formatting)

Choosing Block Size

  • Large blocks
      – faster throughput, less seek time
      – wasted space (internal fragmentation)
  • Small blocks
      – less wasted space
      – more seek time since more blocks

[Figure: data rate and disk space utilization plotted against block size]
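The space side of the tradeoff is easy to quantify: a file occupies a whole number of blocks, so the tail of its last block is wasted. A minimal sketch, with hypothetical function names:

```c
/* Blocks needed to store `size` bytes, rounding up to whole blocks. */
long blocks_needed(long size, long block_size)
{
    return (size + block_size - 1) / block_size;
}

/* Bytes allocated to the file but unused (internal fragmentation). */
long wasted_bytes(long size, long block_size)
{
    return blocks_needed(size, block_size) * block_size - size;
}
```

A 1025-byte file needs two 1K blocks and wastes 1023 bytes, but only one 4K block while wasting 3071 bytes: fewer blocks to fetch, more space lost.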

Keeping Track of Free Blocks

  • Two methods
      – linked list of disk blocks
          • one per block or many per block
      – bitmap of disk blocks
  • Linked List of Free Blocks (many per block)
      – 1K block, 16 bit disk block number = 511 free blocks/block (512 entries, one of which points to the next list block)
      – 200 MB disk needs about 400 blocks for the free list = 400k
  • Bit Map
      – 200 MB disk needs about 200 Kbits (one bit per 1K block)
      – about 25 blocks = 25k
      – 1 bit vs. 16 bits

(note, these are stored on the disk)
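The storage cost of the two schemes follows from simple arithmetic. The sketch below computes both, taking 200 MB as 200 × 2^20 bytes (204,800 blocks of 1K) and assuming the whole disk is free; the function names are hypothetical.

```c
/*
 * Blocks needed to hold a free list naming `nblocks` free blocks.
 * Each list block stores block_size / entry_bytes numbers, one of which
 * is reserved to point at the next list block.
 */
long free_list_blocks(long nblocks, long block_size, long entry_bytes)
{
    long per_block = block_size / entry_bytes - 1;
    return (nblocks + per_block - 1) / per_block;     /* round up */
}

/* Blocks needed for a bitmap with one bit per disk block. */
long bitmap_blocks(long nblocks, long block_size)
{
    long bits_per_block = block_size * 8;
    return (nblocks + bits_per_block - 1) / bits_per_block;
}
```

For 204,800 blocks, 1K blocks and 2-byte (16-bit) block numbers, the free list needs 401 blocks and the bitmap needs 25. The free list shrinks as the disk fills, while the bitmap stays the same size.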


Tradeoffs

  • Only if the disk is nearly full does linked list scheme require fewer blocks
  • If enough RAM, bitmap method preferred
  • If only 1 “block” of RAM, and disk is full, bitmap method may be inefficient since have to load multiple blocks
      – linked list can take first in line
  • Sometimes, combine both (Linux)

File System Performance

  • Disk access 100,000x slower than memory
      – reduce number of disk accesses needed!
  • Block/buffer cache
      – cache to memory
  • Full cache? FIFO, LRU, 2nd chance …
      – Unlike in VM, exact LRU can be done (why?)
  • LRU inappropriate sometimes
      – crash w/ i-node can lead to inconsistent state
      – some rarely referenced (double indirect block)

Modified LRU

  • Is the block likely to be needed soon?
      – if no, put at beginning of list
  • Is the block essential for consistency of file system?
      – write immediately
  • Occasionally write out all
      – sync
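The first rule of modified LRU can be sketched with a tiny block cache. This is an illustration under assumptions, not how any particular kernel does it: the cache has four slots, recency is tracked with a counter instead of a linked list, and "put at beginning of list" is modelled by giving the block a stamp of 0 so it is evicted first. Write-through for consistency-critical blocks and sync are not modelled. All names are hypothetical.

```c
#include <stddef.h>

#define CACHE_SLOTS 4

struct block_cache {
    long block[CACHE_SLOTS];            /* cached block number, -1 if empty */
    unsigned long stamp[CACHE_SLOTS];   /* time of last use; 0 = evict first */
    unsigned long clock;                /* counts accesses */
};

void cache_init(struct block_cache *c)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        c->block[i] = -1;
        c->stamp[i] = 0;
    }
    c->clock = 0;
}

/*
 * Access block `blk`. Returns 1 on a hit, 0 on a miss. On a miss the
 * slot with the oldest stamp is replaced (exact LRU). If needed_soon is
 * 0, the block is stamped 0 so it becomes the next eviction victim.
 */
int cache_access(struct block_cache *c, long blk, int needed_soon)
{
    size_t victim = 0;

    c->clock++;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (c->block[i] == blk) {
            c->stamp[i] = needed_soon ? c->clock : 0;
            return 1;
        }
        if (c->stamp[i] < c->stamp[victim])
            victim = i;                 /* remember the least recently used */
    }
    c->block[victim] = blk;
    c->stamp[victim] = needed_soon ? c->clock : 0;
    return 0;
}
```

Exact LRU is affordable here because every cache access already goes through a system call, so updating a stamp per access adds little cost.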

Outline

  • Files (done)
  • Directories (done)
  • Disk space management (done)
  • Misc  ←
      – partitions (fdisk, mount)
      – maintenance
      – quotas
  • Linux and WinNT/2000

Partitions

  • mount, unmount
      – load “super-block” from disk
      – pick “access point” in file-system
  • Super-block
      – file system type
      – block size
      – free blocks
      – free I-nodes

[Figure: directory tree with / (root) above usr, home, tmp]

Partitions: fdisk

  • Partition is large group of sectors allocated for a specific purpose
      – IDE disks limited to 4 physical partitions
      – logical (extended) partition inside physical partition
  • Specify number of cylinders to use
  • Specify type
      – magic number recognized by OS

(Hey, show example)


File System Maintenance

  • Format:
      – create file system structure: super block, I-nodes
      – format (Win), mke2fs (Linux)
  • “Bad blocks”
      – most disks have some
      – scandisk (Win) or badblocks (Linux)
      – add to “bad blocks” list (file system can ignore)
  • Defragment
      – arrange blocks efficiently
  • Scanning (when system crashes)
      – lost+found, correcting file descriptors...
      – e2fsck (Linux) or chkdsk (Win)

Disk Quotas

  • Table 1: Open file table in memory
      – when file size changed, charged to user
      – user index to table 2
  • Table 2: quota record
      – soft limit checked, exceed allowed w/ warning
      – hard limit never exceeded
  • Overhead? Again, in memory, so relatively fast
  • Limit: blocks, files, i-nodes
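The soft/hard limit check on a quota record can be sketched as follows. The record layout and the names quota_record and quota_charge are hypothetical; a real kernel keeps separate limits for blocks and i-nodes and usually adds a grace period for the soft limit.

```c
enum quota_result { QUOTA_OK, QUOTA_WARN, QUOTA_DENIED };

struct quota_record {
    long soft_limit;   /* may be exceeded, with a warning */
    long hard_limit;   /* may never be exceeded */
    long used;         /* blocks currently charged to the user */
};

/*
 * Charge `blocks` (negative to release) to the user when a file size
 * changes. A request that would pass the hard limit is refused and
 * nothing is charged; passing only the soft limit succeeds with a warning.
 */
enum quota_result quota_charge(struct quota_record *q, long blocks)
{
    long total = q->used + blocks;

    if (total > q->hard_limit)
        return QUOTA_DENIED;

    q->used = total;
    return total > q->soft_limit ? QUOTA_WARN : QUOTA_OK;
}
```

With a soft limit of 100 and a hard limit of 120, a user at 95 blocks who asks for 10 more is allowed with a warning (105), and a further request for 20 is denied.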

Linux: Virtual File System

  • File system independent layer
  • Generic inode and directory entry for all
      – Even if not inode based
  • Specific file systems register with VFS

Linux File System: ext2fs

  • “Extended (from Minix) file system, v2”
  • Uses inodes
      – mode for file, directory, symbolic link ...
  • (See: struct inode)

Linux File System: ext2 directories

  • Special file with names (+ length) and inodes. See:
      – struct ext2_dir_entry_2
  • Cached. See:
      – struct dentry

Linux File System: ext2 blocks

  • Default is 1 KB blocks
      – small!
  • For higher performance
      – performs I/O in chunks (reduce requests)
      – clusters adjacent requests (block groups)
  • Keep data blocks close to inodes
  • Keep file inodes close to directory inodes
  • Group has:
      – bit-map of free blocks and I-nodes
      – copy of super block


Linux File System: ext2 Superblock

  • Magic Number
      – allows the mounting software to check that this is an EXT2 file system
  • Revision Level
      – major and minor revision levels for compatibility check
  • Mount Count and Maximum Mount Count
      – Run e2fsck if reach max
  • Block Size
      – Block size in bytes, for example 1024 bytes
  • Blocks per Group
      – Blocks in a group. Fixed when the file system is created
  • Free Blocks
      – Free blocks in the file system
  • Free Inodes
      – Free inodes in the file system
  • First Inode
      – First inode in the file system. Points to root dir

(See struct superblock)

Linux Filesystem: /proc

  • Contents of “files” not stored, but computed
  • Provide interface to kernel statistics
  • Allows access to “text” using Unix tools
  • Again, enabled by “virtual file system”
  • (NT/2000 has perfmon to access registry)
  • (show example in /proc)
  • (show biteMe module example)

WinNT/2000 Filesystem: NTFS

  • Volume (partition) can cover part, all or multiple disks
  • Basic allocation unit called a cluster (block)
  • Each file has structure, made up of attributes
      – Examples: time modified, permissions, author…
      – attributes are a stream of bytes
      – stored in Master File Table, 1 entry per file
  • Metadata (free blocks, etc) kept in MFT for volume
      – each has unique ID
          • part for MFT index, part for “version” of file for caching and consistency
  • Hierarchical directory with internal structure stored as B+ tree (for efficiency)
  • Supports compression plus encryption

WinNT/2000 Filesystem: Recovery

  • Avoid the need to fdisk
  • Use database notion of “transaction” (all or none)
      – Before data committed, record start in log
      – Also contain redo or undo information
      – After data written, write to log that done
  • If a crash, redo or undo ops that did not finish
  • Periodically (5 sec by default) record checkpoint
      – Can then discard log
  • Note, does not guarantee data is ok, only metadata
  • Linux has:
      – reiserfs (journaling of metadata) + ext3 (journaling metadata + data)
  • (See “samples” for journaling + file system stuff)