CSE 513 Introduction to Operating Systems
Class 9 - Distributed and Multiprocessor Operating Systems
Jonathan Walpole
Dept. of Comp. Sci. and Eng.
Oregon Health and Science University
Why use parallel or distributed systems?
Speed - reduce time to answer
Scale - increase size of problem
Reliability - increase resilience to errors
Communication - span geographical distance
Overview
Multiprocessor systems
Multi-computer systems
Distributed systems
Multiprocessor, multi-computer and distributed architectures
shared memory multiprocessor
message passing multi-computer (cluster)
wide area distributed system
Multiprocessor Systems
Multiprocessor systems
Definition:
A computer system in which two or more CPUs share full access to a common RAM
Hardware implements shared memory among CPUs
Architecture determines whether access times to different memory regions are the same
UMA - uniform memory access
NUMA - non-uniform memory access
Bus-based UMA and NUMA architectures
Bus becomes the bottleneck as the number of CPUs increases
Crossbar switch-based UMA architecture
Interconnect cost increases as the square of the number of CPUs
Multiprocessors with 2x2 switches
Omega switching network from 2x2 switches
Interconnect suffers contention, but costs less
NUMA multiprocessors
- Single address space visible to all CPUs
- Access to remote memory via LOAD and STORE commands
- Access to remote memory slower than to local memory
- Compilers and OS need to be careful about data placement
Directory-based NUMA multiprocessors
(a) 256-node directory-based multiprocessor (b) Fields of 32-bit memory address (c) Directory at node 36
Operating systems for multiprocessors
OS structuring approaches
Private OS per CPU
Master-slave architecture
Symmetric multiprocessing architecture
New problems
multiprocessor synchronization
multiprocessor scheduling
The private OS approach
Implications of the private OS approach
shared I/O devices
static memory allocation
no data sharing
no parallel applications
The master-slave approach
- OS only runs on the master CPU
Single kernel lock protects OS data structures
Slaves trap system calls and place the process on the scheduling queue for the master
- Parallel applications supported
Memory shared among all CPUs
- Single CPU for all OS calls becomes a bottleneck
Symmetric multiprocessing (SMP)
- OS runs on all CPUs
Multiple CPUs can be executing the OS simultaneously
Access to OS data structures requires synchronization
Fine-grain critical sections lead to more locks and more parallelism … and more potential for deadlock
Multiprocessor synchronization
Why is it different compared to single processor synchronization?
Disabling interrupts does not prevent memory accesses since it only affects "this" CPU
Multiple copies of the same data exist in caches of different CPUs
- atomic lock instructions do CPU-CPU communication
Spinning to wait for a lock is not always a bad idea
Synchronization problems in SMPs
TSL instruction is non-trivial on SMPs
Avoiding cache thrashing during spinning
Multiple locks used to avoid cache thrashing
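The slide's figure uses multiple per-CPU lock variables; a simpler illustration of the same idea is the test-and-test-and-set spinlock sketched below, which spins on a cached read and only issues the atomic TSL-style operation when the lock looks free. This is a minimal sketch using C11 atomics, not the scheme in the figure.

```c
/* Test-and-test-and-set spinlock sketch: spin locally on a cached value
 * to reduce bus traffic and cache thrashing, then attempt the atomic
 * exchange only when the lock appears free. */
#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;

static void spin_lock(spinlock_t *l) {
    for (;;) {
        /* spin on the cached copy; no atomic traffic while the lock is held */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        /* attempt the TSL-style atomic operation only when it looks free */
        if (!atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
            return;
    }
}

static void spin_unlock(spinlock_t *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```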
Spinning versus switching
In some cases the CPU "must" wait
the scheduling critical section may be held
In other cases spinning may be more efficient than blocking
spinning wastes CPU cycles
switching also uses up CPU cycles
if critical sections are short, spinning may be better than blocking
static analysis of critical section duration can determine whether to spin or block
dynamic analysis can improve performance
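One common compromise is a "spin then block" wait. The sketch below is illustrative only: it spins for a bounded number of attempts (worthwhile when critical sections are short) and then yields the CPU; a real implementation would block on a kernel wait queue rather than call sched_yield().

```c
/* Spin-then-block sketch: bounded spinning, then give up the CPU. */
#include <stdatomic.h>
#include <sched.h>

typedef struct { atomic_int locked; } spinlock_t;

enum { SPIN_LIMIT = 1000 };   /* tuning knob: how long to spin before yielding */

void spin_then_block(spinlock_t *l) {
    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; i++)
            if (!atomic_load_explicit(&l->locked, memory_order_relaxed) &&
                !atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
                return;          /* acquired while spinning */
        sched_yield();           /* critical section seems long: stop wasting cycles */
    }
}
```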
Multiprocessor scheduling
Two-dimensional scheduling decision
time (which process to run next)
space (which processor to run it on)
Time sharing approach
single scheduling queue shared across all CPUs
Space sharing approach
partition machine into sub-clusters
Time sharing
Single data structure used for scheduling
Problem - scheduling frequency influences inter-thread communication time
Interplay between scheduling and IPC
- Problem with communication between two threads
both belong to process A
both running out of phase
Space sharing
Groups of cooperating threads can communicate at the same time
fast inter-thread communication time
Gang scheduling
Problem with pure space sharing
Some partitions are idle while others are overloaded
Can we combine time sharing and space sharing and avoid introducing scheduling delay into IPC?
Solution: Gang Scheduling
Groups of related threads scheduled as a unit (gang)
All members of a gang run simultaneously on different timeshared CPUs
All gang members start and end time slices together
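A rough sketch of the idea, assuming a hypothetical dispatch_on_cpu() primitive: each time slice one gang is chosen and all of its threads are dispatched onto different CPUs, so the gang starts and stops together.

```c
/* Gang-scheduling sketch (illustrative only, not the slides' figure). */
#define NCPUS 8

void dispatch_on_cpu(int cpu, int tid);   /* hypothetical per-CPU dispatcher */

struct gang {
    int nthreads;
    int thread_ids[NCPUS];   /* at most one runnable thread per CPU */
};

void schedule_tick(struct gang *gangs, int ngangs, int slice) {
    struct gang *g = &gangs[slice % ngangs];       /* round-robin over gangs */
    for (int cpu = 0; cpu < g->nthreads && cpu < NCPUS; cpu++)
        dispatch_on_cpu(cpu, g->thread_ids[cpu]);  /* whole gang runs together */
    /* CPUs not covered by this gang stay idle for the slice */
}
```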
Gang scheduling
Multi-computer Systems
Multi-computers
Also known as
cluster computers
clusters of workstations (COWs)
Definition: Tightly-coupled CPUs that do not share memory
Multi-computer interconnection topologies
(a) single switch (b) ring (c) grid (d) double torus (e) cube (f) hypercube
Store & forward packet switching
Network interfaces in a multi-computer
Network co-processors may off-load communication processing from the main CPU
OS issues for multi-computers
Message passing performance
Programming model
synchronous vs asynchronous message passing
distributed virtual memory
Load balancing and coordinated scheduling
Optimizing message passing performance
Parallel application performance is dominated by communication costs
interrupt handling, context switching, message copying …
Solution - get the OS out of the loop
map interface board to all processes that need it
active messages - give interrupt handler address of user-buffer
sacrifice protection for performance?
CPU / network card coordination
How to maximize independence between CPU and network card while sending/receiving messages?
Use send & receive rings and bit-maps
- one always sets bits, one always clears bits
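A minimal sketch of the convention described above, with illustrative structure names: the CPU sets a slot's bit when it fills a send-ring slot, and the card clears the bit after transmitting. Memory barriers, volatile qualifiers, and the receive ring are omitted for brevity.

```c
/* Send ring shared between CPU and network card (illustrative sketch). */
#include <stdint.h>

#define RING_SLOTS 32

struct send_ring {
    uint32_t full_bitmap;            /* bit i set => slot i holds a message */
    char     slot[RING_SLOTS][1500]; /* message buffers */
};

/* CPU side: find a free slot, copy the message in, then set its bit. */
int ring_send(struct send_ring *r, const char *msg, int len) {
    for (int i = 0; i < RING_SLOTS; i++) {
        if (!(r->full_bitmap & (1u << i))) {
            for (int j = 0; j < len && j < 1500; j++)
                r->slot[i][j] = msg[j];
            r->full_bitmap |= (1u << i);   /* hand the slot to the card */
            return i;
        }
    }
    return -1;  /* ring full: caller must wait for the card to clear bits */
}
```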
Blocking vs non-blocking send calls
- Minimum services provided
send and receive commands
- These can be blocking (synchronous) or non-blocking (asynchronous) calls
(a) Blocking send call (b) Non-blocking send call
Blocking vs non-blocking calls
Advantages of non-blocking calls
ability to overlap computation and communication improves performance
Advantages of blocking calls
simpler programming model
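The contrast is easy to see in MPI, a standard message-passing API used here purely for illustration (it is not part of the slides): the blocking call returns only when the buffer can be reused, while the non-blocking call returns immediately and completion is waited for separately.

```c
/* Blocking vs non-blocking sends illustrated with MPI. */
#include <mpi.h>

void send_example(char *buf, int len, int dest) {
    /* Blocking: returns when buf is safe to reuse. */
    MPI_Send(buf, len, MPI_CHAR, dest, /*tag=*/0, MPI_COMM_WORLD);

    /* Non-blocking: returns immediately; overlap computation, then wait. */
    MPI_Request req;
    MPI_Isend(buf, len, MPI_CHAR, dest, /*tag=*/1, MPI_COMM_WORLD, &req);
    /* ... do useful work here while the message is in flight ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* buf may be reused only after this */
}
```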
Remote procedure call (RPC)
Goal
support execution of remote procedures
make remote procedure execution indistinguishable from local procedure execution
allow distributed programming without changing the programming model
Remote procedure call (RPC)
Steps in making a remote procedure call
client and server stubs are proxies
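A sketch of what a client stub does, with hypothetical helpers net_send() and net_recv() standing in for the transport: the stub marshals the arguments, sends the request, blocks for the reply, and unmarshals the result, so the caller sees an ordinary local call.

```c
/* Client stub sketch for a remote add(int, int) procedure (illustrative). */
#include <stdint.h>
#include <string.h>

int net_send(int conn, const void *buf, size_t len);   /* hypothetical transport */
int net_recv(int conn, void *buf, size_t len);          /* hypothetical transport */

int32_t remote_add(int conn, int32_t a, int32_t b) {
    unsigned char req[12], reply[4];
    uint32_t op = 1;                      /* procedure number for "add" */
    memcpy(req,     &op, 4);              /* marshal: opcode, then arguments */
    memcpy(req + 4, &a,  4);
    memcpy(req + 8, &b,  4);
    net_send(conn, req, sizeof req);      /* ship request to the server stub */
    net_recv(conn, reply, sizeof reply);  /* block until the reply arrives */
    int32_t result;
    memcpy(&result, reply, 4);            /* unmarshal the return value */
    return result;
}
```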
RPC implementation issues
Cannot pass pointers
call by reference becomes copy-restore (at best)
Weakly typed languages
Client stub cannot determine size of reference parameters
Not always possible to determine parameter types
Cannot use global variables
may get moved (replicated) to remote machine
Basic problem - local procedure call relies on shared memory
Distributed shared memory (DSM)
Goal
use software to create the illusion of shared memory on top of message passing hardware
leverage virtual memory hardware to page fault on non-resident pages
service page faults from remote memories instead of from local disk
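One common user-level way to build this, sketched below under simplifying assumptions: non-resident pages are protected with mprotect(), the resulting SIGSEGV is caught, the page contents are fetched from the owning node (fetch_remote_page() is a hypothetical RPC), and the faulting instruction is retried. Real DSM systems also track ownership and consistency, which this omits.

```c
/* User-level DSM page-fault sketch using mprotect() and SIGSEGV. */
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>

void fetch_remote_page(void *page_addr, void *buf);   /* hypothetical RPC */

static void dsm_fault_handler(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    long psz = sysconf(_SC_PAGESIZE);
    void *page = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(psz - 1));
    mprotect(page, psz, PROT_READ | PROT_WRITE);   /* make the page accessible */
    fetch_remote_page(page, page);                 /* copy contents from the owner */
    /* returning from the handler retries the faulting instruction */
}

void dsm_install_handler(void) {
    struct sigaction sa = {0};
    sa.sa_sigaction = dsm_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
}
```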
Distributed shared memory (DSM)
DSM at the hardware, OS or middleware layer
Page replication in DSM systems
Replication
(a) Pages distributed on 4 machines (b) CPU 0 reads page 10 (c) CPU 1 reads page 10
Consistency and false sharing in DSM
Strong memory consistency
(figure: processors P1-P4 issuing writes W1-W4 and reads R1, R2 in one global order)
Total order enforces sequential consistency
- intuitively simple for programmers, but very costly to implement
- not even implemented in non-distributed machines!
Scheduling in multi-computer systems
Each computer has its own OS
local scheduling applies
Which computer should we allocate a task to initially?
Decision can be based on load (load balancing)
load balancing can be static or dynamic
Graph-theoretic load balancing approach
- Two ways of allocating 9 processes to 3 nodes
- Total network traffic is the sum of arcs cut by node boundaries
- The second partitioning is better
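The cut cost is simple to compute; a small illustrative function is given below. Given the communication weights between the 9 processes and an assignment of processes to nodes, it sums the weights of arcs whose endpoints land on different nodes; the allocation with the smaller result is the better one.

```c
/* Total traffic crossing node boundaries for a given process placement. */
#define NPROC 9

int cut_traffic(const int weight[NPROC][NPROC], const int node_of[NPROC]) {
    int total = 0;
    for (int i = 0; i < NPROC; i++)
        for (int j = i + 1; j < NPROC; j++)
            if (weight[i][j] > 0 && node_of[i] != node_of[j])
                total += weight[i][j];   /* arc cut by a node boundary */
    return total;
}
```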
Sender-initiated load balancing
- Overloaded nodes (senders) off-load work to underloaded nodes (receivers)
Receiver-initiated load balancing
- Underloaded nodes (receivers) request work from overloaded nodes (senders)
Distributed Systems
Distributed systems
Definition: Loosely-coupled CPUs that do not share memory
where is the boundary between tightly-coupled and loosely-coupled systems?
Other differences
single vs multiple administrative domains
geographic distribution
homogeneity vs heterogeneity of hardware and software
Comparing multiprocessors, multi-computers and distributed systems
Ethernet as an interconnect
- Bus-based vs switched Ethernet
The Internet as an interconnect
OS issues for distributed systems
Common interfaces above heterogeneous systems
Communication protocols
Distributed system middleware
Choosing suitable abstractions for distributed system interfaces
distributed document-based systems
distributed file systems
distributed object systems
Network service and protocol types
Protocol interaction and layering
Homogeneity via middleware
Distributed system middleware models
Document-based systems
File-based systems
Object-based systems
Document-based middleware - WWW
Document-based middleware
How the browser gets a page
Asks DNS for IP address
DNS replies with IP address
Browser makes connection
Sends request for specified page
Server sends file
TCP connection released
Browser displays text
Browser fetches, displays images
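The same sequence can be sketched with POSIX sockets; the fragment below is a minimal illustration (HTTP/1.0, no error handling, text simply written to stdout) rather than anything a real browser does.

```c
/* Minimal "browser gets a page" sketch: resolve, connect, GET, read, close. */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

void fetch_page(const char *host, const char *path) {
    struct addrinfo hints = {0}, *res;
    hints.ai_socktype = SOCK_STREAM;
    getaddrinfo(host, "80", &hints, &res);        /* ask DNS for the IP address */
    int s = socket(res->ai_family, res->ai_socktype, 0);
    connect(s, res->ai_addr, res->ai_addrlen);    /* browser makes the connection */
    char req[512];
    snprintf(req, sizeof req, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
    write(s, req, strlen(req));                   /* send request for the page */
    char buf[4096];
    ssize_t n;
    while ((n = read(s, buf, sizeof buf)) > 0)    /* server sends the file */
        fwrite(buf, 1, (size_t)n, stdout);
    close(s);                                     /* TCP connection released */
    freeaddrinfo(res);
}
```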
File-based middleware
Design issues
Naming and name resolution
Architecture and interfaces
Caching strategies and cache consistency
File sharing semantics
Disconnected operation and fault tolerance
Naming
(b) Clients with the same view of name space (c) Clients with different views of name space
Naming and transparency issues
- Can clients distinguish between local and remote files?
- Location transparency
the file name does not reveal the file's physical storage location
- Location independence
the file name does not need to be changed when the file's physical storage location changes
Global vs local name spaces
- Global name space
file names are globally unique
any file can be named from any node
- Local name spaces
remote files must be inserted into the local name space
file names are only meaningful within the calling node
but how do you refer to remote files in order to insert them?
- globally unique file handles can be used to map remote files to local names
Building a name space with super-root
- Super-root / machine name approach
concatenate the host name to the names of files stored on that host
system-wide uniqueness guaranteed
simple to locate a file
not location transparent or location independent
Building a name space using mounting
Mounting remote file systems
exported remote directory is imported and mounted onto a local directory
accesses require a globally unique file handle for the remote directory
- once mounted, file names are location-transparent
- location can be captured via naming conventions
are they location independent?
- location of file vs location of client?
- files have different names from different places
Local name spaces with mounting
- Mounting (part of) a remote file system in NFS
Nested mounting on multiple servers
NFS name space
- Server exports a directory
- mountd: provides a unique file handle for the exported directory
- Client uses RPC to issue the nfs_mount request to the server
- mountd receives the request and checks whether
the pathname is a directory?
the directory is exported to this client?
NFS file handles
- V-node contains
- reference to a file handle for mounted remote files
- reference to an i-node for local files
- File handle uniquely names a remote directory
- file system identifier: unique number for each file system (in UNIX superblock)
- i-node and i-node generation number
(figure: a v-node points to either a local i-node or a file handle; the file handle holds the file system identifier, i-node, and i-node generation number)
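The picture can be rendered as a pair of C structs. These are illustrative declarations only (field names are guesses, not the real NFS wire format): the handle names a remote file by file system id, i-node number, and generation number, and the v-node holds either a handle or a local i-node reference.

```c
/* Illustrative structs for the v-node / file-handle picture. */
#include <stdint.h>

struct nfs_fhandle {
    uint32_t fsid;        /* file system identifier (from the UNIX superblock) */
    uint32_t inode;       /* i-node number within that file system */
    uint32_t generation;  /* i-node generation number (detects i-node reuse) */
};

struct vnode_ref {
    int is_remote;
    union {
        struct nfs_fhandle fh;   /* remote file: referenced via file handle */
        uint32_t inode;          /* local file: referenced via local i-node */
    } u;
};
```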
Mounting on-demand
- Need to decide where and when to mount remote directories
- Where? - Can be based on conventions to standardize local name spaces (e.g., /home/username for user home directories)
- When? - boot time, login time, access time, …?
- What to mount when?
How long does it take to mount everything?
Do we know what everything is?
Can we do mounting on-demand?
- An automounter is a client-side process that handles on-demand mounting
it intercepts requests and acts like a local NFS server
Distributed file system architectures
- Server side
how do servers export files?
how do servers handle requests from clients?
- Client side
how do applications access a remote file in the same way as a local file?
- Communication layer
how do clients and servers communicate?
Local access architectures
- Local access approach
move file to client
local access on client
return file to server
data shipping approach
Remote access architectures
- Remote access
leave file on server
send read/write operations to server
return results to client
function shipping approach
File-level interface
Accesses can be supported at either file granularity or block granularity
File-level client-server interface
local access model with whole file movement and caching
remote access model - client-server interface at the system call level
client performs remote open, read, write, close calls
Block-level interface
Block-level client-server interface
client-server interface at the file system or disk block level
server offers a virtual disk interface
client file accesses generate block access requests to the server
block-level caching of parts of files on the client
NFS architecture
- The basic NFS architecture for UNIX systems
NFS server side
- Mountd
server exports a directory via mountd
mountd provides the initial file handle for the exported directory
client issues an nfs_mount request via RPC to mountd
mountd checks if the pathname is a directory and if the directory is exported to the client
- nfsd: services NFS RPC calls, gets the data from its local file system, and replies to the RPC
Usually listening at port 2049
- Both mountd and nfsd use RPC
Communication layer: NFS RPC calls
- NFS / RPC uses XDR and TCP/IP
- fhandle: 64-byte opaque data (in NFS v3)
what's in the file handle?

Proc.    Input args                      Results
lookup   dirfh, name                     status, fhandle, fattr
read     fhandle, offset, count          status, fattr, data
create   dirfh, name, fattr              status, fhandle, fattr
write    fhandle, offset, count, data    status, fattr
NFS file handles
- V-node contains
- reference to a file handle for mounted remote files
- reference to an i-node for local files
- File handle uniquely names a remote directory
- file system identifier: unique number for each file system (in UNIX superblock)
- i-node and i-node generation number
(figure: a v-node points to either a local i-node or a file handle; the file handle holds the file system identifier, i-node, and i-node generation number)
NFS client side
Accessing remote files in the same way as accessing local files requires kernel support
Vnode interface
(figure: read(fd, …) goes through the process file table to a struct file (mode, vnode, offset), then to a struct vnode (v_data, fs_op) whose fs_op table holds pointers to open, close, read, write, lookup, …)
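A rough C rendering of that picture, simplified relative to any real kernel: a file descriptor resolves to a struct file, which points at a vnode, which carries a table of file-system operations supplied either by the local file system or by the NFS client.

```c
/* Simplified vnode-interface sketch (illustrative, not a real kernel's layout). */
struct vnode;

struct vnode_ops {
    int (*open)(struct vnode *vn);
    int (*close)(struct vnode *vn);
    int (*read)(struct vnode *vn, long offset, void *buf, int len);
    int (*write)(struct vnode *vn, long offset, const void *buf, int len);
    int (*lookup)(struct vnode *dir, const char *name, struct vnode **out);
};

struct vnode {
    void                   *v_data;  /* fs-specific state (e.g., an NFS file handle) */
    const struct vnode_ops *fs_op;   /* local FS or NFS client supplies these */
};

struct file {
    int           mode;
    long          offset;
    struct vnode *vn;                /* read(fd, ...) ends up in vn->fs_op->read */
};
```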
Caching vs pure remote service
- Network traffic?
– caching reduces remote accesses ⇒ reduces network traffic
– caching generates fewer, larger data transfers
- Server load?
– caching reduces remote accesses ⇒ reduces server load
- Server disk throughput?
– optimized better for large requests than for random disk blocks
- Data integrity?
– cache-consistency problem due to frequent writes
- Operating system complexity?
– simpler for remote service
Four places to cache files
Server's disk: slow performance
Server's memory
cache management, how much to cache, replacement strategy
still slow due to network delay
Client's disk
access speed vs server memory?
large files can be cached
supports disconnected operation
Client's memory
fastest access
can be used by diskless workstations
competes with the VM system for physical memory space
Cache consistency
Reflecting changes to the local cache in the master copy
Reflecting changes to the master copy in local caches
(figure: a write to Copy 1 reaches the master copy, which updates/invalidates Copy 2)
Common update algorithms for client caching
- Write-through: all writes are carried out immediately
- Reliable: little information is lost in the event of a client crash
- Slow: cache not useful for writes
- Delayed-write: writes do not immediately propagate to the server
- batching writes amortizes overhead
- wait for blocks to fill
- if data is written and then deleted immediately, it need not be written at all (20-30% of new data is deleted within 30 secs)
- Write-on-close: delay writing until the file is closed at the client
- a semantically meaningful delayed-write policy
- if the file is open for a short duration, works fine
- if the file is open for long, susceptible to losing data in the event of a client crash
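A minimal sketch of delayed-write with write-on-close, assuming a hypothetical push_to_server() RPC and a single cached block per file: writes only dirty the local copy, and the data is pushed to the server when the file is closed.

```c
/* Delayed-write / write-on-close client cache sketch (illustrative only). */
#include <string.h>

#define BLOCK 8192

struct cached_file {
    char data[BLOCK];
    int  len;
    int  dirty;
};

void push_to_server(const struct cached_file *f);   /* hypothetical RPC */

void cache_write(struct cached_file *f, int off, const void *buf, int n) {
    memcpy(f->data + off, buf, n);    /* update only the local cached copy */
    if (off + n > f->len) f->len = off + n;
    f->dirty = 1;                     /* remember to flush later */
}

void cache_close(struct cached_file *f) {
    if (f->dirty) {                   /* write-on-close: propagate now */
        push_to_server(f);
        f->dirty = 0;
    }
}
```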
Cache coherence
- How to keep locally cached data up to date / consistent?
- Client-initiated approach
check validity on every access: too much overhead
check on first access to a file (e.g., file open)
check every fixed time interval
- Server-initiated approach
server records, for each client, the (parts of) files it caches
server responds to updates by propagation or invalidation
- Disallow caching during concurrent-write or read/write sharing
allow multiple clients to cache the file for read-only access
flush all client caches when the file is opened for writing
NFS – server caching
Reads
use the local file system cache
prefetching in UNIX using read-ahead
Writes
write-through (synchronous, no cache)
commit on close (standard behaviour in v4)
NFS – client caching (reads)
- Clients are responsible for validating cache entries (stateless server)
- Validation by checking last modification time
time stamps issued by the server
automatic validation on open (with server??)
- A cache entry is considered valid if one of the following is true:
the cache entry is less than t seconds old (3-30 s for files, 30-60 s for directories)
the modified time at the server is the same as the modified time on the client
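That validity rule is easy to express directly; the sketch below just encodes the two conditions stated above (fresh enough, or server and client last-modified times still match).

```c
/* NFS-style client cache validity test (illustrative). */
#include <time.h>

struct cache_entry {
    time_t fetched_at;      /* when the client last validated this entry */
    time_t mtime_client;    /* last-modified time recorded by the client */
};

int entry_is_valid(const struct cache_entry *e, time_t mtime_server, int t) {
    time_t now = time(NULL);
    if (now - e->fetched_at < t)              /* entry is fresh enough: trust it */
        return 1;
    return mtime_server == e->mtime_client;   /* otherwise compare mtimes */
}
```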
NFS – client caching (writes)
- Delayed writes
modified files are marked dirty and flushed to the server on close (or sync)
- Bio-daemons (block input-output)
read-ahead requests are done asynchronously
write requests are submitted when a block is filled
File sharing semantics
- Semantics of file sharing
(a) a single processor gives sequential consistency (b) a distributed system may return an obsolete value
Consistency semantics for file sharing
- What value do reads see after writes?
- UNIX semantics
- the value read is the value stored by the last write
- writes to an open file are visible immediately to others with the file open
- easy to implement with one server and no caching
- Session semantics
- writes to an open file are not visible immediately to others that already have the file open
- changes become visible on close to sessions started later
- Immutable-Shared-Files semantics - simple to implement
- a sharable file cannot be modified
- file names cannot be reused and contents may not be altered
- Transactions
- all changes have the all-or-nothing property
- W1,R1,R2,W2 is not allowed where P1 = W1;W2 and P2 = R1;R2
NFS – file sharing semantics
- Not UNIX semantics!
- Unspecified in the NFS standard
- Not clear because of timing dependencies
- Consistency issues can arise
Example: Jack and Jill have a file cached. Jack opens the file and modifies it, then he closes the file. Jill then opens the file (before t seconds have elapsed) and modifies it as well. Then she closes the file. Are both Jack's and Jill's modifications present in the file? What if Jack closes the file after Jill opens it?
- Locking is part of v4 (byte range, leasing)