Multiprocessor/Multicore Systems - PowerPoint PPT Presentation




SLIDE 1

Multiprocessor/Multicore Systems

Scheduling, Synchronization (cont.)

SLIDE 2

Recall: Multiprocessor Scheduling: a problem


  • Problem with communication between two threads

– both belong to process A
– both running out of phase



  • Scheduling and synchronization inter-related in multiprocessors

SLIDE 3

Multiprocessor Scheduling and Synchronization

Priorities + locks may result in:

priority inversion: low-priority process P holds a lock, high-priority process waits, medium-priority processes do not allow P to complete and release the lock fast (scheduling less efficient). To cope with/avoid this:

– use priority inheritance
– avoid locks in synchronization (wait-free, lock-free, optimistic synchronization)

convoy effect: processes need a resource for a short time, but the process holding it may block them for a long time (hence, poor utilization)

– Avoiding locks is good here, too

SLIDE 4

Readers-Writers and non-blocking synchronization

(some slides are adapted from J. Anderson’s slides on same topic)

SLIDE 5

The Mutual Exclusion Problem

Locking Synchronization

  • N processes, each with this structure:

while true do
  Noncritical Section;
  Entry Section;
  Critical Section;
  Exit Section
od

  • Basic Requirements:

– Exclusion: Invariant (# in CS ≤ 1).
– Starvation-freedom: (process i in Entry) leads-to (process i in CS).

  • Can implement by “busy waiting” (spin locks) or using kernel calls.
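The entry/exit-section structure above can be sketched as a busy-waiting spin lock. A minimal Python simulation (the `SpinLock` class and worker names are illustrative; CPython's `Lock.acquire(blocking=False)` stands in for an atomic test-and-set, which real spin locks get from hardware):

```python
import threading

class SpinLock:
    """Busy-waiting lock: a simulated test-and-set spin lock."""
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # non-blocking acquire acts as the atomic test-and-set
        while not self._flag.acquire(blocking=False):
            pass  # spin (busy wait)

    def release(self):
        self._flag.release()

# demo: N threads increment a shared counter inside the critical section
counter = 0
lock = SpinLock()

def worker(n):
    global counter
    for _ in range(n):
        lock.acquire()   # Entry Section
        counter += 1     # Critical Section
        lock.release()   # Exit Section

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 4000: exclusion held, no lost updates
```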

SLIDE 6

Synchronization without locks

  • The problem:

– Implement a shared object without mutual exclusion.

  • Shared Object: A data structure (e.g., queue) shared by concurrent processes.

– Why?

  • To avoid performance problems that result when a lock-holding task is delayed.

  • To enable more interleaving (enhancing parallelism)


  • To avoid priority inversions

SLIDE 7

Synchronization without locks

  • Two variants:

– Lock-free:

  • system-wide progress is guaranteed.
  • Usually implemented using “retry loops.”

– Wait-free:

  • Individual progress is guaranteed.


  • More involved algorithmic methods

SLIDE 8

Readers/Writers Problem

[Courtois et al. 1971]

  • Similar to mutual exclusion, but several readers can execute the critical section at once.

  • If a writer is in its critical section, then no other process can be in its critical section.
  • + no starvation, fairness

SLIDE 9

Solution 1

Readers have “priority”…

Reader::
  P(mutex);
  rc := rc + 1;
  if rc = 1 then P(w) fi;
  V(mutex);
  CS;
  P(mutex);
  rc := rc − 1;
  if rc = 0 then V(w) fi;
  V(mutex)

Writer::
  P(w);
  CS;
  V(w)


“First” reader executes P(w). “Last” one executes V(w).
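A minimal Python transcription of Solution 1, with P = `acquire` and V = `release` on semaphores (the thread layout and the `shared` list are illustrative, not part of the original pseudocode):

```python
import threading

mutex = threading.Semaphore(1)  # protects rc
w = threading.Semaphore(1)      # writers' exclusion
rc = 0                          # number of active readers
shared = []                     # the protected data

def reader(results):
    global rc
    mutex.acquire()
    rc += 1
    if rc == 1:
        w.acquire()          # "first" reader locks out writers
    mutex.release()
    results.append(list(shared))   # read CS: concurrent readers allowed
    mutex.acquire()
    rc -= 1
    if rc == 0:
        w.release()          # "last" reader lets writers back in
    mutex.release()

def writer(item):
    w.acquire()
    shared.append(item)      # write CS: exclusive
    w.release()

results = []
threads = [threading.Thread(target=writer, args=(i,)) for i in range(5)]
threads += [threading.Thread(target=reader, args=(results,)) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(shared))  # [0, 1, 2, 3, 4]: all writes applied exactly once
```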

SLIDE 10

Concurrent Reading and Writing [Lamport ‘77]

  • Previous solutions to the readers/writers problem use some form of mutual exclusion.

  • Lamport considers solutions in which readers and writers access a shared object concurrently.

  • Motivation:

– Don’t want writers to wait for readers.
– Readers/writers solution may be needed to implement mutual exclusion (circularity problem).

SLIDE 11

Interesting Factoids

  • This is the first ever lock-free algorithm: guarantees consistency without locks

  • An algorithm very similar to this is implemented within an embedded controller in Mercedes automobiles

SLIDE 12

The Problem

  • Let v be a data item, consisting of one or more

digits.

– For example, v = 256 consists of three digits, “2”, “5”, and “6”.

  • Underlying model: Digits can be read and written atomically.

  • Objective: Simulate atomic reads and writes of the data item v.

SLIDE 13

Preliminaries

  • Definition: v[i], where i ≥ 0, denotes the ith value written to v. (v[0] is v’s initial value.)

  • Note: No concurrent writing of v.
  • Partitioning of v: v1 … vm.


– vi may consist of multiple digits.

  • To read v: Read each vi (in some order).
  • To write v: Write each vi (in some order).
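A deterministic, hypothetical Python sketch of this model: the digits of v are individually atomic, but a whole read or write of v is not, so a read that interleaves with a write can observe a "mixed" value (the generator-based writer is an illustration device, not part of the model):

```python
# v = 256 initially; digits are the only atomic units
digits = [2, 5, 6]          # v[0] = 256

def write_right_to_left(new):
    """Writer updates digits one at a time, right to left,
    yielding between steps so a read may interleave."""
    for i in (2, 1, 0):     # d3, then d2, then d1
        digits[i] = new[i]
        yield

def read_left_to_right():
    return [digits[0], digits[1], digits[2]]

w = write_right_to_left([3, 0, 7])   # start writing v[1] = 307
next(w)                              # writer has updated d3 only
snapshot = read_left_to_right()
print(snapshot)  # [2, 5, 7]: neither 256 nor 307 -- a v[k,l] value
```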


SLIDE 14

More Preliminaries

[diagram: a read r performs read v1, read v2, …, read vm while writes k, …, k+i, …, l occur]

We say: r reads v[k,l]. Value is consistent if k = l.

SLIDE 15

Main Theorem

Assume that i ≤ j implies that v[i] ≤ v[j], where v = d1 … dm.

(a) If v is always written from right to left, then a read from left to right obtains a value v[k,l] ≤ v[l].

(b) If v is always written from left to right, then a read from right to left obtains a value v[k,l] ≥ v[k].

SLIDE 16

Readers/Writers Solution

Writer::
  →V1 :> V1;
  write D;
  ←V2 := →V1

Reader::
  repeat
    temp := ←V2
    read D
  until →V1 = temp

:> means assign larger value.
→V1 means V1 is accessed “left to right”; ←V2 means V2 is accessed “right to left”.
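A seqlock-style Python sketch in the spirit of this scheme: version counters V1/V2 guard the data D, and the reader retries until no write overlapped its read. The digit-by-digit access directions, which let Lamport avoid atomic multi-digit counters, are not modeled here; this is an assumption-laden simplification, not the original algorithm:

```python
V1 = 0
V2 = 0
D = [0, 0]          # the shared data

def write(a, b):
    global V1, V2
    V1 += 1          # announce: write in progress (V1 != V2)
    D[0], D[1] = a, b
    V2 = V1          # announce: write complete

def read():
    while True:                  # lock-free retry loop
        temp = V2
        d = (D[0], D[1])
        if V1 == temp:           # no write overlapped the read
            return d

write(1, 2)
write(3, 4)
print(read())  # (3, 4)
```

Note the reader's order mirrors the pseudocode: read V2, read D, then compare against V1; a concurrent writer bumps V1 first, so any overlap forces a retry.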

SLIDE 17

Useful Synchronization Primitives

Usually Necessary in Nonblocking Algorithms

CAS(var, old, new)
〈 if var ≠ old then return false fi;
  var := new;
  return true 〉

CAS2 extends this to two variables.

LL(var)
〈 establish “link” to var;
  return var 〉

SC(var, val)
〈 if “link” to var still exists then
    break all current links of all processes;
    var := val;
    return true
  else
    return false
  fi 〉
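A hypothetical software simulation of CAS to make the retry-loop pattern concrete: real hardware performs the compare-and-swap atomically in one instruction, so the hidden lock below only stands in for that atomicity (the `AtomicCell` class is illustrative):

```python
import threading

class AtomicCell:
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()   # models the hardware's atomicity

    def load(self):
        return self._value

    def cas(self, old, new):
        with self._lock:
            if self._value != old:
                return False
            self._value = new
            return True

# lock-free increment via a retry loop
counter = AtomicCell(0)

def increment():
    while True:
        old = counter.load()
        if counter.cas(old, old + 1):   # retry if another thread won
            return

def run(n):
    for _ in range(n):
        increment()

threads = [threading.Thread(target=run, args=(1000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.load())  # 4000
```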

SLIDE 18

Another Lock-free Example

Shared Queue

type Qtype = record
  v: valtype;
  next: pointer to Qtype
end

shared var Tail: pointer to Qtype
local var old, new: pointer to Qtype

procedure Enqueue(input: valtype)
  new := (input, NIL);
  repeat                                   (* retry loop *)
    old := Tail
  until CAS2(Tail, old->next, old, NIL, new, new)

[diagram: the retry loop reads Tail into old; the CAS2 atomically swings Tail and old->next from (old, NIL) to (new, new)]
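A Python simulation of this enqueue. This is a hypothetical sketch: CAS2 is modeled with a hidden lock standing in for double-location atomicity (real implementations rely on hardware support), and getter/setter closures stand in for the two memory locations:

```python
import threading

_cas_lock = threading.Lock()   # models CAS2's two-location atomicity

class Node:
    def __init__(self, v):
        self.v = v
        self.next = None

def cas2(get1, set1, get2, set2, old1, old2, new1, new2):
    """Atomically: if loc1 == old1 and loc2 == old2,
    set loc1 := new1 and loc2 := new2."""
    with _cas_lock:
        if get1() is old1 and get2() is old2:
            set1(new1)
            set2(new2)
            return True
        return False

class Queue:
    def __init__(self):
        self.head = Node(None)   # dummy node
        self.tail = self.head

    def enqueue(self, v):
        new = Node(v)
        while True:              # the slide's retry loop
            old = self.tail
            if cas2(lambda: self.tail, lambda n: setattr(self, 'tail', n),
                    lambda: old.next, lambda n: setattr(old, 'next', n),
                    old, None, new, new):
                return

    def to_list(self):
        out, n = [], self.head.next
        while n:
            out.append(n.v)
            n = n.next
        return out

q = Queue()
for i in range(5):
    q.enqueue(i)
print(q.to_list())  # [0, 1, 2, 3, 4]
```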

SLIDE 19

Cache-coherence

  • Cache coherency protocols are based on a set of (cache block) states and state transitions.
  • 2 main types of protocols:

  • write-update
  • write-invalidate

  • Reminds of readers/writers?

SLIDE 20

Multiprocessor architectures, memory consistency

  • Memory access protocols and cache coherence protocols define memory consistency models

  • Examples:


– Sequential consistency: SGI Origin (more and more seldom found now...)
– Weak consistency: sequential consistency for special synchronization variables and actions before/after access to such variables; no ordering of other actions. SPARC architectures.
– .....

SLIDE 21

Distributed OS issues: IPC: Client/Server, RPC mechanisms; Clusters, load balancing; Middleware

SLIDE 22

Multicomputers

  • Definition:

Tightly-coupled CPUs that do not share memory

  • Also known as:

– cluster computers
– clusters of workstations (COWs)
– illusion is one machine
– Alternative to symmetric multiprocessing (SMP)

SLIDE 23

Clusters

Benefits of Clusters

  • Scalability

– Can have dozens of machines, each of which is a multiprocessor
– Add new systems in small increments

  • Availability

– Failure of one node does not mean loss of service (well, not necessarily at least… why?)

  • Superior price/performance

– Cluster can offer equal or greater computing power than a single large machine at a much lower cost

BUT:

  • think about communication!!!


  • The above picture is changing with multicore systems

SLIDE 24

Multicomputer Hardware example

Network interface boards in a multicomputer

SLIDE 25

Clusters: Operating System Design Issues

Failure management

  • Highly available cluster offers a high probability that all resources will be in service
  • Fault-tolerant cluster ensures that all resources are always

available (replication needed) available (replication needed)

Load balancing

  • When a new computer is added to the cluster, automatically include this computer in scheduling applications

Parallelism

  • parallelizing compiler or application

e.g. Beowulf, Linux clusters

SLIDE 26

Cluster Computer Architecture

  • Network


  • Middleware layer to provide

– single-system image
– fault-tolerance, load balancing, parallelism

SLIDE 27

IPC

  • Client-Server Computing
  • Remote Procedure Calls
  • P2P collaboration (related to overlays, cf. advanced networks and distr. sys course)

  • Distributed shared memory (cf. advanced distr. Sys course)

SLIDE 28

Distributed Shared Memory (1)

  • Note layers where it can be implemented

– hardware
– operating system
– user-level software

SLIDE 29

Distributed Shared Memory (2)

  • False Sharing
  • Must also achieve sequential consistency
  • Remember cache protocols?

SLIDE 30

Multicomputer Scheduling: Load Balancing (1)

  • Graph-theoretic deterministic algorithm (partitioning a process graph across nodes)

SLIDE 31

Load Balancing (2)

  • Sender-initiated distributed heuristic algorithm

– overloaded sender
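A toy, hypothetical sketch of the sender-initiated heuristic: an overloaded node probes randomly chosen peers and transfers a process to the first peer whose load is below a threshold (the function name, threshold, and probe count are illustrative assumptions, not from the slides):

```python
import random

random.seed(1)  # deterministic demo

def sender_initiated(loads, sender, threshold=3, probes=3):
    """loads: per-node process counts. Probe up to `probes` random
    peers; migrate one process to the first underloaded peer found."""
    candidates = [n for n in range(len(loads)) if n != sender]
    for peer in random.sample(candidates, min(probes, len(candidates))):
        if loads[peer] < threshold:   # underloaded peer found
            loads[sender] -= 1
            loads[peer] += 1
            return peer
    return None                        # no taker: keep the process locally

loads = [8, 2, 5, 1]                   # node 0 is overloaded
target = sender_initiated(loads, sender=0)
print(target is not None, sum(loads))  # True 16 -- work is conserved
```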

SLIDE 32

Load Balancing (3)

  • Receiver-initiated distributed heuristic algorithm

– underloaded receiver

SLIDE 33

Document-Based Middleware

  • E.g. The Web

– a big directed graph of documents

SLIDE 34

File System-Based Middleware

  • Semantics of File sharing

– (a) single processor gives sequential consistency


– (b) distributed system may return obsolete value

SLIDE 35

Shared Object-Based Middleware

  • E.g. CORBA based system

– Common Object Request Broker Architecture; IIOP: Internet InterORB protocol

SLIDE 36

Coordination-Based Middleware

  • E.g. Linda

– independent processes
– communicate via abstract tuple space
– Tuple

  • like a structure in C, record in Pascal

– Operations: out, in, read, eval

  • E.g. Jini - based on Linda model

– devices plugged into a network
– offer, use services
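A toy tuple space in the spirit of Linda (a hypothetical sketch, not the real Linda API): `out` deposits a tuple, `read` copies a matching tuple, and `take` stands in for Linda's `in` (a Python keyword); `None` in a template acts as a wildcard:

```python
import threading

class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, t):
        with self._cond:
            self._tuples.append(t)
            self._cond.notify_all()

    def _match(self, template):
        for t in self._tuples:
            if len(t) == len(template) and all(
                    p is None or p == f for p, f in zip(template, t)):
                return t
        return None

    def read(self, *template):          # blocking, non-destructive
        with self._cond:
            while (t := self._match(template)) is None:
                self._cond.wait()
            return t

    def take(self, *template):          # Linda's "in": destructive
        with self._cond:
            while (t := self._match(template)) is None:
                self._cond.wait()
            self._tuples.remove(t)
            return t

ts = TupleSpace()
ts.out(("job", 1, "compile"))
ts.out(("job", 2, "link"))
print(ts.read("job", 1, None))       # ("job", 1, "compile"), still in the space
print(ts.take("job", None, "link"))  # ("job", 2, "link"), removed
```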

SLIDE 37

Extra notes

SLIDE 38

Also of relevance to Distributed Systems (and more):

Microkernel OS organization

  • Small OS core; contains only essential OS functions:


– Low-level memory management (address space mapping)
– Process scheduling
– I/O and interrupt management

  • Many services traditionally included in the OS kernel are now external subsystems:

– device drivers, file systems, virtual memory manager, windowing system, security services

SLIDE 39

Benefits of a Microkernel Organization

  • Uniform interface on request made by a process

– All services are provided by means of message passing

  • Distributed system support

– Messages are sent without knowing what the target machine is

  • Extensibility

– Allows the addition/removal of services and features

  • Portability

– Changes needed to port the system to a new processor are made in the microkernel, not in the other services

  • Object-oriented operating system

– Components are objects with clearly defined interfaces that can be interconnected

  • Reliability

– Modular design
– Small microkernel can be rigorously tested

SLIDE 40

Schematic View of Virtual File System

SLIDE 41

Schematic View of NFS Architecture

Network interface: client-server protocol

  • Uses UDP over IP (most commonly over Ethernet)

  • Mounting and caching

SLIDE 42

Solution 2: Readers/Writers

Writers have “priority”… readers should not build a long queue on r, so that writers can overtake => mutex3

Reader::
  P(mutex3);
  P(r);
  P(mutex1);
  rc := rc + 1;
  if rc = 1 then P(w) fi;
  V(mutex1);
  V(r);
  V(mutex3);
  CS;
  P(mutex1);
  rc := rc − 1;
  if rc = 0 then V(w) fi;
  V(mutex1)

Writer::
  P(mutex2);
  wc := wc + 1;
  if wc = 1 then P(r) fi;
  V(mutex2);
  P(w);
  CS;
  V(w);
  P(mutex2);
  wc := wc − 1;
  if wc = 0 then V(r) fi;
  V(mutex2)
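A Python transcription of the writer-priority pseudocode, with P = `acquire` and V = `release` (the thread layout and the `shared` list are illustrative additions):

```python
import threading

mutex1 = threading.Semaphore(1)  # protects rc
mutex2 = threading.Semaphore(1)  # protects wc
mutex3 = threading.Semaphore(1)  # at most one reader queued on r
r = threading.Semaphore(1)       # writers block incoming readers here
w = threading.Semaphore(1)       # data exclusion
rc = wc = 0
shared = []

def reader(results):
    global rc
    mutex3.acquire()
    r.acquire()
    mutex1.acquire()
    rc += 1
    if rc == 1:
        w.acquire()
    mutex1.release()
    r.release()
    mutex3.release()
    results.append(list(shared))   # read CS
    mutex1.acquire()
    rc -= 1
    if rc == 0:
        w.release()
    mutex1.release()

def writer(item):
    global wc
    mutex2.acquire()
    wc += 1
    if wc == 1:
        r.acquire()        # first writer blocks new readers
    mutex2.release()
    w.acquire()
    shared.append(item)    # write CS
    w.release()
    mutex2.acquire()
    wc -= 1
    if wc == 0:
        r.release()        # last writer lets readers in
    mutex2.release()

results = []
threads = [threading.Thread(target=writer, args=(i,)) for i in range(5)]
threads += [threading.Thread(target=reader, args=(results,)) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(shared))  # [0, 1, 2, 3, 4]
```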

SLIDE 43

Properties

  • If several writers try to enter their critical sections, one will execute P(r), blocking readers.

  • Works assuming V(r) has the effect of picking a process waiting to execute P(r) to proceed.

  • Due to mutex3, if a reader executes V(r) and a writer is at P(r), then the writer is picked to proceed.

SLIDE 44

On Lamport’s R/W

SLIDE 45

Theorem 1

If v is always written from right to left, then a read from left to right obtains a value

v1[k1,l1] v2[k2,l2] … vm[km,lm]

where k1 ≤ l1 ≤ k2 ≤ l2 ≤ … ≤ km ≤ lm.

Example: v = v1v2v3 = d1d2d3

[diagram: the read performs read v1 (read d1), read v2 (read d2), read v3 (read d3), interleaved with writes 0, 1, 2, each writing wv3 wv2 wv1 (i.e., wd3 wd2 wd1); read v1 overlaps write:0, read v2 overlaps write:1, read v3 overlaps write:2]

Read reads v1[0,0] v2[1,1] v3[2,2].

SLIDE 46

Another Example

v = v1 v2 = d1d2 d3d4

[diagram: the read performs read v1 (rd1, rd2) and read v2 (rd4, rd3), interleaved with writes 0, 1, 2, each writing wv2 then wv1 (wd3 wd4, then wd1 wd2)]

Read reads v1[0,1] v2[1,2].

SLIDE 47

Proof Obligation

  • Assume reader reads V2[k1, l1] D[k2, l2] V1[k3, l3].
  • Proof Obligation: V2[k1, l1] = V1[k3, l3] ⇒ k2 = l2.

SLIDE 48

Proof

By Theorem 2,

V2[k1,l1] ≤ V2[l1] and V1[k3] ≤ V1[k3,l3].   (1)

Applying Theorem 1 to V2 D V1,

k1 ≤ l1 ≤ k2 ≤ l2 ≤ k3 ≤ l3.   (2)

By the writer program,

l1 ≤ k3 ⇒ V2[l1] ≤ V1[k3].   (3)

(1), (2), and (3) imply

V2[k1,l1] ≤ V2[l1] ≤ V1[k3] ≤ V1[k3,l3].

Hence, V2[k1,l1] = V1[k3,l3] ⇒ V2[l1] = V1[k3] ⇒ l1 = k3 (by the writer’s program) ⇒ k2 = l2 (by (2)).

SLIDE 49

Example of (a) in main theorem

v = d1d2d3

[diagram: the writer writes right to left (wd3, wd2, wd1): write:0 gives 398, write:1 gives 399, write:2 gives 400; the read scans left to right (read d1, read d2, read d3), picking up d1 = 3 and d2 = 9 from 399 and d3 = 0 from 400]

Read obtains v[0,2] = 390 < 400 = v[2].
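The interleaving can be replayed deterministically in Python (a hypothetical sketch: the generator-based writer is an illustration device; the write history 398, 399, 400 is the slide's):

```python
d = [3, 9, 8]                        # v[0] = 398

def write_right_to_left(new):
    for i in (2, 1, 0):              # wd3, wd2, wd1
        d[i] = new[i]
        yield

list(write_right_to_left([3, 9, 9]))  # write:1 completes: v = 399

r1 = d[0]                             # read d1 = 3 (while v = 399)
r2 = d[1]                             # read d2 = 9 (while v = 399)
w2 = write_right_to_left([4, 0, 0])   # write:2 begins...
next(w2)                              # ...and has updated d3 only
r3 = d[2]                             # read d3 = 0 (from 400)

obtained = 100 * r1 + 10 * r2 + r3
print(obtained)  # 390: v[0,2] = 390 <= 400 = v[2], as the theorem promises
```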

SLIDE 50

Example of (b) in main theorem

v = d1d2d3

[diagram: the writer writes left to right (wd1, wd2, wd3): write:0 gives 398, write:1 gives 399, write:2 gives 400; the read scans right to left (read d3, read d2, read d1), picking up d3 = 8 from 398, d2 = 9, and d1 = 4 from 400]

Read obtains v[0,2] = 498 > 398 = v[0].

SLIDE 51

Supplemental Reading: lock-free synch

  • check:

– G.L. Peterson, “Concurrent Reading While Writing”, ACM TOPLAS, Vol. 5, No. 1, 1983, pp. 46-55.
– Solves the same problem in a wait-free manner:

  • guarantees consistency without locks and
  • the unbounded reader loop is eliminated.

– First paper on wait-free synchronization.

  • Now, very rich literature on the topic. Check also:

– PhD thesis A. Gidenstam, 2006, CTH
– PhD thesis H. Sundell, 2005, CTH

SLIDE 52

Using Locks in Real-time Systems

The Priority Inversion Problem: Uncontrolled use of locks in RT systems can result in unbounded blocking due to priority inversions. Solution: Limit priority inversions by modifying task priorities.

[diagram: timeline t0, t1, t2 of High, Med, and Low priority tasks, marking shared object access, the priority inversion interval, and computation not involving object accesses]

SLIDE 53

Dealing with Priority Inversions

  • Common Approach: Use lock-based schemes that bound their duration (as shown).

– Examples: Priority-inheritance protocols.
– Disadvantages: Kernel support, very inefficient on multiprocessors.

  • Alternative: Use non-blocking objects.

– No priority inversions or kernel support.
– Wait-free algorithms are clearly applicable here.
– What about lock-free algorithms?

  • Advantage: Usually simpler than wait-free algorithms.
  • Disadvantage: Access times are potentially unbounded.
  • But for periodic task sets access times are also predictable!! (check further-reading-pointers)

SLIDE 54

Key issue in load balancing: Process Migration

  • Transfer of a sufficient amount of the state of a process from one machine to another; the process continues execution on the target machine (processor)

Why migrate?

  • Load sharing/balancing

  • Communications performance

– Processes that interact intensively can be moved to the same node to reduce communications cost
– Move the process to where the data reside when the data is large

  • Availability

– Long-running process may need to move if the machine it is running on will be down

  • Utilizing special capabilities

– Process can take advantage of unique hardware or software capabilities

Initiation of Migration

– Operating system: When the goal is load balancing, performance optimization
– Process: When the goal is to reach a particular resource

SLIDE 55

What is Migrated?

  • Must destroy the process on the source system and create it on the target system; PCB info and address space are needed

– Transfer-all: Transfer entire address space

  • expensive if the address space is large and the process does not need most of it

  • Modification: Precopy: Process continues to execute on the source node while the address space is copied

– Pages modified on source during pre-copy have to be copied again
– Reduces the time a process cannot execute during migration

– Transfer-dirty: Transfer only the portion of the address space that is in main memory and has been modified

  • additional blocks of the virtual address space are transferred on demand

  • source machine is involved throughout the life of the process
  • Variation: Copy-on-reference: Pages are brought on demand

– Has lowest initial cost of process migration
