now handout page 1



  1. Hardware-Software Trade-offs in Synchronization
     CS 252, Spring 05
     David E. Culler
     Computer Science Division, U.C. Berkeley
     3/29/2005

     Role of Synchronization
     • “A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast.”
     • Types of Synchronization
       – Mutual Exclusion
       – Event synchronization
         » point-to-point
         » group
         » global (barriers)
     • How much hardware support?
       – high-level operations?
       – atomic instructions?
       – specialized interconnect?

     Layers of synch support (top to bottom)
     • Application
     • User library
     • Operating System Support
     • Synchronization Library
     • Atomic RMW ops
     • HW Support

     Mini-Instruction Set debate
     • atomic read-modify-write instructions
       – IBM 370: included atomic compare&swap for multiprogramming
       – x86: many memory read-modify-write instructions can be prefixed with a lock modifier
       – High-level language advocates want hardware locks/barriers
         » but it goes against the “RISC” flow, and has other problems
       – SPARC: atomic register-memory ops (swap, compare&swap)
       – MIPS, IBM Power: no atomic operations but a pair of instructions
         » load-locked, store-conditional
         » later used by PowerPC and DEC Alpha too
     • Rich set of tradeoffs

     Other forms of hardware support
     • Separate lock lines on the bus
     • Lock locations in memory
     • Lock registers (Cray X-MP)
     • Hardware full/empty bits (Tera)
     • Bus support for interrupt dispatch

     Components of a Synchronization Event
     • Acquire method
       – Acquire right to the synch
         » enter critical section, go past event
     • Waiting algorithm
       – Wait for synch to become available when it isn’t
       – busy-waiting, blocking, or hybrid
     • Release method
       – Enable other processors to acquire right to the synch
     • Waiting algorithm is independent of type of synchronization
       – makes no sense to put in hardware
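The three components of a synchronization event can be sketched in C11. This is a minimal illustration under stated assumptions, not code from the lecture: the names `sync_lock`, `wait_fn`, `wait_spin`, `wait_yield`, `try_acquire`, `acquire`, and `release` are all hypothetical, compare&swap is expressed with the standard `atomic_compare_exchange_strong`, and the POSIX call `sched_yield` stands in for a waiting policy that gives up the processor. The waiting algorithm is passed in as a function pointer to show that it can be chosen independently of the acquire and release methods.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* All names here are hypothetical; they do not come from the slides. */
typedef void (*wait_fn)(void);

typedef struct {
    atomic_int held;  /* 0 = free, 1 = held */
    wait_fn    wait;  /* waiting algorithm, chosen independently of the lock */
} sync_lock;

/* Two interchangeable waiting algorithms. */
static void wait_spin(void)  { /* busy-wait: retry immediately */ }
static void wait_yield(void) { sched_yield(); /* let another thread run */ }

/* Acquire method, one attempt: compare&swap the lock word from 0 to 1. */
static bool try_acquire(sync_lock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong(&l->held, &expected, 1);
}

/* Acquire method plus waiting algorithm. */
static void acquire(sync_lock *l) {
    while (!try_acquire(l))
        l->wait();
}

/* Release method: let other processors acquire the right to the synch. */
static void release(sync_lock *l) {
    assert(atomic_load(&l->held) == 1);  /* releasing a free lock is a bug */
    atomic_store(&l->held, 0);
}
```

A caller would declare something like `sync_lock l = { 0, wait_yield };` and bracket its critical section with `acquire(&l)` and `release(&l)`; swapping in `wait_spin` changes only how waiting is done.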

  2. Atomic Instructions
     • Specifies a location, register, & atomic operation
       – Value in location read into a register
       – Another value (function of value read or not) stored into location
     • Many variants
       – Varying degrees of flexibility in second part
     • Simple example: test&set
       – Value in location read into a specified register
       – Constant 1 stored into location
       – Successful if value loaded into register is 0
       – Other constants could be used instead of 1 and 0

     Strawman Lock (Busy-Wait)
       lock:    ld  register, location   /* copy location to register */
                cmp location, #0         /* compare with 0 */
                bnz lock                 /* if not 0, try again */
                st  location, #1         /* store 1 to mark it locked */
                ret                      /* return control to caller */
       unlock:  st  location, #0         /* write 0 to location */
                ret                      /* return control to caller */
     Why doesn’t the acquire method work? Release method?

     Simple Test&Set Lock
       lock:    t&s register, location
                bnz lock                 /* if not 0, try again */
                ret                      /* return control to caller */
       unlock:  st  location, #0         /* write 0 to location */
                ret                      /* return control to caller */
     • Other read-modify-write primitives
       – Swap, Exch
       – Fetch&op
       – Compare&swap
         » Three operands: location, register to compare with, register to swap with
         » Not commonly supported by RISC instruction sets
     • cacheable or uncacheable

     Performance Criteria for Synch. Ops
     • Latency (time per op)
       – especially under light contention
     • Bandwidth (ops per sec)
       – especially under high contention
     • Traffic
       – load on critical resources
       – especially on failures under contention
     • Storage
     • Fairness
     • How do you measure synchronization performance?
       – Under what conditions? Contention? Scale? Duration?

     T&S Lock Microbenchmark: SGI Challenge
     [Plot: Time (µs), 0 to 20, vs. Number of processors, 3 to 15. Curves: Test&set, c = 0; Test&set, exponential backoff, c = 3.64; Test&set, exponential backoff, c = 0; Ideal. Microbenchmark loop: lock; delay(c); unlock;]
     • Why does performance degrade?
     • Bus Transactions on T&S?
     • Hardware support in CC protocol?

     Enhancements to Simple Lock
     • Reduce frequency of issuing test&sets while waiting
       – Test&set lock with backoff
       – Don’t back off too much or will be backed off when lock becomes free
       – Exponential backoff works quite well empirically: i-th time = k*c^i
     • Busy-wait with read operations rather than test&set
       – Test-and-test&set lock
       – Keep testing with ordinary load
         » cached lock variable will be invalidated when release occurs
       – When value changes (to 0), try to obtain lock with test&set
         » only one attempter will succeed; others will fail and start testing again
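The simple test&set lock and the two enhancements above can be sketched in C11 as follows. This is an illustration under assumptions, not the code behind the microbenchmark: test&set is modeled with `atomic_exchange_explicit` storing 1, the type `tas_lock`, the functions, and the `delay` loop are hypothetical names, and the backoff constants `BACKOFF_K`, `BACKOFF_C`, and `BACKOFF_MAX` are arbitrary placeholders rather than tuned values. The pause after the i-th failed test&set is k*c^i, capped so a waiter is not backed off for long after the lock becomes free.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct { atomic_int locked; } tas_lock;

/* test&set: atomically return the old value and store the constant 1. */
static int test_and_set(tas_lock *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire);
}

/* Simple test&set lock: every retry is a read-modify-write on the lock word. */
static void tas_acquire(tas_lock *l) {
    while (test_and_set(l) != 0) { /* if not 0, try again */ }
}

/* Release: write 0 to the location. */
static void tas_release(tas_lock *l) {
    assert(atomic_load(&l->locked) == 1);  /* must be held when released */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Assumed stand-in for a timed pause; a real lock would use a tuned delay. */
static void delay(unsigned iters) {
    for (volatile unsigned i = 0; i < iters; i++) { }
}

/* Assumed backoff parameters: pause after the i-th failure is K * C^i. */
enum { BACKOFF_K = 4, BACKOFF_C = 2, BACKOFF_MAX = 1024 };

/* Test-and-test&set with exponential backoff.
 * Returns how many test&set attempts failed before the lock was obtained. */
static unsigned ttas_acquire(tas_lock *l) {
    unsigned pause = BACKOFF_K;
    unsigned failures = 0;
    for (;;) {
        /* Keep testing with ordinary loads; these hit in the local cache
         * until the release invalidates the cached copy. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0) { }
        /* Value changed to 0: try to obtain the lock with test&set. */
        if (test_and_set(l) == 0)
            return failures;
        /* Another processor won; back off before testing again. */
        failures++;
        delay(pause);
        if (pause < BACKOFF_MAX)
            pause *= BACKOFF_C;
    }
}
```

Both acquire functions pair with the same `tas_release`, which mirrors the slides: the enhancements change only how a waiter behaves, not how the lock is released.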
