
6 • Transactional Memory
Chip Multiprocessors (ACS MPhil), Robert Mullins

Overview
• Limitations of lock-based programming
• Transactional memory
  – Programming with TM
  – Software TM (STM)
  – Hardware TM (HTM)

Lock-based programming
• Lock-based programming is a low-level model
  – Close to basic hardware primitives
  – For some problems, lock-based solutions that perform well are complex and error-prone
    • Difficult to write, debug, and maintain
    • Not true of all problems
• Parallel programming for the masses
  – The majority of programmers will need to be able to produce highly parallel and robust software

Lock-based programming
• Challenges:
  – Must remember to use (the correct) locks
    • Careful to avoid when not required (for performance)
  – Coarse-grain vs. fine-grain locks
    • Simplicity
    • Unnecessary serialisation of operations
  – Lock may not actually be required in most cases (data dependent). Lock-based programming may be pessimistic.
• We must also consider the time taken to acquire and release locks! (Even uncontended locks have a cost.)
  – What is the optimal granularity of locking? HW dependent.
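To make the coarse-grain vs. fine-grain trade-off concrete, here is a minimal C++ sketch. The class names and the 64-slot counter table are hypothetical, chosen only for illustration. Both versions are correct; they differ only in how much unrelated work a single lock serialises.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Coarse-grain: one mutex protects every counter, so increments of
// unrelated counters are serialised. Simple, but limits parallelism.
class CoarseCounters {
public:
    void increment(std::size_t i) {
        std::lock_guard<std::mutex> guard(m_);
        ++counts_[i % kSlots];
    }
    long get(std::size_t i) {
        std::lock_guard<std::mutex> guard(m_);
        return counts_[i % kSlots];
    }

private:
    static constexpr std::size_t kSlots = 64;
    std::mutex m_;
    std::array<long, kSlots> counts_{};
};

// Fine-grain: one mutex per counter, so threads that touch different
// counters never contend. More locks to get right (and more memory).
class FineCounters {
public:
    void increment(std::size_t i) {
        Slot& s = slots_[i % kSlots];
        std::lock_guard<std::mutex> guard(s.m);
        ++s.count;
    }
    long get(std::size_t i) {
        Slot& s = slots_[i % kSlots];
        std::lock_guard<std::mutex> guard(s.m);
        return s.count;
    }

private:
    struct Slot {
        std::mutex m;
        long count = 0;
    };
    static constexpr std::size_t kSlots = 64;
    std::array<Slot, kSlots> slots_;
};

// Each of `threads` threads increments its own counter `iters` times;
// returns the sum over those counters.
template <typename Counters>
long runWorkload(Counters& c, int threads, int iters) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&c, t, iters] {
            for (int k = 0; k < iters; ++k) c.increment(t);
        });
    }
    for (auto& th : pool) th.join();
    long total = 0;
    for (int t = 0; t < threads; ++t) total += c.get(t);
    return total;
}
```

Which version is faster is workload and hardware dependent, which is the point of the question about optimal granularity; note that even the fine-grain version pays for an uncontended lock/unlock on every increment.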

Lock-based programming
• Other issues:
  – Deadlock
  – Scheduling threads
    • Priority inversion (e.g. the Mars Pathfinder problems)
      – Low-priority thread is preempted (while holding a lock)
      – Medium-priority thread runs
      – High-priority thread (needing the lock) can't make progress
    • Convoying
      – Thread holding a lock is descheduled; a queue of threads forms
  – Lost wake-ups (wait on a CV, but forget to signal)
  – Horribly complicated error recovery
  – Cannot even easily compose lock-based programs

Lost wake-up example
    // push
    mutex::scoped_lock lock(pushMutex);
    queue.push(item);
    if (queue.size() == 1)
        m_emptyCond.notify_one();
    // (implicit lock release when leaving scope)

    // pop
    mutex::scoped_lock lock(popMutex);
    while (queue.empty())
        m_emptyCond.wait(lock);
    item = queue.front();
    queue.pop();
    return item;
• push and pop protect the same queue with different mutexes, so pop can find the queue empty, push can then insert an item and notify before pop has started waiting, and pop goes to sleep. Later pushes see size() > 1 and never notify again.

Lock-based programming
• Deadlock
  – We are free to do anything when we hold a lock, even take a lock on another mutex
  – This can quickly lead to deadlock if we are not careful
    // Trivial deadlock example
    // Thread 1        // Thread 2
    a.lock();          b.lock();
    b.lock();          a.lock();
    ...                ...
  – Limiting ourselves to only being able to take a single lock at a time would force us to use coarse-grain locks
    • e.g. consider maintaining two queues, each accessed by many different threads. We are infrequently required to transfer data from one queue to the other (atomically).

Lock-based programming
• Avoiding deadlock
  – Requires the programmer to adopt some sort of policy (although this isn't automatically enforced)
  – Often difficult to maintain/understand
• Lock hierarchies
  – All code must take locks in the same order
  – Lock chaining: take first lock, take second, release first, etc.
• Try and back off
  – More flexible than imposing a fixed order
  – Get first lock
  – Then try to lock additional mutexes in the required set. If we fail, release locks and retry
    • pthread_mutex_trylock
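A hedged C++ sketch of one way to remove the lost wake-up, using the standard library instead of the Boost-style `mutex::scoped_lock` (the class name `BlockingQueue` is illustrative). The essential change is that the queue and the condition variable are protected by a single mutex, and waiting uses a predicate loop.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Blocking queue without the lost wake-up (hypothetical class name).
// Key change: push() and pop() share ONE mutex, so pop()'s emptiness check
// and push()'s notification can no longer interleave.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(std::move(item));
        }  // lock released here, before notifying
        nonEmpty_.notify_one();  // notify on every push, not only when size() == 1
    }

    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        // wait() atomically releases m_ and sleeps; the predicate is
        // re-checked under the lock after every wake-up (including spurious ones).
        nonEmpty_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable nonEmpty_;
    std::queue<T> queue_;
};
```

Because the emptiness check and the call to `wait` happen under the same mutex that `push` must acquire, a notification can no longer slip in between them.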
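The two deadlock-avoidance policies can be sketched for the two-queue transfer. This is an illustrative C++ sketch, not a prescribed implementation: `LockedQueue`, `transferOrdered` and `transferBackoff` are hypothetical names, the lock order is taken from object addresses, and `std::unique_lock` with `std::try_to_lock` plays the role of `pthread_mutex_trylock`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical queue type: a deque guarded by its own mutex.
struct LockedQueue {
    std::mutex m;
    std::deque<int> items;
};

// Both functions move one item from `from` to `to` and return false if
// `from` was empty. Precondition: &from != &to.

// Policy 1, lock hierarchy: always acquire the two mutexes in one global
// order (here, by address), so opposite transfers cannot deadlock.
bool transferOrdered(LockedQueue& from, LockedQueue& to) {
    assert(&from != &to);
    bool fromFirst = std::less<LockedQueue*>()(&from, &to);
    LockedQueue& first = fromFirst ? from : to;
    LockedQueue& second = fromFirst ? to : from;
    std::lock_guard<std::mutex> l1(first.m);
    std::lock_guard<std::mutex> l2(second.m);
    if (from.items.empty()) return false;
    to.items.push_back(from.items.front());
    from.items.pop_front();
    return true;
}

// Policy 2, try and back off: take the first lock, only *try* the second
// (the same idea as pthread_mutex_trylock); on failure release and retry.
bool transferBackoff(LockedQueue& from, LockedQueue& to) {
    assert(&from != &to);
    for (;;) {
        std::unique_lock<std::mutex> l1(from.m);
        std::unique_lock<std::mutex> l2(to.m, std::try_to_lock);
        if (l2.owns_lock()) {
            if (from.items.empty()) return false;
            to.items.push_back(from.items.front());
            from.items.pop_front();
            return true;
        }
        l1.unlock();  // back off: hold nothing while waiting
        std::this_thread::yield();
    }
}
```

Try-and-back-off can, in principle, livelock if threads keep colliding, so practical code often adds randomised delays. Standard C++ also offers `std::lock` and `std::scoped_lock` (C++17), which acquire several mutexes using a built-in deadlock-avoidance algorithm.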

Lock-based programming
• Composing lock-based programs
  – Consider our example of two queues
  – There is no simple way of dequeuing from one and enqueuing to the other in an atomic fashion
    • We would need to expose synchronization state and force caller to manage locks
  – Can't compose methods that block either (wait/notify)
    • How do we describe the operation where we want to dequeue from either queue, whichever has data?
    • Each queue implementation blocks internally

Transactions
    atomic {
        x = q0.deq();
        q1.enq(x);
    }
• Focus on where atomicity is necessary rather than specific locking mechanisms
• The transactional memory system will ensure that the transaction is run in isolation from other threads
  – Transactions are typically run in parallel optimistically
  – If transactions perform conflicting memory accesses, we must abort and ensure none of the side-effects of the abandoned transactions are visible

Transactions
• Atomicity (all-or-nothing)
  – We guarantee that it appears that either all the instructions are executed or none of them are (if the transaction fails: failure atomicity)
  – The transaction either commits or aborts
• Transactions execute in isolation
  – Other operations cannot access a transaction's intermediate state
  – The result of executing concurrent transactions must be identical to a result in which the transactions executed sequentially (serializability)

Transactions
    void Queue::enq(int v) {
        atomic {
            if (count == MAX_LEN)   // queue is full
                retry;
            buf[tail] = v;
            if (++tail == MAX_LEN)
                tail = 0;
            count++;
        }
    }
• Retry
  – Abandon transaction and try again
  – An implementation could wait until some changes occur in memory locations read by the aborted transaction
    • Or specify a specific watch set [Atomos/PLDI'06]
"Composable memory transactions", Harris et al.
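The `atomic`/`retry` semantics can be prototyped without any TM support. The C++ sketch below is a hedged illustration, not a TM implementation: `naive_tm::atomically` runs every transaction under a single global lock, and `retry` is modelled by the body returning `false`, after which the thread sleeps until another transaction commits. All names, including `MAX_LEN = 4`, are assumptions.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// A deliberately naive model of `atomic { ... retry ... }` built on ONE global
// lock (all names hypothetical). It gives atomicity and isolation trivially,
// but none of the optimistic parallelism a real TM system provides.
namespace naive_tm {

std::mutex globalLock;
std::condition_variable committed;  // signalled after every commit

// The body returns false to mean `retry`.
// ASSUMPTION: a body only returns false before it has written shared state,
// because this model keeps no undo log and cannot roll writes back.
void atomically(const std::function<bool()>& body) {
    std::unique_lock<std::mutex> lock(globalLock);
    while (!body()) {
        // retry: block until *some* transaction commits, then re-execute.
        // (A real TM could wait only on the locations the body read.)
        committed.wait(lock);
    }
    committed.notify_all();  // our commit may unblock retrying transactions
}

}  // namespace naive_tm

constexpr int MAX_LEN = 4;

// Bounded circular queue mirroring Queue::enq on the slide.
// Its methods must only be called inside naive_tm::atomically.
struct Queue {
    int buf[MAX_LEN] = {};
    int head = 0, tail = 0, count = 0;

    bool enq(int v) {
        if (count == MAX_LEN) return false;  // queue is full: retry
        buf[tail] = v;
        if (++tail == MAX_LEN) tail = 0;
        count++;
        return true;
    }
    bool deq(int& out) {
        if (count == 0) return false;  // queue is empty: retry
        out = buf[head];
        if (++head == MAX_LEN) head = 0;
        count--;
        return true;
    }
};

// Composition: x = q0.deq(); q1.enq(x); as one atomic step.
void transfer(Queue& q0, Queue& q1) {
    naive_tm::atomically([&] {
        if (q0.count == 0 || q1.count == MAX_LEN) return false;  // check before writing
        int x = 0;
        q0.deq(x);
        q1.enq(x);
        return true;
    });
}
```

Because every transaction holds the same lock, this model is serializable by construction. What it gives up is exactly what the slides emphasise: running transactions optimistically in parallel and aborting only on real conflicts.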

Transactions
    atomic {
        x = q0.deq();
    } orElse {
        x = q1.deq();
    }
• Choice
  – Try to dequeue from q0 first; if this retries (i.e. the queue is empty), then try the second
  – If both retry, retry the whole orElse block
"Composable memory transactions", Harris et al.

Critical sections ≠ transactions
• Converting critical sections to transactions
  – Pitfall: "A critical section that was previously atomic only with respect to other critical sections guarded by the same lock is now atomic with respect to all other critical sections."
    proc1 {                  proc2 {
        acquire(m1)              acquire(m2)
        while (!flagA) {}        flagA = true
        flagB = true             while (!flagB) {}
        ....                     ....
        release(m1)              release(m2)
    }                        }
  – With separate locks m1 and m2 the two procedures run concurrently and both finish. As transactions they must appear to execute one after the other, and whichever runs first spins forever waiting for a flag that only the other sets.
"Deconstructing Transactional Semantics: The Subtleties of Atomicity", Colin Blundell, E. Christopher Lewis, Milo M. K. Martin, WDDD, 2005

Implementing a TM system
• Transaction granularity
  – Object, word or block
• How do we provide isolation?
  – Direct or deferred update?
    • Update object directly and keep undo log
    • Update private copy, discard or replace object
  – Also called eager and lazy versioning
• When and how do we detect conflicts?
  – Eager or lazy conflict detection?
• A software or hardware-supported implementation?

Hardware support for TM
• An introduction to hardware mechanisms for supporting transactional memory
  – See Larus/Rajwar book for a more complete survey
  – We'll look at:
    • Knight, "An architecture for mostly functional languages", in LFP, 1986
    • A simple HTM with lazy conflict detection
    • Herlihy/Moss (1993)
  – Discuss others in reading group
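The `orElse` choice can be modelled in the same naive style. This C++ sketch is hedged in the same way: a single global lock stands in for a TM system, each alternative returns `false` to mean `retry`, and it assumes an alternative decides to retry before writing anything, since there is no rollback. `naive_tm`, `Queue` and `deqEither` are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Naive single-global-lock model of `atomic { A } orElse { B }` (hypothetical names).
// Each alternative returns false to mean `retry`.
// ASSUMPTION: an alternative decides to retry before writing shared state,
// so no rollback is needed when we fall through to the other alternative.
namespace naive_tm {

std::mutex globalLock;
std::condition_variable committed;  // signalled after every commit

void atomicallyOrElse(const std::function<bool()>& first,
                      const std::function<bool()>& second) {
    std::unique_lock<std::mutex> lock(globalLock);
    // Try `first`; if it retries, try `second`. If both retry, block until
    // another transaction commits and re-run the whole orElse block.
    while (!first() && !second()) {
        committed.wait(lock);
    }
    committed.notify_all();
}

// A plain atomic block is the one-alternative case.
void atomically(const std::function<bool()>& body) {
    atomicallyOrElse(body, [] { return false; });
}

}  // namespace naive_tm

// Hypothetical unbounded queue whose deq() retries when empty.
struct Queue {
    std::deque<int> items;
    bool deq(int& out) {
        if (items.empty()) return false;
        out = items.front();
        items.pop_front();
        return true;
    }
};

// x = q0.deq() orElse x = q1.deq()
int deqEither(Queue& q0, Queue& q1) {
    int x = 0;
    naive_tm::atomicallyOrElse([&] { return q0.deq(x); },
                               [&] { return q1.deq(x); });
    return x;
}
```

`deqEither` expresses "dequeue from whichever queue has data" without either queue exposing its internal synchronization, which is the composition the lock-based version could not offer.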
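Direct (eager) versus deferred (lazy) update can be illustrated with a toy word-addressed memory. This hedged C++ sketch is single-threaded and omits conflict detection entirely; `Memory`, `UndoLogTx` and `WriteBufferTx` are hypothetical names, and addresses never written read as 0.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy word-addressed memory (hypothetical): address -> value, default 0.
using Memory = std::unordered_map<int, int>;

// Eager versioning (direct update): write memory in place, log old values.
class UndoLogTx {
public:
    explicit UndoLogTx(Memory& mem) : mem_(mem) {}
    int read(int addr) { return mem_[addr]; }
    void write(int addr, int value) {
        undo_.emplace_back(addr, mem_[addr]);  // remember the old value first
        mem_[addr] = value;
    }
    void commit() { undo_.clear(); }  // cheap: memory already holds new values
    void abort() {                    // expensive: replay the log backwards
        for (auto it = undo_.rbegin(); it != undo_.rend(); ++it)
            mem_[it->first] = it->second;
        undo_.clear();
    }

private:
    Memory& mem_;
    std::vector<std::pair<int, int>> undo_;
};

// Lazy versioning (deferred update): buffer writes privately until commit.
class WriteBufferTx {
public:
    explicit WriteBufferTx(Memory& mem) : mem_(mem) {}
    int read(int addr) {
        auto it = buffer_.find(addr);  // a transaction must see its own writes
        return it != buffer_.end() ? it->second : mem_[addr];
    }
    void write(int addr, int value) { buffer_[addr] = value; }
    void commit() {  // expensive: copy the buffer back to memory
        for (const auto& [addr, value] : buffer_) mem_[addr] = value;
        buffer_.clear();
    }
    void abort() { buffer_.clear(); }  // cheap: just discard the buffer

private:
    Memory& mem_;
    std::unordered_map<int, int> buffer_;
};
```

The trade-off is visible in where the work goes: the undo log makes commit cheap and abort expensive, while the write buffer does the reverse and must also redirect every read through the buffer so a transaction sees its own writes.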

Hardware support for TM
• 1. Tom Knight (1986)
  – Not really a TM scheme: Knight describes a scheme for parallelising the execution of a single thread
  – Blocks are identified by the compiler and executed in parallel, assuming there are no memory-carried dependencies between them
  – Hardware support is provided to detect memory dependency violations
  – This work introduces the basic ideas of using caches and the cache coherence protocol to support TM
Larus/Rajwar book p.140
[Knight86]

Hardware support for TM
• Confirm Cache
  – A block executes to completion and then commits. Blocks are committed in the original program order
  – Any data written in the block is temporarily held in the confirm cache (not visible to other processors). This is swept and written back during commit.
  – On a processor read, priority is given to the data in the confirm cache
    • The block needs to see any writes it has made
[Knight86]

Hardware support for TM
• Dependency Cache
  – The dependency cache holds data read from memory. Data read during a block is held in state D (Depends)
  – A memory dependency violation is detected if a bus write (made by a block that is currently committing) updates a value in a dependency cache in state D
    • This indicates that the block read the data too early and must be aborted
[Knight86]
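The confirm-cache/dependency-cache bookkeeping can be modelled in software. The C++ sketch below is a toy, single-threaded simulation under stated assumptions: memory is a map from address to value, `Block` and `KnightSystem` are hypothetical stand-ins, and the caller is responsible for committing blocks in program order and re-executing aborted ones (not shown). It is not Knight's hardware design, only the bookkeeping the slides describe.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Toy model of Knight's (1986) scheme; all names are hypothetical.
struct Block {
    std::map<int, int> confirmCache;  // speculative writes, invisible to others
    std::set<int> dependsOn;          // addresses held in state D (read from memory)
    bool aborted = false;
};

class KnightSystem {
public:
    explicit KnightSystem(std::map<int, int> memory) : memory_(std::move(memory)) {}

    int read(Block& b, int addr) {
        auto it = b.confirmCache.find(addr);
        if (it != b.confirmCache.end()) return it->second;  // own writes take priority
        b.dependsOn.insert(addr);                            // line enters state D
        return memory_[addr];
    }

    void write(Block& b, int addr, int value) { b.confirmCache[addr] = value; }

    // Commit one block; the caller commits blocks in original program order.
    // Each write-back acts as a bus write: any other block holding that address
    // in state D read it too early and is marked aborted.
    void commit(Block& b, const std::vector<Block*>& others) {
        assert(!b.aborted);
        for (const auto& [addr, value] : b.confirmCache) {
            memory_[addr] = value;
            for (Block* o : others)
                if (o != &b && o->dependsOn.count(addr)) o->aborted = true;
        }
        b.confirmCache.clear();
        b.dependsOn.clear();
    }

    int peek(int addr) { return memory_[addr]; }

private:
    std::map<int, int> memory_;
};
```

In this model the confirm cache behaves like a write buffer (deferred update), and a violation is only noticed when an earlier block commits, which is one form of lazy conflict detection.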
