Overview Limitations of lock-based programming Transactional - PowerPoint PPT Presentation

6 • Transactional Memory
Chip Multiprocessors (ACS MPhil)
Robert Mullins

Overview
• Limitations of lock-based programming
• Transactional memory
  – Programming with TM
  – Software TM (STM)
  – Hardware TM (HTM)

Lock-based programming
• Lock-based programming is a low-level model
  – Close to basic hardware primitives
  – For some problems lock-based solutions that perform well are complex and error-prone
    • difficult to write, debug, and maintain
    • Not true of all problems
• Parallel programming for the masses
  – The majority of programmers will need to be able to produce highly parallel and robust software

Lock-based programming
• Challenges:
  – Must remember to use (the correct) locks
    • Careful to avoid when not required (for performance)
  – Coarse-grain vs. fine-grain locks
    • Simplicity
    • Unnecessary serialisation of operations
      – Lock may not actually be required in most cases (data dependent). Lock-based programming may be pessimistic.
    • We must also consider the time taken to acquire and release locks! (even uncontended locks have a cost)
  – What is the optimal granularity of locking? HW dependent.

Lock-based programming
• Other issues:
  – Deadlock
  – Scheduling threads
    • Priority inversion (e.g. Mars Rover Pathfinder problems)
      – Low-priority thread is preempted (while holding a lock)
      – Medium-priority thread runs
      – High-priority thread (needing the lock) can't make progress
    • Convoying
      – Thread holding lock is descheduled, a queue of threads form
  – Lost wake-ups (wait on CV, but forget to signal)
  – Horribly complicated error recovery
  – Cannot even easily compose lock-based programs

Lost wake-up example
  push
    mutex::scoped_lock lock (pushMutex)
    queue.push(item)
    if (queue.size()==1) m_emptyCond.notify_one()
  pop
    // (implicit lock release when leaving scope)
    mutex::scoped_lock lock (popMutex)
    while (queue.empty())
      m_emptyCond.wait()
    item = queue.front()
    queue.pop()
    return item

Lock-based programming
  // Trivial deadlock example
  // Thread 1        // Thread 2
  a.lock();          b.lock();
  b.lock();          a.lock();
  ...                ...
• Deadlock
  – We are free to do anything when we hold a lock, even take a lock on another mutex
  – This can quickly lead to deadlock if we are not careful
• Limiting ourselves to only being able to take a single lock at a time would force us to use coarse-grain locks
  • e.g. consider maintaining two queues. These are each accessed by many different threads. We are infrequently required to transfer data from one queue to the other (atomically)

Lock-based programming
• Avoiding deadlock
  – Requires programmer to adopt some sort of policy (although this isn't automatically enforced)
  – Often difficult to maintain/understand
• Lock hierarchies
  – All code must take locks in the same order
  – Lock chaining: take first lock, take second, release first, etc.
• Try and back off
  – More flexible than imposing a fixed order
  – Get first lock
  – Then try and lock additional mutexes in the required set. If we fail, release locks and retry
    • pthread_mutex_trylock
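The "try and back off" policy, applied to the two-queue transfer example, might look roughly like the sketch below. It uses C++ standard library mutexes rather than pthread_mutex_trylock; the LockedQueue type and the transfer function are hypothetical names introduced only for illustration, and the code assumes every thread touches the queues only while holding the matching mutex.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical queue type for illustration: a deque guarded by its own mutex.
struct LockedQueue {
    std::mutex m;
    std::deque<int> items;
};

// Move one item from src to dst while holding both locks, so no thread that
// respects the mutexes sees the item in neither queue or in both.
// Returns false if src was empty. Precondition: src and dst are distinct.
bool transfer(LockedQueue& src, LockedQueue& dst) {
    assert(&src != &dst);
    for (;;) {
        std::unique_lock<std::mutex> first(src.m);                     // get first lock
        std::unique_lock<std::mutex> second(dst.m, std::try_to_lock);  // try the second
        if (!second.owns_lock()) {
            // Back off: release the lock we hold, let others run, then retry.
            first.unlock();
            std::this_thread::yield();
            continue;
        }
        if (src.items.empty()) return false;
        dst.items.push_back(src.items.front());
        src.items.pop_front();
        return true;  // both locks are released by the unique_lock destructors
    }
}
```

Because a thread never blocks while holding a lock, two threads calling transfer(a, b) and transfer(b, a) cannot deadlock, although they may retry. In C++17, std::scoped_lock taking both mutexes is a standard alternative that avoids deadlock without a hand-written retry loop.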

Lock-based programming
• Composing lock-based programs
  – Consider our example of two queues
  – There is no simple way of dequeuing from one and enqueuing to the other in an atomic fashion
    • We would need to expose synchronization state and force caller to manage locks
  – Can't compose methods that block either (wait/notify)
    • How do we describe the operation where we want to dequeue from either queue, whichever has data?
    • Each queue implementation blocks internally

Transactions
  atomic {
    x=q0.deq();
    q1.enq(x);
  }
• Focus on where atomicity is necessary rather than specific locking mechanisms
• The transactional memory system will ensure that the transaction is run in isolation from other threads
  – Transactions are typically run in parallel optimistically
  – If transactions perform conflicting memory accesses, we must abort and ensure none of the side-effects of the abandoned transactions are visible

Transactions
• Atomicity (all-or-nothing)
  – We guarantee that it appears that either all the instructions are executed or none of them are (if the transaction fails, failure atomicity)
  – The transaction either commits or aborts
• Transactions execute in isolation
  – Other operations cannot access a transaction's intermediate state.
  – The result of executing concurrent transactions must be identical to a result in which the transactions executed sequentially (serializability)

Transactions
  void Queue::enq (int v) {
    atomic {
      // queue is full
      if (count==MAX_LEN) retry;
      buf[tail]=v;
      if (++tail == MAX_LEN) tail=0;
      count++;
    }
  }
• Retry
  – Abandon transaction and try again
  – An implementation could wait until some changes occur in memory locations read by the aborted transaction
    • Or specify a specific watch set [Atomos/PLDI'06]
“Composable memory transactions”, Harris et al.
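As a rough way to read the atomic/retry semantics of Queue::enq, the sketch below emulates them with a single global lock and a condition variable: holding the lock gives trivial (fully serialised) isolation, and "retry" becomes "sleep until some other transaction commits, then re-check". This is not how a TM system is implemented; tx_lock, tx_changed, deq and size are names assumed here for illustration, and MAX_LEN is given an arbitrary value.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Assumption: one global lock stands in for the TM system.
std::mutex tx_lock;
std::condition_variable tx_changed;  // signalled whenever a "transaction" commits

constexpr std::size_t MAX_LEN = 4;   // arbitrary capacity for the sketch

class Queue {
    int buf[MAX_LEN];
    std::size_t head = 0, tail = 0, count = 0;
public:
    void enq(int v) {
        std::unique_lock<std::mutex> tx(tx_lock);               // atomic {
        // "retry": give up the lock and wait while the queue is full.
        tx_changed.wait(tx, [this] { return count != MAX_LEN; });
        buf[tail] = v;
        if (++tail == MAX_LEN) tail = 0;
        count++;
        tx_changed.notify_all();                                 // } commit
    }
    int deq() {
        std::unique_lock<std::mutex> tx(tx_lock);               // atomic {
        // "retry": wait while the queue is empty.
        tx_changed.wait(tx, [this] { return count != 0; });
        int v = buf[head];
        if (++head == MAX_LEN) head = 0;
        count--;
        tx_changed.notify_all();                                 // } commit
        return v;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> tx(tx_lock);
        return count;
    }
};
```

Note what the emulation loses: each method takes the lock itself, so a caller cannot wrap q0.deq() and q1.enq(x) in one larger atomic step without exposing tx_lock, which is the composition problem that transactions are meant to remove.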

Transactions
  atomic {
    x = q0.deq();
  } orElse {
    x = q1.deq();
  }
• Choice
  – Try to dequeue from q0 first, if this retries (i.e. queue is empty), then try the second
  – If both retry, retry the whole orElse block
“Composable memory transactions”, Harris et al.

Critical sections ≠ transactions
• Converting critical sections to transactions
  – pitfall: “A critical section that was previously atomic only with respect to other critical sections guarded by the same lock is now atomic with respect to all other critical sections.”
  proc1 {                  proc2 {
    acquire (m1)             acquire (m2)
    while (!flagA) {}        flagA = true
    flagB = true             while (!flagB) {}
    ....                     ....
    release(m1)              release(m2)
  }                        }
“Deconstructing Transactional Semantics: The Subtleties of Atomicity”, Colin Blundell, E Christopher Lewis, Milo M. K. Martin, WDDD, 2005

Implementing a TM system
• Transaction granularity
  – Object, word or block
• How do we provide isolation?
  – Direct or deferred update?
    • Update object directly and keep undo log
    • Update private copy, discard or replace object
  – Also called eager and lazy versioning
• When and how do we detect conflicts?
  – Eager or lazy conflict detection?
• A software or hardware-supported implementation?

Hardware support for TM
• An introduction to hardware mechanisms for supporting transactional memory
  – See Larus/Rajwar book for a more complete survey
  – We'll look at:
    • Knight, “An architecture for mostly functional languages”, in LFP, 1986.
    • A simple HTM with lazy conflict detection
    • Herlihy/Moss (1993)
  – Discuss others in reading group
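The direct versus deferred update choice listed under "Implementing a TM system" can be illustrated with a small single-threaded sketch. It shows only versioning (how speculative writes are kept and undone), not conflict detection; Memory, DirectTx and DeferredTx are hypothetical names, and memory is modelled as a plain vector of words.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Memory = std::vector<int>;  // hypothetical word-addressed shared memory

// Direct update (eager versioning): write in place and keep an undo log.
class DirectTx {
    Memory& mem;
    std::vector<std::pair<std::size_t, int>> undo_log;  // (address, old value)
public:
    explicit DirectTx(Memory& m) : mem(m) {}
    int read(std::size_t a) const { return mem[a]; }
    void write(std::size_t a, int v) {
        undo_log.emplace_back(a, mem[a]);  // remember the value we overwrite
        mem[a] = v;
    }
    void commit() { undo_log.clear(); }    // cheap: data is already in place
    void abort() {                         // restore old values, newest first
        for (auto it = undo_log.rbegin(); it != undo_log.rend(); ++it)
            mem[it->first] = it->second;
        undo_log.clear();
    }
};

// Deferred update (lazy versioning): buffer writes in a private copy.
class DeferredTx {
    Memory& mem;
    std::map<std::size_t, int> write_buf;  // address -> speculative value
public:
    explicit DeferredTx(Memory& m) : mem(m) {}
    int read(std::size_t a) const {
        auto it = write_buf.find(a);       // the transaction sees its own writes
        return it != write_buf.end() ? it->second : mem[a];
    }
    void write(std::size_t a, int v) { write_buf[a] = v; }
    void commit() {                        // publish the private copy
        for (const auto& w : write_buf) mem[w.first] = w.second;
        write_buf.clear();
    }
    void abort() { write_buf.clear(); }    // cheap: memory was never touched
};
```

The trade-off is visible in the two commit/abort pairs: direct update makes commit cheap and abort expensive, while deferred update does the reverse.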

Hardware support for TM
• 1. Tom Knight (1986)
  – Not really a TM scheme, Knight describes a scheme for parallelising the execution of a single thread
  – Blocks are identified by the compiler and executed in parallel assuming there are no memory carried dependencies between them
  – Hardware support is provided to detect memory dependency violations
  – This work introduces the basic ideas of using caches and the cache coherence protocol to support TM

Hardware support for TM
Larus/Rajwar book p.140
[Knight86]

Hardware support for TM
• Confirm Cache
  – A block executes to completion and then commits. Blocks are committed in the original program order
    • Any data written in the block is temporarily held in the confirm cache (not visible to other processors). This is swept and written back during commit.
    • On a processor read, priority is given to the data in the confirm cache
      – The block needs to see any writes it has made
[Knight86]

Hardware support for TM
• Dependency Cache
  – The dependency cache holds data read from memory. Data read during a block is held in state D (Depends)
    • A memory dependency violation is detected if a bus write (made by a block that is currently committing) updates a value in a dependency cache in state D
    • This indicates that the block read the data too early and must be aborted
[Knight86]
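The interaction between the confirm cache and the dependency cache can be modelled in software as a toy, single-threaded sketch. It is a simplification and not Knight's hardware design: each cache is reduced to a map or set, the bus is a loop over the other blocks, and unwritten addresses are assumed to read as 0. Block, Memory and commit are names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using Memory = std::map<std::size_t, int>;  // hypothetical address -> value

struct Block {
    std::map<std::size_t, int> confirm;  // confirm cache: uncommitted writes
    std::set<std::size_t> depends;       // dependency cache lines in state D
    bool aborted = false;

    int read(const Memory& mem, std::size_t a) {
        auto it = confirm.find(a);
        if (it != confirm.end()) return it->second;  // own writes take priority
        depends.insert(a);                           // read from memory: mark D
        auto m = mem.find(a);
        return m != mem.end() ? m->second : 0;       // assumption: default 0
    }
    void write(std::size_t a, int v) { confirm[a] = v; }  // invisible to others
};

// Commit block b: sweep its confirm cache back to memory. Each write-back acts
// as a bus write snooped by the later blocks; a hit on a line in state D means
// that block read the value too early and must be aborted.
void commit(Block& b, Memory& mem, std::vector<Block*>& later) {
    for (const auto& w : b.confirm) {
        mem[w.first] = w.second;
        for (Block* o : later)
            if (o->depends.count(w.first)) o->aborted = true;
    }
    b.confirm.clear();
    b.depends.clear();
}
```

For example, if block 1 writes address 0 while block 2 has already read address 0 from memory, committing block 1 marks block 2 as aborted, whereas a block that only read address 1 is unaffected.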
