Transactional Memory - Austen McDonald (PowerPoint presentation)

1 Architectures for Transactional Memory Austen McDonald

2 Our New MULTICORE Overlords • The free lunch for software developers is over – New processors no longer improve single-thread performance • Chip Multiprocessors (CMP/Multicore) are here – Improve performance by exploiting thread parallelism • To make programs faster, mortal programmers will try parallel programming… MOTIVATION

3 Parallel Programming is Hard • Thread-level parallelism is great until we want to share data • Fundamentally, it’s hard to work on shared data at the same time, so we don’t: mutual exclusion via locks • Locks have problems – performance/correctness, fine/coarse tradeoff – deadlocks and failure recovery

4 Transactional Memory (TM) • Execute large, programmer-defined regions atomically and in isolation [Knight ’86, Herlihy & Moss ’93] atomic { x = x + y; } • Declarative – No management of locks • Optimistically executing in parallel gains performance

5 TM Example [Figure: nodes 1, 2, 3, 4] Goal: Modify node 3 in a thread-safe way.

6 TM Example [Figure: nodes 1, 2, 3, 4]

11 TM Example [Figure: nodes 1, 2, 3, 4] Goals: Modify nodes 3 and 4 in a thread-safe way. Locking prevents concurrency.

12 TM Example [Figure: nodes 1, 2, 3, 4] Transaction A – READ: (empty) – WRITE: (empty) Goal: Modify node 3 in a thread-safe way.

13 TM Example [Figure: nodes 1, 2, 3, 4] Transaction A – READ: 1, 2, 3 – WRITE: (empty)

14 TM Example [Figure: nodes 1, 2, 3, 4] Transaction A – READ: 1, 2, 3 – WRITE: 3

15 TM Example [Figure: nodes 1, 2, 3, 4] Transaction A – READ: 1, 2, 3 – WRITE: 3 Transaction B – READ: 1, 2, 4 – WRITE: 4 Goals: Modify nodes 3 and 4 in a thread-safe way.

16 TM Example [Figure: nodes 1, 2, 3, 4] Transaction A – READ: 1, 2, 3 – WRITE: 3 Transaction B – READ: 1, 2, 4 – WRITE: 4 Checks shown: WW conflicts, RW conflicts

17 TM Example [Figure: nodes 1, 2, 3, 4] Transaction A – READ: 1, 2, 3 – WRITE: 3 Transaction B – READ: 1, 2, 3 – WRITE: 3

18 TM Example [Figure: nodes 1, 2, 3, 4] Transaction A – READ: 1, 2, 3 – WRITE: 3 Transaction B – READ: 1, 2, 3 – WRITE: 3 Checks shown: WW conflicts, RW conflicts
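The read/write sets in slides 15 to 18 can be checked mechanically. The following is a minimal sketch, not the presenter's implementation: the `Txn` class and `conflicts` function are hypothetical names introduced here, and node ids stand in for memory addresses. It treats a WW conflict as an overlap of two write sets and an RW conflict as one transaction's write set overlapping the other's read set.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical transaction record: just a name plus read and write sets."""
    name: str
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def conflicts(a, b):
    """Return the WW and RW conflicts between two transactions as sets of node ids."""
    ww = a.write_set & b.write_set
    rw = (a.write_set & b.read_set) | (b.write_set & a.read_set)
    return {"WW": ww, "RW": rw}


# Slides 15-16: A modifies node 3, B modifies node 4.
disjoint = conflicts(Txn("A", {1, 2, 3}, {3}), Txn("B", {1, 2, 4}, {4}))

# Slides 17-18: both transactions modify node 3.
overlapping = conflicts(Txn("A", {1, 2, 3}, {3}), Txn("B", {1, 2, 3}, {3}))

print(disjoint)
print(overlapping)
```

With the sets from slides 15 and 16 both intersections are empty, so the two transactions can run concurrently; with the sets from slides 17 and 18 node 3 appears in both intersections, so the transactions must be serialized.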

19 Guts of TM • To build TM, you need… • Versioning – Where do you put the new x until commit? • Conflict Detection – How do you detect that reads/writes to x need to be serialized? • Conflict Resolution – How do you enforce serialization when required? Example: T0 runs atomic { x = x + y; } while T1 runs atomic { x = x / 8; } BUILDING AN HTM

20 Hardware or Software TM? • Can be implemented in HW or SW • SW is slow – Bookkeeping is expensive: 2-8x slowdown • SW has correctness pitfalls – Even for correctly synchronized code! • Let’s use hardware for TM

21 Challenges 1. What’s the best implementation in hardware? • Many available options 2. What’s the right HW/SW interface? • Changing software needs (OSs and Languages) • Changing parallel architectures THESIS

22 Contributions • Designed and compared HTM systems • Extended one system to replace coherence and consistency with only transactions • Devised a sufficient software/hardware interface for current and future OS/PL on TM

23 5 Years of My Life on One Slide 1. Motivation & Contributions 2. Building a TM system in hardware 3. An architecture with only transactions 4. What about the interface to software? 5. Conclusions SIGNPOST

24 Versioning • Versioning: storing new values • Eager: store new values in memory, old values in undo log – Commits fast, aborts slow • Lazy: store new values in write buffer – Aborts fast, commits slow BUILDING AN HTM
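The trade-off between the two versioning schemes can be sketched in a few lines. This is an illustration under stated assumptions, not the hardware design from the talk: memory is modeled as a Python dict whose keys already exist, and the class names `EagerTxn` and `LazyTxn` are hypothetical.

```python
class EagerTxn:
    """Eager versioning: write in place, keep old values in an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # list of (address, old value) pairs

    def read(self, addr):
        return self.memory[addr]

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value

    def commit(self):
        # Fast: the new values are already in memory.
        self.undo_log.clear()

    def abort(self):
        # Slow: walk the undo log backwards to restore old values.
        for addr, old in reversed(self.undo_log):
            self.memory[addr] = old
        self.undo_log.clear()


class LazyTxn:
    """Lazy versioning: buffer new values, publish them at commit."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def read(self, addr):
        # A transaction must see its own buffered writes.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        return self.memory[addr]

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def commit(self):
        # Slow: every buffered value is copied to memory.
        self.memory.update(self.write_buffer)
        self.write_buffer.clear()

    def abort(self):
        # Fast: just drop the buffer.
        self.write_buffer.clear()


mem = {"x": 8, "y": 2}

eager = EagerTxn(mem)
eager.write("x", eager.read("x") + eager.read("y"))
x_visible_during_eager = mem["x"]  # new value is already in memory
eager.abort()
x_after_eager_abort = mem["x"]     # undo log restored the old value

lazy = LazyTxn(mem)
lazy.write("x", lazy.read("x") + lazy.read("y"))
x_visible_during_lazy = mem["x"]   # memory still holds the old value
lazy.commit()
x_after_lazy_commit = mem["x"]
```

The work done in `abort` for the eager case and in `commit` for the lazy case grows with the number of writes, which is the cost asymmetry the slide describes.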

25 Conflict Detection • Conflict Detection: detecting RW/WW conflicts – Pessimistic: detect conflicts on cache misses • Avoids useless work, but may cause deadlock/livelock and prevents some serializable schedules – Optimistic: wait until end of transaction • Forward progress can be guaranteed, but some wasted work [explain forward progress]

26 Versioning+Conflict Detection • Three combinations: EP (Eager-Pessimistic), LP (Lazy-Pessimistic), LO (Lazy-Optimistic) – Not Eager-Optimistic • Note: conflict resolution depends on the other two choices

27 Building a Lazy-Optimistic HTM • Lazy Versioning – Need to keep new versions (and read-set tracking) until commit – Already have a cache, so let’s put it there! • Optimistic Conflict Detection – Need to detect conflicts at commit time – Coherence protocol already detects sharing • Conflict Resolution (aggressive) – The first committer wins – Simple and guarantees forward progress
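A software model of the lazy-optimistic scheme makes the "first committer wins" rule concrete. This is only a sketch under stated assumptions: the classes `LOSystem` and `LOTxn` are hypothetical, a Python set stands in for the cache's read bits, a dict stands in for buffered stores, and a violated transaction is flagged for the caller to retry rather than rolled back automatically. It uses the two transactions from slide 19, with integer division standing in for `x / 8`.

```python
class LOSystem:
    """Hypothetical shared memory plus the list of running transactions."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.active = []

    def begin(self, name):
        txn = LOTxn(name, self)
        self.active.append(txn)
        return txn


class LOTxn:
    def __init__(self, name, system):
        self.name = name
        self.system = system
        self.read_set = set()    # stands in for per-line read bits
        self.write_buffer = {}   # lazy versioning: stores stay private
        self.violated = False

    def read(self, addr):
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return self.system.memory[addr]

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def commit(self):
        """Return True if the commit succeeded, False if the caller must retry."""
        self.system.active.remove(self)
        if self.violated:
            self.write_buffer.clear()  # abort is cheap: drop the buffer
            return False
        # Publish the writes, then compare the committed addresses
        # against every other running transaction's read set.
        self.system.memory.update(self.write_buffer)
        written = set(self.write_buffer)
        for other in self.system.active:
            if written & other.read_set:
                other.violated = True  # first committer wins
        return True


system = LOSystem({"x": 8, "y": 8})
t0 = system.begin("T0")
t1 = system.begin("T1")

t0.write("x", t0.read("x") + t0.read("y"))  # atomic { x = x + y; }
t1.write("x", t1.read("x") // 8)            # atomic { x = x / 8; }

t0_committed = t0.commit()        # T0 commits first and wins
t1_committed = t1.commit()        # T1 read a stale x, so it must retry
x_after_first_round = system.memory["x"]

t1 = system.begin("T1")           # retry sees the committed x
t1.write("x", t1.read("x") // 8)
t1_retry_committed = t1.commit()
x_final = system.memory["x"]
```

Because a transaction is only violated by another transaction that has already committed, at least one transaction always makes progress, which is the forward-progress argument on the slide.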

28 LO HTM Specifics [Figure: block diagram showing CPU 1, CPU 2, … CPU N, each with an L1 cache and Bus & Snoop Control; Bus Arbiters; Commit Bus; Refill Bus; On-chip L2 Cache. Legend: Changes for TM]

29 LO HTM Specifics • Read Bits: ld 0xdeadbeef • Write Bits: st 0xcafebabe • Commit: Acquire permission to commit – Upgrade lines listed in Store Address FIFO • Conflict Detection: Compare incoming address to R bits [Figure: Processor with Register Checkpoint, Load/Store Address, Violation signal and Store Address FIFO; Data Cache with MESI, R, W, d, TAG and DATA fields; Commit Control and Snoop Control; Commit Address In and Commit Address Out; Request Bus and Refill Bus]

30 Performance Questions 1. Will transactions perform as well as locks? 2. What is the best HTM system and why?

31 Methodology • Execution-driven x86 simulator – 1 IPC (except ld/st) • SPLASH-2 Benchmarks – Heavily optimized for MESI • STAMP – Representative applications for today’s workloads – Wide range of transactional behaviors – Difficult to parallelize, TM only apps

32 1. TM vs Locks • Performs similarly to locks – TM overhead is negligible [McDonald ’05] • Similar performance at low contention for all TM schemes

33 2. Which TM System is Best? • Pessimistic conflict detection degrades performance • Rolling back undo log in eager versioning is expensive

34 2. Which TM System is Best? • Early conflict detection saves expensive memory accesses – High contention, many accesses / Tx

35 2. Which TM System is Best? • Same for SPLASH applications • Same: 2 of 8 STAMP – genome, kmeans • LO Better: 4 of 8 STAMP – bayes, labyrinth, vacation, yada • EP/LP Better: 2 of 8 STAMP – intruder, ssca2 • How can I decide on one system?

36 2. Which TM System is Best? • Conflict Detection/Resolution is the principal offender – Need intelligent decisions on conflict • Simple for Optimistic Conflict Detection – Priority/aging and random backoff are all you need for progress and fairness [Scott ’04] • More complex for Pessimistic – More potential performance problems – Stall or Abort? • Need deadlock/livelock detection – Best solution requires hardware predictor [Bobba ’08]
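The slide names priority/aging and random backoff without giving details. The sketch below is one possible reading, with every specific labeled as an assumption: the function names are hypothetical, "aging" is modeled as "the transaction with the earlier start timestamp wins", and the backoff is randomized with an exponentially growing, capped window, which the slide does not specify.

```python
import random


def pick_winner(txn_a, txn_b):
    """Aging policy (assumed): the older transaction wins the conflict.

    Each transaction is a (name, start_timestamp) tuple; ties are broken
    by name so the decision is deterministic.
    """
    return min(txn_a, txn_b, key=lambda t: (t[1], t[0]))


def backoff_delay(attempt, base=1.0, cap=1024.0, rng=random):
    """Random delay in arbitrary time units before retry number `attempt`.

    The window doubles with each failed attempt (an assumption) and is
    capped so that a long-suffering transaction does not wait forever.
    """
    window = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, window)


winner = pick_winner(("T0", 100), ("T1", 42))
delays = [backoff_delay(attempt) for attempt in range(5)]
```

A transaction that keeps its original timestamp across retries eventually becomes the oldest one in the system, which is how aging provides fairness; the random delay keeps two losers from colliding again in lockstep.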

37 Summary of Results • TM performs as well as locks • Lazy-Optimistic is the best performing, simplest architecture for TM • Resource overflow is not a problem

38 1. Motivation & Contributions 2. Building a TM system in hardware 3. An architecture with only transactions 4. What about the interface to software? 5. Conclusions SIGNPOST

39 Only Transactions • Transactions manage communication – Can we dispense with coherence/consistency protocols? • Should be no sharing outside of transactions • In transactions, only care about sharing at boundaries – Easier to reason about parallel programs • TCC: Transactional Coherence and Consistency [Hammond ’04, McDonald ’05] ALL TRANSACTIONS ALL THE TIME

40 TCC • Everything is run inside of a transaction [Hammond ’04] – Even when you don’t explicitly create one • Still have explicit transactions – To ensure atomicity – Regions between explicit transactions can be split, by the system, into arbitrary transactions • Simplified Reasoning – One mechanism to communicate between threads • Hardware is simpler – Debugging becomes easier [Chafi ’05] • All accesses are tracked → detect missing explicit transactions – Deterministic replay [Wee ’08]

41 TCC Modifies Lazy-Optimistic • No need for MESI • Commit – Send data • Only way to maintain coherence [Figure: Processor with Register Checkpoint, Load/Store Address, Violation signal and Store Address/Data FIFO; Data Cache with MESI, R, W, d, TAG and DATA fields; Commit Data and Commit Address; Snoop Control and Commit Control; Commit Address In and Commit Address Out; Request Bus and Refill Bus]
