

slide-1
SLIDE 1

FASTLANE

Streamlining Transactions for Low Thread Counts

Jons-Tobias Wamhoff Christof Fetzer Technische Universität Dresden, Germany Pascal Felber Etienne Rivière Université de Neuchâtel, Switzerland Gilles Muller INRIA, France

slide-2
SLIDE 2

Motivation

[Figure: performance (slower to faster) against the number of cores (1 to many) for Sequential, STM, and FastLane. Annotations: best performance; expected gains from FastLane.]

slide-3
SLIDE 3

General Idea

  • 1 master thread
      • Commits transactions without aborting
      • Minimal instrumentation and bookkeeping
  • N helper threads
      • Commit transactions only when not in conflict
      • Contribute progress without impairing the performance of the master

slide-4
SLIDE 4

Code Paths

START → SEQUENTIAL | MASTER | HELPER | STM → COMMIT

  • SEQUENTIAL: uninstrumented (pessimistic)
  • MASTER: lightweight, instrumented writes (pessimistic)
  • HELPER: instrumented, synchronizes with master (speculative)
  • STM: instrumented, extensive bookkeeping (speculative)

slide-5
SLIDE 5

Code Paths

  • Dresden TM Compiler
      • Generates multiple code paths for sequential (uninstrumented), FastLane (master & helper) and STM
      • Generic START and COMMIT calls with internal branch
      • READ and WRITE are specific to code path and inlined
      • Transaction descriptor only accessed if needed
  • TinySTM++ TM runtime
      • Dynamically selects code path based on core or thread count at BEGIN

Christie et al.: Evaluation of AMD's Advanced Synchronization Facility Within a Complete Transactional Memory Stack, EuroSys '10
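The runtime's choice of code path at BEGIN can be sketched as follows. This is only an illustration in Python, not the TinySTM++ implementation: the function name `select_code_path`, the `CodePath` enum, and in particular the cut-off `FASTLANE_MAX_THREADS` are assumptions, since the slides do not state the thread counts at which the runtime switches.

```python
from enum import Enum


class CodePath(Enum):
    SEQUENTIAL = "sequential"  # uninstrumented
    MASTER = "master"          # FastLane, lightweight instrumented writes
    HELPER = "helper"          # FastLane, synchronizes with master
    STM = "stm"                # instrumented, extensive bookkeeping


# Hypothetical cut-off: the slides only say the choice depends on the
# core or thread count, not where the switch happens.
FASTLANE_MAX_THREADS = 4


def select_code_path(thread_count: int, is_master: bool) -> CodePath:
    """Pick the code path for a transaction when it begins."""
    if thread_count <= 1:
        return CodePath.SEQUENTIAL
    if thread_count <= FASTLANE_MAX_THREADS:
        return CodePath.MASTER if is_master else CodePath.HELPER
    return CodePath.STM
```

Because START and COMMIT are generic calls with an internal branch, a dispatch of this kind would run once per transaction, while READ and WRITE stay inlined for the selected path.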

slide-6
SLIDE 6

Data Structures

  • Shared
      • Counter (odd: owned, even: otherwise)
      • Dirty array of timestamps
      • Memory
  • Master thread
      • isMaster
  • Helper thread
      • Start timestamp
      • Read-set: addresses read
      • Write-set: addresses written
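The data structures on this slide could be laid out roughly as below. This is a sketch in Python for readability; the class names, the array size `DIRTY_SLOTS`, and the modulo hash in `dirty_index` are assumptions made for the example and are not taken from the slides.

```python
from dataclasses import dataclass, field

DIRTY_SLOTS = 1024  # hypothetical size of the dirty array


def dirty_index(addr: int) -> int:
    """Hypothetical hash: map an address to a slot of the dirty array."""
    return addr % DIRTY_SLOTS


@dataclass
class Shared:
    counter: int = 0  # odd: owned, even: otherwise
    # One timestamp per slot, recording the last write that hit the slot.
    dirty: list = field(default_factory=lambda: [0] * DIRTY_SLOTS)


@dataclass
class MasterThread:
    is_master: bool = True  # the master keeps no read-set or write-set


@dataclass
class HelperThread:
    start: int = 0                                 # start timestamp
    read_set: list = field(default_factory=list)   # addresses read
    write_set: dict = field(default_factory=dict)  # address written -> value


def is_owned(shared: Shared) -> bool:
    """The counter is odd while a thread owns it."""
    return shared.counter % 2 == 1
```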

slide-7
SLIDE 7

Master vs. Helper

MASTER
    BEGIN:            acquire(cntr)
    READ(addr):       return *addr
    WRITE(addr, val): *addr = val
                      dirty[hash(addr)] = cntr
    COMMIT:           release(cntr)

HELPER
    BEGIN:            start = cntr
    READ(addr):       if dirty[hash(addr)] ≤ start: add(read-set, addr); return *addr
                      else: abort
    WRITE(addr, val): if dirty[hash(addr)] ≤ start: put(write-set, addr, val)
                      else: abort
    COMMIT:           one of the 3 commit variants
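The master and helper operations can be tried out with a small single-threaded model. It is a sketch under several assumptions: a dictionary stands in for raw memory, `slot` uses a modulo in place of the slide's hash(addr), acquire and release are modeled as plain increments (even to odd, odd to even) without waiting, and the helper does not look up its own buffered writes on a read because the slide does not show that step. The class names are made up for the example.

```python
class Abort(Exception):
    """Raised when a helper detects a conflict with the master."""


class FastLane:
    def __init__(self, slots: int = 64):
        self.cntr = 0             # odd: owned, even: otherwise
        self.dirty = [0] * slots  # timestamp of the last write per slot
        self.mem = {}             # stand-in for memory: addr -> value

    def slot(self, addr: int) -> int:
        return addr % len(self.dirty)  # hypothetical hash(addr)

    # MASTER: plain reads; writes go to memory and stamp the dirty array.
    def master_begin(self):
        self.cntr += 1  # acquire: even -> odd (no waiting in this sketch)

    def master_read(self, addr: int):
        return self.mem.get(addr, 0)

    def master_write(self, addr: int, val):
        self.mem[addr] = val
        self.dirty[self.slot(addr)] = self.cntr

    def master_commit(self):
        self.cntr += 1  # release: odd -> even


class Helper:
    def __init__(self, tm: FastLane):
        self.tm = tm
        self.begin()

    def begin(self):
        self.start = self.tm.cntr  # start timestamp
        self.read_set = []
        self.write_set = {}

    def _check(self, addr: int):
        # Abort if the slot was written after this transaction started.
        if self.tm.dirty[self.tm.slot(addr)] > self.start:
            raise Abort(addr)

    def read(self, addr: int):
        self._check(addr)
        self.read_set.append(addr)
        return self.tm.mem.get(addr, 0)

    def write(self, addr: int, val):
        self._check(addr)
        self.write_set[addr] = val  # buffered until commit
```

In this model a helper that started before a master write to address 3 aborts on its next access to that address, and succeeds after restarting with a fresh start timestamp.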

slide-8
SLIDE 8

3 Commit Variants

COMMIT 1
    acquire(cntr)
    VALIDATE: on failure abort, otherwise proceed

slide-9
SLIDE 9

3 Commit Variants

COMMIT 1
    acquire(cntr)
    VALIDATE: on failure abort, otherwise proceed

COMMIT 2
    c = awaitEven(cntr)
    VALIDATE: on failure abort
    acquire(cntr)
    cntr ≤ c+1 ∨ VALIDATE: on failure abort, otherwise proceed

slide-10
SLIDE 10

3 Commit Variants

COMMIT 1
    acquire(cntr)
    VALIDATE: on failure abort, otherwise proceed

COMMIT 2
    c = awaitEven(cntr)
    VALIDATE: on failure abort
    acquire(cntr)
    cntr ≤ c+1 ∨ VALIDATE: on failure abort, otherwise proceed

COMMIT 3
    c = awaitEven(cntr)
    VALIDATE: on failure abort
    tryAcquire(cntr, c): if failed, retry; otherwise proceed

Spear et al.: RingSTM: Scalable Transactions with a Single Atomic Instruction, SPAA '08
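The three helper commit variants can be compared side by side in a short sketch. Several parts are assumptions rather than facts from the slides: a `threading.Lock` stands in for an atomic compare-and-swap, `validate` is a caller-supplied callback that returns True when the read-set is still valid, variants 1 and 2 release the counter before aborting, and a failed tryAcquire in variant 3 is taken to mean "start over" rather than "abort". Returning normally corresponds to "proceed", with the counter held by the caller.

```python
import threading


class Abort(Exception):
    """The helper transaction must roll back and restart."""


class Counter:
    """Shared counter (odd: owned, even: otherwise); a lock emulates CAS."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def await_even(self) -> int:
        while True:  # spin until nobody owns the counter
            v = self.value
            if v % 2 == 0:
                return v

    def try_acquire(self, expected: int) -> bool:
        with self._lock:  # compare-and-swap: expected -> expected + 1
            if self.value != expected:
                return False
            self.value = expected + 1
            return True

    def acquire(self):
        while not self.try_acquire(self.await_even()):
            pass

    def release(self):
        with self._lock:
            self.value += 1  # odd -> even


def commit_1(cntr: Counter, validate):
    cntr.acquire()          # validation runs while holding the counter
    if not validate():
        cntr.release()      # assumption: give the counter back first
        raise Abort()


def commit_2(cntr: Counter, validate):
    c = cntr.await_even()
    if not validate():      # early validation, counter not held
        raise Abort()
    cntr.acquire()
    if cntr.value <= c + 1 or validate():
        return              # nothing committed in between, or still valid
    cntr.release()
    raise Abort()


def commit_3(cntr: Counter, validate):
    while True:
        c = cntr.await_even()
        if not validate():
            raise Abort()
        if cntr.try_acquire(c):  # succeeds only if cntr is still c
            return
        # failed: the counter moved since validation; assumed to retry
```

Variant 1 holds the counter for the whole validation, variant 2 usually validates outside the critical section and repeats the validation only if the counter moved, and variant 3 never validates while holding the counter.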

slide-11
SLIDE 11

Intset Benchmarks


slide-12
SLIDE 12


Thank you!