Seer: Probabilistic Scheduling for Commodity Hardware Transactional Memory



SLIDE 1

SEER: PROBABILISTIC SCHEDULING FOR COMMODITY HARDWARE TRANSACTIONAL MEMORY

Nuno Diegues, Paolo Romano and Stoyan Garbatov

27th Symposium on Parallelism in Algorithms and Architectures (SPAA 2015)

SLIDE 2

Seer: Scheduling for Commodity HTM SPAA 2015

The multi-core (r)evolution

  • Multi-cores are now ubiquitous
  • Concurrent programming is complex

[Diagram: CPU 1, CPU 2, CPU 3 and CPU 4 on top of Shared Memory]

Classic approach: Locking

Hard to get right:

  • fine-grained locks
  • deadlocks
  • correctness

Transactional Memory abstraction

  • Programmer identifies atomic blocks
  • Runtime implements synchronization

atomic { withdraw(acc1,val); deposit(acc2,val); }

[Diagram: Transactional Memory System]
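
A minimal sketch of what the atomic block above promises, written in Python. The `atomic` context manager, the `Account` class and the `run_transfers` helper are all hypothetical names for illustration. The runtime here is a single global lock, which is only the simplest possible way to honour the abstraction and not how a real Transactional Memory system executes blocks.

```python
import threading
from contextlib import contextmanager

# Stand-in runtime (assumption): one lock serializes every atomic block.
_global_lock = threading.Lock()


@contextmanager
def atomic():
    """Hypothetical atomic block: its body appears to execute all at once."""
    with _global_lock:
        yield


class Account:
    def __init__(self, balance):
        self.balance = balance


def withdraw(acc, val):
    acc.balance -= val


def deposit(acc, val):
    acc.balance += val


def transfer(acc1, acc2, val):
    # The programmer only marks the block; synchronization is the runtime's job.
    with atomic():
        withdraw(acc1, val)
        deposit(acc2, val)


def run_transfers(n_threads=4, per_thread=1000):
    """Move 1 unit per transfer from a to b on several threads; return both balances."""
    a, b = Account(10_000), Account(0)

    def worker():
        for _ in range(per_thread):
            transfer(a, b, 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

Because every transfer runs inside `atomic()`, the total amount of money is preserved no matter how the threads interleave.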

SLIDE 3

Too much optimism

  • Problem: CPU time is wasted
  • run other computations instead
  • inhibit parallelism
  • improve cache usage
  • increase core frequency
  • reduce power consumption

Conflicting accesses: y = x and x++

Identify likely conflicts before they happen

SLIDE 4

Scheduler

  • Software TM (STM): library has full concurrency control
      • can pinpoint the culprit of the conflict
  • Hardware TM (HTM): feedback is quite limited
      • rough categorization for the type of conflict

HTM available in commodity processors
SLIDE 5

Objective: Scheduling for Commodity HTM

Avoid running T1 and T2 concurrently

How to find the root cause for the data conflict?

SLIDE 6

In an ideal world for HTMs…

xbegin
  withdraw(acc1,val)
  deposit(acc2,val)
xend

Transactions may abort:

  • because of contention on the same memory locations

Transactions restart

…and every transaction shall eventually succeed

SLIDE 7

…in practice: HTMs are Best-Effort

No progress guarantees:

  • A transaction may always abort

…due to a number of reasons:

  • Forbidden instructions
  • Capacity of caches (for reads and writes)
  • Faults and signals
  • Contending transactions, aborting each other
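
To make the limited feedback concrete, here is a small Python sketch that decodes an abort status word. The bit positions mirror the `_XABORT_*` constants that Intel's `<immintrin.h>` defines for RTM; the function name and the returned labels are illustrative assumptions. The status only says what kind of abort happened, never which other transaction caused it.

```python
# Bit positions mirror Intel's _XABORT_* constants for RTM abort status.
XABORT_EXPLICIT = 1 << 0   # transaction called xabort itself
XABORT_RETRY = 1 << 1      # hardware hints that a retry may succeed
XABORT_CONFLICT = 1 << 2   # data conflict with another thread
XABORT_CAPACITY = 1 << 3   # read/write set exceeded the cache capacity
XABORT_DEBUG = 1 << 4      # debug breakpoint hit
XABORT_NESTED = 1 << 5     # abort happened inside a nested transaction

_FLAGS = [
    (XABORT_EXPLICIT, "explicit"),
    (XABORT_RETRY, "retry"),
    (XABORT_CONFLICT, "conflict"),
    (XABORT_CAPACITY, "capacity"),
    (XABORT_DEBUG, "debug"),
    (XABORT_NESTED, "nested"),
]


def decode_abort_status(status):
    """Return the list of abort categories encoded in an RTM-style status word."""
    names = [name for bit, name in _FLAGS if status & bit]
    # Other causes (e.g. faults or forbidden instructions) may leave all flags clear.
    return names or ["unspecified"]
```

For example, `decode_abort_status(XABORT_CONFLICT | XABORT_RETRY)` yields `["retry", "conflict"]`: the scheduler learns that a conflict occurred, but not with whom.
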
SLIDE 8

Single Global Lock (SGL) fall-back path for HTM

  • Hardware transaction executes if SGL is free
  • Acquire SGL depending on retry policy
  • SGL is a very simple scheduler
      • Ignores the root cause
      • Takes a global decision: the SGL
  • Adaptive Transaction Scheduling [SPAA08]

We need better Scheduling for Commodity HTMs

SLIDE 9

Related Work

Scheduler            Support for HTM?   Support for Imprecise Information?   Fine-Grained Scheduling?
ATS [SPAA08]         Yes                Yes                                  No
CAR-STM [PODC08]     No                 No                                   Yes
Shrink [PODC09]      No                 No                                   Yes
ProPS [Euro-Par14]   No                 No                                   Yes
SER [PPoPP10]        No                 No                                   Yes
TxLinux [SOSP07]     Yes                No                                   Yes
SOA [HiPEAC09/10]    Yes                No                                   Yes
Seer                 Yes                Yes                                  Yes

SLIDE 10

Key Idea

  • Transactions to be executed are announced
  • Many observations are collected
      • upon transaction commit and abort
      • which transactions were active at the same time?
  • Over time, the outliers will be identifiable with high probability
  • A dynamic, fine-grained locking scheme is devised
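
The announce-and-observe steps above could be sketched as follows in Python. The `Observations` class, its method names and the choice of plain dictionaries are assumptions made for illustration; the slides do not show Seer's actual data structures.

```python
from collections import defaultdict


class Observations:
    """Hypothetical bookkeeping: who was active when a transaction committed or aborted."""

    def __init__(self, n_threads):
        # active[t] is the id of the atomic block thread t announced, or None.
        self.active = [None] * n_threads
        # (x, y) -> number of commits/aborts of x observed while y was active.
        self.commits = defaultdict(int)
        self.aborts = defaultdict(int)

    def announce(self, thread, tx):
        self.active[thread] = tx

    def _record(self, thread, table):
        x = self.active[thread]
        # Distinct transactions announced by the other threads at this instant.
        concurrent = {y for t, y in enumerate(self.active)
                      if t != thread and y is not None}
        for y in concurrent:
            table[(x, y)] += 1

    def on_abort(self, thread):
        # The transaction stays announced: the thread is about to retry it.
        self._record(thread, self.aborts)

    def on_commit(self, thread):
        self._record(thread, self.commits)
        self.active[thread] = None
```

Each event only tells us who was running at the time, not who was at fault, which is why many observations are needed before the real culprits stand out.
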
SLIDE 11

Seer: overview

[Diagram labels: transaction = source code transaction; active transactions]

SLIDE 12

Seer: details

  • Threads collect lightweight events independently (low overhead)
  • Locking scheme (re-)calculated periodically
      • Calculate conditional probabilities of commit/abort
      • Relevance threshold based on mean/standard deviation
  • One lock per transaction (atomic block in the application)
      • T1's lock (L1) is taken by T2 if they are deemed to conflict
      • T1 waits for L1 to be free before executing
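
A rough Python sketch of the periodic calculation, assuming pairwise commit and abort counters keyed by `(x, y)`. The slides only say that the relevance threshold is based on mean and standard deviation, so the concrete cutoff used here (mean plus `k` standard deviations) and the function names are assumptions, not the authors' formula.

```python
from statistics import mean, pstdev


def abort_probabilities(commits, aborts):
    """Estimate P(x aborts | y is running concurrently) for every observed pair (x, y)."""
    probs = {}
    for pair in set(commits) | set(aborts):
        a = aborts.get(pair, 0)
        c = commits.get(pair, 0)
        if a + c > 0:
            probs[pair] = a / (a + c)
    return probs


def relevant_pairs(probs, k=1.0):
    """Keep the outliers: pairs whose probability exceeds mean + k * stdev (assumed rule)."""
    if not probs:
        return set()
    values = list(probs.values())
    cutoff = mean(values) + k * pstdev(values)
    return {pair for pair, p in probs.items() if p > cutoff}
```

With counters where T1 aborts 90 times out of 100 next to T2 while every other pair aborts a few percent of the time, only `("T1", "T2")` survives the cutoff, and that pair would be the one to serialize through a transaction lock.
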
SLIDE 13

Seer: details

For each pair of transactions (x,y), acquire the lock of each other if:

  • Are abort events of x common enough with y running concurrently?
  • Is y one of the main causes for x to abort?

Hill-climbing-based adaptive loop for optimal threshold search.
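
The adaptive threshold search can be illustrated with a generic hill-climbing loop in Python. Everything here is an assumption for illustration: the `measure` callback (imagined as returning the throughput observed with a given threshold), the starting point, the step size and the halving rule are not taken from the slides.

```python
def hill_climb_threshold(measure, start=0.5, step=0.1, lo=0.0, hi=1.0, rounds=20):
    """Search [lo, hi] for the threshold that maximizes measure(threshold)."""
    best, best_score = start, measure(start)
    direction = 1
    for _ in range(rounds):
        # Try a neighbouring threshold, clamped to the allowed range.
        candidate = min(hi, max(lo, best + direction * step))
        score = measure(candidate)
        if score > best_score:
            # Improvement: move there and keep going the same way.
            best, best_score = candidate, score
        else:
            # No improvement: turn around and take smaller steps.
            direction = -direction
            step /= 2
    return best
```

In a live system `measure` would be noisy, so a real loop would likely average several samples per candidate; this sketch assumes a deterministic score.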

SLIDE 14

Seer: optimizations

  • Capacity Aborts: another limitation from best-effort nature
      • Per-core lock
      • Taken when capacity aborts occur
      • Tailored for hyper-thread usage
  • Only one thread (re-)calculates the locking scheme
      • Whenever it is waiting for the SGL (some thread is on the fallback path)
      • If the SGL is rarely taken, then scheduling will not improve
  • Lock acquisition
      • Hardware transaction used as multi-CAS for 2+ locks
SLIDE 15

Evaluation

  • HLE: Intel Hardware Lock Elision, i.e., no scheduling
  • RTM: Intel Commodity HTM with an SGL
  • SCM: Software-assisted Contention Management [PODC14]
      • schedule with a (single) auxiliary lock
      • aux lock is not read speculatively (in hw tx)
  • Seer: our Probabilistic Scheduler on top of Intel RTM

Platform: Intel Haswell, 4 cores (8 hyper-threads)

SLIDE 16

How much can we gain with Seer?

[Plots: speedup vs. number of threads, one panel each for Genome and Intruder]

Geometric mean speedup in STAMP: 50%

SLIDE 17

What motivates these gains?

Geometric Mean over STAMP w/ 8 threads

  • HLE: 77% with fall-back lock
  • RTM: 37% with SGL
  • SCM: 5% with SGL, 29% with (single) auxiliary lock
  • Seer (fine-grained locks):
      • 3% with at least one tx lock
      • 4% with core lock
      • 12% with tx + core locks
      • 1% with SGL
SLIDE 18

Relevance of each mechanism?

Baseline: Seer with all mechanisms enabled (i.e., their overhead) but without any lock acquisitions.

  • HTM lock acquisition: small improvement, benchmark dependent (the more locks, the better)
  • Transaction locks: detect conflicts inherent to benchmarks
  • Core locks: only relevant for >4 threads (hyper-threading)
  • Threshold tuning for probabilities: consistent/small improvement

SLIDE 19

Summary

First scheduler tailored for Commodity HTMs:

  • Copes with imprecise information
  • Schedules transactions in a fine-grained manner
  • 50% performance improvement with 8 threads
  • 0-8% overhead from monitoring/calculation
      • Measured by running Seer without acquiring locks
SLIDE 20

Thank you

Questions?

  • Nuno Diegues, Paolo Romano and Stoyan Garbatov
SLIDE 21

Backup slides

SLIDE 22

HTM with a fall-back path

start:
  int status = htm_begin
code:
  application logic
  htm_end                        // fast-path

SLIDE 23

HTM with a fall-back path

start:
  int status = htm_begin
  if (status == ok)              // != ok when aborted
    if (fallback-in-use())
      htm_abort                  // fall-back in use
    else
      goto code                  // fast-path
  ??
code:
  application logic
  if (inFastPath)
    htm_end                      // fast-path
  else
    ??

SLIDE 24

HTM with a fall-back path

start:
  int status = htm_begin
  if (status == ok)              // != ok when aborted
    if (fallback-in-use())
      htm_abort                  // fall-back in use
    else
      goto code                  // fast-path
  if (shouldRetry())             // retry policy
    goto start
  else
    use-fallback()               // use fall-back
code:
  application logic
  if (inFastPath)
    htm_end                      // fast-path
  else
    quit-fallback()              // fall-back

SLIDE 25

HTM with a fall-back: a single lock

start:
  int status = htm_begin
  if (status == ok)              // != ok when aborted
    if (isTaken(lock))
      htm_abort                  // fall-back in use
    else
      goto code                  // fast-path
  if (shouldRetry())             // retry policy: e.g., limit retries to 10
    goto start
  else
    acquire(lock)                // use fall-back
code:
  application logic
  if (inFastPath)                // fast-path
    htm_end
  else                           // fall-back
    release(lock)

Still simple enough.
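
The same control flow can be exercised without HTM hardware through a small Python simulation. This is only a sketch: `try_hardware` is a hypothetical callback standing in for one hardware attempt (it returns True when the attempt commits), `flaky_hardware` is a made-up stub that aborts a fixed number of times, and the counters are not thread-safe because the demo is single-threaded.

```python
import threading

MAX_RETRIES = 10  # mirrors the example retry policy: limit retries to 10


class SingleGlobalLockFallback:
    """Retry a transaction on the (simulated) fast path, then fall back to one lock."""

    def __init__(self, try_hardware):
        self.lock = threading.Lock()       # the single global lock (SGL)
        self.try_hardware = try_hardware   # hypothetical: one HTM attempt, True on commit
        self.fast_path_commits = 0
        self.fallback_runs = 0

    def run(self, body):
        for _ in range(MAX_RETRIES):
            # Real HTM reads the lock inside the transaction and aborts if it is
            # taken; this simulation just checks it before each attempt.
            if not self.lock.locked() and self.try_hardware(body):
                self.fast_path_commits += 1
                return
        # Retry budget exhausted: run non-speculatively under the SGL.
        with self.lock:
            body()
            self.fallback_runs += 1


def flaky_hardware(abort_first_n):
    """Stub HTM that aborts the first abort_first_n attempts, then commits."""
    remaining = [abort_first_n]

    def attempt(body):
        if remaining[0] > 0:
            remaining[0] -= 1
            return False  # simulated abort: the body has no visible effect
        body()
        return True

    return attempt
```

With `flaky_hardware(3)` the body commits on the fast path at the fourth attempt; with `flaky_hardware(100)` the ten retries are used up and the body runs once under the lock.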