Self-Tuning Intel TSX, Nuno Diegues and Paolo Romano, HTDC 2014



slide-1
SLIDE 1

Self-Tuning Intel TSX

Nuno Diegues and Paolo Romano

HTDC 2014

slide-2
SLIDE 2

PhD Thesis: Protocols and Abstractions for Efficient Transactional Systems

  • Reduce aborts while preserving consistency (shared memory / distributed)
  • Efficient transactional indexation/search
  • Energy-efficiency of TM systems
  • More recently: how can we leverage hardware for efficient transactional systems?

slide-3
SLIDE 3

Using TSX

_xbegin
    // your transactional code
_xend
slide-4
SLIDE 4

Using TSX

_xbegin
    // your transactional code
_xend

May Abort

slide-5
SLIDE 5

Using TSX

_xbegin
    // your transactional code
_xend

May Abort

  • Data contention
  • Forbidden instructions
  • Hardware buffers’ capacity
  • Signals and faults
slide-6
SLIDE 6

Using TSX

_xbegin
    // your transactional code
_xend

May Abort

  • Data contention
  • Forbidden instructions
  • Hardware buffers’ capacity
  • Signals and faults

Transparently Restarts

slide-7
SLIDE 7

Using TSX

Best-effort nature: we cannot rely exclusively on TSX

slide-8
SLIDE 8

Best-effort nature

Not *that* specific to Intel TSX: this partly applies to IBM's HTMs too

slide-9
SLIDE 9

Best-effort nature

Not *that* specific to Intel TSX: this partly applies to IBM's HTMs too

begin:
    unsigned int status = _xbegin
    if (status == ok) goto code
    // retry policy
    if (shouldRetry) goto begin
    else acquire(lock)
code:
    // your transactional code
    if (shouldRetry) _xend
    else release(lock)

slide-10
SLIDE 10

Best-effort nature

Not *that* specific to Intel TSX: this partly applies to IBM's HTMs too

begin:
    unsigned int status = _xbegin
    if (status == ok) goto code
    // retry policy
    if (shouldRetry) goto begin
    else acquire(lock)
code:
    // your transactional code
    if (shouldRetry) _xend
    else release(lock)

Transactions need to be aware of this
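The retry-then-fall-back flow above can be sketched in plain C. This is a minimal, single-threaded simulation rather than the authors' code: the hardware begin is passed in as a function pointer so the control flow runs on any machine, and `run_atomic`, `fallback_state`, `sim_always_abort`, `sim_abort_twice_then_start` and `add_one` are hypothetical names. Real code would call `_xbegin()` and `_xend()` from `<immintrin.h>` instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as _XBEGIN_STARTED in <immintrin.h>. */
#define HTM_STARTED (~0u)

/* Hypothetical stand-in for _xbegin(): returns HTM_STARTED or an abort status. */
typedef unsigned (*htm_begin_fn)(void);

typedef struct {
    bool held;          /* single global fall-back lock (simulated) */
    int  hw_commits;    /* executions that committed "in hardware" */
    int  lock_commits;  /* executions that ran under the lock */
} fallback_state;

/* Run body(arg) exactly once: try up to max_attempts hardware
 * transactions, then give up and take the fall-back lock. */
static void run_atomic(fallback_state *st, htm_begin_fn begin,
                       int max_attempts, void (*body)(void *), void *arg)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (begin() == HTM_STARTED) {
            body(arg);
            /* real code: _xend() commits here */
            st->hw_commits++;
            return;
        }
        /* aborted: a retry policy would inspect the status here */
    }
    assert(!st->held);      /* simulation is single-threaded */
    st->held = true;        /* acquire(lock) */
    body(arg);
    st->held = false;       /* release(lock) */
    st->lock_commits++;
}

/* Simulated hardware: every attempt aborts. */
static unsigned sim_always_abort(void) { return 0u; }

/* Simulated hardware: the first two attempts abort, later ones start. */
static unsigned sim_abort_twice_then_start(void)
{
    static int calls = 0;
    return (++calls > 2) ? HTM_STARTED : 0u;
}

/* Example transactional body: increment an int. */
static void add_one(void *arg) { (*(int *)arg)++; }
```

With `sim_always_abort` and a budget of 3 attempts the body runs once under the lock; with `sim_abort_twice_then_start` the third attempt commits without touching the lock.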

slide-11
SLIDE 11

Summary of issues

  • Lemming effect
  • Number of attempts
  • Retry policy
  • Management of fall-back

Abort code    Meaning
retry         Transient failure
conflict      Contention to data
capacity      Exceeded cache capacity
explicit      _xabort invoked
other
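A small sketch of how an abort status could be mapped onto the categories in the table. The bit positions mirror the `_XABORT_EXPLICIT`, `_XABORT_RETRY`, `_XABORT_CONFLICT` and `_XABORT_CAPACITY` macros from `<immintrin.h>`; they are redefined locally so the snippet compiles without TSX support. The function name `abort_cause` and the priority order among bits (several can be set at once) are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Local copies of the RTM abort status bits (same positions as the
 * _XABORT_* macros), so this compiles on machines without TSX. */
#define ABORT_EXPLICIT (1u << 0)   /* _xabort invoked */
#define ABORT_RETRY    (1u << 1)   /* transient failure, may succeed on retry */
#define ABORT_CONFLICT (1u << 2)   /* contention to data */
#define ABORT_CAPACITY (1u << 3)   /* exceeded hardware buffer capacity */

/* Map a status word to one category name. The order of the checks is an
 * assumption: more specific causes win over the generic retry hint. */
static const char *abort_cause(unsigned status)
{
    if (status & ABORT_EXPLICIT) return "explicit";
    if (status & ABORT_CAPACITY) return "capacity";
    if (status & ABORT_CONFLICT) return "conflict";
    if (status & ABORT_RETRY)    return "retry";
    return "other";
}
```

For example, a status with both the conflict and retry bits set is reported as "conflict", and a status of zero falls into "other".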

slide-12
SLIDE 12

Lemming Effect

begin:
    wait until lock is free
    unsigned int status = _xbegin
    if (status == ok) goto code
    // retry policy
    if (shouldRetry) goto begin
    else acquire(lock)
    ...

Programming with HLE, Afek et al., PPoPP 2013
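The "wait until the lock is free" step can be sketched with C11 atomics. The idea is that a thread does not burn its hardware attempts while another thread holds the fall-back lock, which is what otherwise drags every thread onto the lock. The type `fallback_lock` and all function names are hypothetical; in real TSX code the check in `lock_taken_in_tx` would run right after `_xbegin()` so that the lock word joins the transaction's read set, and a true result would be followed by `_xabort`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool held; } fallback_lock;

/* Spin until the fall-back lock is observed free, before starting
 * (or restarting) a hardware transaction. */
static void wait_until_free(fallback_lock *l)
{
    while (atomic_load_explicit(&l->held, memory_order_acquire)) {
        /* real code might issue a pause instruction here */
    }
}

/* To be called inside the transaction: reading the lock makes a later
 * acquisition by another thread abort this transaction. Returns true
 * if the transaction must abort because the lock is already held. */
static bool lock_taken_in_tx(fallback_lock *l)
{
    return atomic_load_explicit(&l->held, memory_order_relaxed);
}

/* Plain test-and-set spin lock used as the fall-back path. */
static void lock_acquire(fallback_lock *l)
{
    bool expected = false;
    while (!atomic_compare_exchange_weak_explicit(
               &l->held, &expected, true,
               memory_order_acquire, memory_order_relaxed)) {
        expected = false;   /* CAS overwrote it with the observed value */
    }
}

static void lock_release(fallback_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```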
slide-13
SLIDE 13

Number of attempts

[Plot: speedup vs. number of retries, under high and low contention. Kmeans from STAMP]

slide-14
SLIDE 14

Number of attempts

[Plot: speedup vs. number of retries, under high and low contention. Kmeans from STAMP]

Gradient Descent for exploration
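One way to picture exploring the number of attempts is a simple hill climber over the retry budget: keep moving in the current direction while the measured performance does not drop, and turn around when it does. This is a simplified stand-in, not the authors' tuner; `attempt_tuner`, `tuner_step` and the synthetic performance curve `synthetic_speedup` (peaking at 5 attempts) are assumptions made purely for illustration.

```c
#include <assert.h>

typedef struct {
    int    attempts;    /* current retry budget */
    int    direction;   /* +1 or -1: which way we are exploring */
    double last_perf;   /* performance at the previous setting; < 0 means none yet */
} attempt_tuner;

/* Feed the performance measured at the current setting and get the next
 * setting to try, clamped to [min_a, max_a]. */
static int tuner_step(attempt_tuner *t, double perf, int min_a, int max_a)
{
    if (t->last_perf >= 0.0 && perf < t->last_perf)
        t->direction = -t->direction;   /* got worse: turn around */
    t->last_perf = perf;

    int next = t->attempts + t->direction;
    if (next < min_a) { next = min_a; t->direction = +1; }
    if (next > max_a) { next = max_a; t->direction = -1; }
    t->attempts = next;
    return next;
}

/* Made-up performance curve with a single peak at 5 attempts. */
static double synthetic_speedup(int attempts)
{
    int dist = attempts > 5 ? attempts - 5 : 5 - attempts;
    return 10.0 - (double)dist;
}
```

Starting from 1 attempt, the tuner climbs to the peak and then keeps oscillating within one step of it, which is the usual price of a fixed step size.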
slide-15
SLIDE 15

Retry policy

  • How many attempts in hardware?
  • Give up on capacity aborts?
  • How to manage the fall-back?
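The "give up on capacity aborts?" question can be made concrete as a rule for updating the remaining hardware attempts after each abort. The names giveup, half and stubborn are borrowed from the configuration labels on the retry-policy plot; the semantics given to them here (drop to zero, halve, or treat like any other abort) are an assumption, as are the identifiers `capacity_policy` and `attempts_after_abort`.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed meaning of the three capacity-abort policies. */
typedef enum {
    CAP_GIVEUP,    /* capacity abort: stop trying in hardware */
    CAP_HALF,      /* capacity abort: halve the remaining attempts */
    CAP_STUBBORN   /* capacity abort: costs one attempt, like any abort */
} capacity_policy;

/* Returns how many hardware attempts remain after one abort. */
static int attempts_after_abort(int remaining, bool capacity_abort,
                                capacity_policy policy)
{
    if (remaining <= 0)
        return 0;
    if (!capacity_abort)
        return remaining - 1;          /* ordinary abort */
    switch (policy) {
    case CAP_GIVEUP:   return 0;
    case CAP_HALF:     return remaining / 2;
    case CAP_STUBBORN: return remaining - 1;
    }
    return 0;                          /* unreachable for valid policies */
}
```

When the returned value reaches zero, the caller would stop retrying in hardware and take the fall-back path.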
slide-16
SLIDE 16

Retry policy

[Plot: speedup vs. threads (1 to 8); series: GCC and (Possible) Self-Tuning]

Configuration labels on the plot: none-giveup-1, aux-giveup-3, wait-giveup-4, wait-stubborn-4, wait-stubborn-4, wait-half-8, wait-half-11, wait-stubborn-11

slide-17
SLIDE 17

Retry policy

[Plot: speedup vs. threads (1 to 8); series: GCC and (Possible) Self-Tuning]

Configuration labels on the plot: none-giveup-1, aux-giveup-3, wait-giveup-4, wait-stubborn-4, wait-stubborn-4, wait-half-8, wait-half-11, wait-stubborn-11

Reinforcement learning

  • Upper Confidence Bound
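A sketch of how an upper-confidence-bound rule could choose among a few retry policies. It uses the textbook UCB1 score, mean reward plus sqrt(2 ln N / n_i); the slides only name "Upper Confidence Bound", so the exact variant, the three-arm setup, the reward scale in [0, 1] and all identifiers here are assumptions. The tiny `approx_ln` and `approx_sqrt` helpers exist only to keep the sketch free of a libm dependency; `log` and `sqrt` from `<math.h>` would be the normal choice.

```c
#include <assert.h>

#define N_ARMS 3   /* e.g. three candidate retry policies (assumption) */

typedef struct {
    int    pulls[N_ARMS];        /* times each arm was chosen */
    double reward_sum[N_ARMS];   /* accumulated reward per arm */
    int    total_pulls;
} ucb_state;

/* Newton iteration for sqrt(x), x >= 0. */
static double approx_sqrt(double x)
{
    if (x <= 0.0) return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 40; i++) r = 0.5 * (r + x / r);
    return r;
}

/* Natural log for x >= 1: halve into [1, 2), then use the atanh series. */
static double approx_ln(double x)
{
    int k = 0;
    while (x >= 2.0) { x *= 0.5; k++; }
    double y = (x - 1.0) / (x + 1.0), y2 = y * y, term = y, sum = 0.0;
    for (int n = 1; n <= 15; n += 2) { sum += term / n; term *= y2; }
    return 2.0 * sum + k * 0.6931471805599453;
}

/* Pick the arm with the highest UCB1 score; untried arms go first. */
static int ucb_select(const ucb_state *s)
{
    for (int i = 0; i < N_ARMS; i++)
        if (s->pulls[i] == 0) return i;

    double ln_total = approx_ln((double)s->total_pulls);
    int best = 0;
    double best_score = -1.0;
    for (int i = 0; i < N_ARMS; i++) {
        double mean  = s->reward_sum[i] / s->pulls[i];
        double bonus = approx_sqrt(2.0 * ln_total / s->pulls[i]);
        if (mean + bonus > best_score) { best_score = mean + bonus; best = i; }
    }
    return best;
}

/* Record the reward (for instance, normalized throughput) of one pull. */
static void ucb_update(ucb_state *s, int arm, double reward)
{
    s->pulls[arm]++;
    s->reward_sum[arm] += reward;
    s->total_pulls++;
}
```

Fed with fixed rewards, the rule still samples every arm occasionally but spends most pulls on the arm with the highest mean.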

slide-18
SLIDE 18

Self-Tuning TSX

[Flowchart of the self-tuning runtime. Recoverable labels, grouped by stage:]

  • atomic_begin: fetch atomic block's stats; Re-optimize? (yes / no); Profile cycles; fetch last configuration; Begin Tx procedure
  • application logic: execute atomic block; abort, retry (the configuration governs retry management)
  • atomic_end: End Tx Procedure; Re-optimize? (yes / no); Profile cycles; Run grad(); Run ucb(); changes next configuration; continue program
  • Other labels: gcc libitm (twice); requires more work
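The begin/end wrapping in the flowchart could look roughly like the following sketch: each atomic block keeps its own statistics, and only on sampled executions are cycles profiled and the tuner invoked. Everything here is hypothetical (`block_stats`, `tuned_atomic_begin`, `tuned_atomic_end`, the fixed sampling period, the callback standing in for the grad()/ucb() step); the cycle counter is injected as a function pointer so the example runs anywhere, where real code might read the time-stamp counter.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct block_stats {
    int                attempts;         /* current configuration */
    unsigned           executions;       /* completed executions */
    unsigned           sample_period;    /* re-optimize every N executions */
    bool               sampling;         /* is this execution profiled? */
    unsigned long long start_cycles;
    unsigned long long last_elapsed;
    unsigned           reoptimizations;
} block_stats;

typedef unsigned long long (*cycle_fn)(void);
typedef void (*reoptimize_fn)(block_stats *, unsigned long long elapsed);

/* atomic_begin side: decide whether to profile this execution. */
static void tuned_atomic_begin(block_stats *b, cycle_fn now)
{
    assert(b->sample_period > 0);
    b->sampling = (b->executions % b->sample_period) == 0;
    if (b->sampling)
        b->start_cycles = now();
    /* next: fetch the last configuration (b->attempts) and begin the tx */
}

/* atomic_end side: after the tx has ended, feed the tuner if sampled. */
static void tuned_atomic_end(block_stats *b, cycle_fn now,
                             reoptimize_fn reoptimize)
{
    b->executions++;
    if (b->sampling) {
        reoptimize(b, now() - b->start_cycles);  /* may change b->attempts */
        b->reoptimizations++;
    }
}

/* Fake cycle counter: advances by 100 on every read. */
static unsigned long long fake_cycles(void)
{
    static unsigned long long t = 0;
    t += 100;
    return t;
}

/* Placeholder tuner step: just remembers the measured cost. */
static void record_elapsed(block_stats *b, unsigned long long elapsed)
{
    b->last_elapsed = elapsed;
}
```

With a sampling period of 4, eight executions of the block trigger two re-optimizations, and the other six pay only for a counter check.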

slide-19
SLIDE 19

Quick flavour on results

[Plot: throughput (1000 txs/sec) over execution time (sec) for GCC, Heuristic, AdaptiveLocks, and Tuner; annotation: benchmark finished. Yada from STAMP]

slide-20
SLIDE 20

Quick flavour on results

[Plot: speedup vs. threads (1 to 8); series: “ideal” and self-tuning. Intruder from STAMP]

slide-21
SLIDE 21

Summary

  • Best-effort HTMs need proper tuning
  • No one size fits all
  • We used lightweight exploration/learning techniques
  • Transparent to the programmer