Self-Tuning Intel TSX
Nuno Diegues and Paolo Romano
HTDC 2014
PhD Thesis: Protocols and Abstractions for Efficient Transactional Systems
Reduce aborts while preserving consistency (shared memory / distributed)
_xbegin
May Abort (instructions, capacity)
Transparently Restarts
Best-effort nature: we cannot rely exclusively on TSX
Not *that* specific to Intel TSX: this partly applies to IBM HTMs too
begin:
    unsigned int status = _xbegin
    if (status == ok) goto code
    // retry policy
    if (shouldRetry) goto begin
    else acquire(lock)
code:
    // your transactional code
    if (in transaction) _xend
    else release(lock)
Transactions need to be aware of the fallback lock
Abort Code    Meaning
retry         Transient failure
conflict      Contention on data
capacity      Exceeded cache capacity
explicit      _xabort invoked
…
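How an abort code could feed the retry policy is sketched below in C. This is an illustration under stated assumptions, not the authors' implementation: the `ABORT_*` constants are local mirrors of the status bits that `<immintrin.h>` names `_XABORT_EXPLICIT`, `_XABORT_RETRY`, `_XABORT_CONFLICT` and `_XABORT_CAPACITY`, and the three capacity policies (`POLICY_GIVEUP`, `POLICY_HALF`, `POLICY_STUBBORN`) and the function `budget_after_abort` are hypothetical names with assumed semantics.

```c
#include <assert.h>

/* Local mirrors of the RTM status bits that <immintrin.h> exposes as
   _XABORT_EXPLICIT, _XABORT_RETRY, _XABORT_CONFLICT and _XABORT_CAPACITY.
   Redefined here so the sketch builds without RTM support. */
enum abort_bit {
    ABORT_EXPLICIT = 1 << 0,  /* _xabort invoked */
    ABORT_RETRY    = 1 << 1,  /* transient failure, may succeed on retry */
    ABORT_CONFLICT = 1 << 2,  /* contention on data */
    ABORT_CAPACITY = 1 << 3   /* exceeded cache capacity */
};

/* Hypothetical capacity policies; their semantics are this sketch's assumption. */
enum capacity_policy { POLICY_GIVEUP, POLICY_HALF, POLICY_STUBBORN };

/* Returns the retry budget left after an abort with the given status.
   A result of 0 means: stop retrying and take the fallback lock. */
int budget_after_abort(unsigned status, int budget, enum capacity_policy policy)
{
    assert(budget >= 0);
    if (budget == 0)
        return 0;
    if (status & ABORT_CAPACITY) {
        switch (policy) {
        case POLICY_GIVEUP:   return 0;           /* assume the footprint will not shrink */
        case POLICY_HALF:     return budget / 2;  /* spend the budget faster */
        case POLICY_STUBBORN: return budget - 1;  /* treat like any other abort */
        }
    }
    return budget - 1;  /* conflict, transient, or explicit abort */
}
```

With a budget of 8, a capacity abort leaves 0, 4 or 7 retries under the three policies, while a conflict abort always leaves 7.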
begin:
    wait until lock is free
    unsigned int status = _xbegin
    if (status == ok) goto code
    // retry policy
    if (shouldRetry) goto begin
    else acquire(lock)
Programming with HLE Afek et al. PPoPP 2013
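The retry loop with the wait-for-lock step can be sketched in portable C. To keep it buildable without RTM hardware, the hardware attempt (`_xbegin` ... `_xend`) is replaced by a hypothetical callback, `attempt`, that returns `TX_COMMITTED` or an abort status. `run_atomic`, `TX_COMMITTED`, `demo_ctx` and the two `demo_*` functions are names invented for this sketch. In real TSX code the transaction would also read the lock after `_xbegin`, so that a later lock acquisition aborts it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TX_COMMITTED 0xFFFFFFFFu  /* sentinel chosen for this sketch */

typedef unsigned (*tx_attempt_fn)(void *ctx);   /* stand-in for _xbegin..._xend */
typedef void (*locked_body_fn)(void *ctx);      /* same body, run under the lock */

static atomic_bool fallback_lock;  /* static storage: starts out false (free) */

/* Returns true if a (simulated) hardware attempt committed,
   false if the body ran under the fallback lock. */
bool run_atomic(tx_attempt_fn attempt, locked_body_fn locked_body,
                void *ctx, int max_retries)
{
    assert(max_retries >= 0);
    for (int tries = 0; tries <= max_retries; tries++) {
        /* Wait until the lock is free before starting a new attempt. */
        while (atomic_load(&fallback_lock)) { /* spin */ }
        if (attempt(ctx) == TX_COMMITTED)
            return true;
    }
    /* Retry budget exhausted: acquire the lock and run non-speculatively. */
    while (atomic_exchange(&fallback_lock, true)) { /* spin */ }
    locked_body(ctx);
    atomic_store(&fallback_lock, false);
    return false;
}

/* Demo stand-ins: the attempt aborts until aborts_left reaches zero. */
typedef struct { int aborts_left; int locked_runs; } demo_ctx;

unsigned demo_attempt(void *ctx)
{
    demo_ctx *d = ctx;
    if (d->aborts_left > 0) {
        d->aborts_left--;
        return 1u << 2;  /* conflict bit, mirroring _XABORT_CONFLICT */
    }
    return TX_COMMITTED;
}

void demo_locked_body(void *ctx)
{
    demo_ctx *d = ctx;
    d->locked_runs++;
}
```

Calling `run_atomic(demo_attempt, demo_locked_body, &d, 3)` allows four attempts in total. A context that aborts twice commits speculatively, while one that aborts five times ends up on the lock path.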
[Figure: Kmeans from STAMP. Speedup vs. number of retries, under high and low contention. Annotation: Gradient Descent exploration.]
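One way to picture the exploration of the retry count is a one-dimensional hill climb: keep moving the retry budget in the same direction while the measured cost improves, and turn around when it gets worse. The sketch below is a simplified stand-in for gradient descent, not the tuner from the talk. `retry_tuner`, `tuner_init` and `tuner_update` are hypothetical names, and the cost is assumed to be something like cycles per committed transaction.

```c
#include <assert.h>

typedef struct {
    int retries;       /* retry budget currently in use */
    int step;          /* direction of the next move: +1 or -1 */
    double last_cost;  /* cost measured for the previous setting, < 0 if none */
} retry_tuner;

void tuner_init(retry_tuner *t, int start_retries)
{
    t->retries = start_retries;
    t->step = 1;
    t->last_cost = -1.0;
}

/* Feeds the cost measured with the current budget and returns the next
   budget to try, clamped to [min_retries, max_retries]. */
int tuner_update(retry_tuner *t, double cost, int min_retries, int max_retries)
{
    assert(min_retries <= max_retries);
    /* The last move made things worse: reverse direction. */
    if (t->last_cost >= 0.0 && cost > t->last_cost)
        t->step = -t->step;
    t->last_cost = cost;

    int next = t->retries + t->step;
    if (next < min_retries) { next = min_retries; t->step = 1; }
    if (next > max_retries) { next = max_retries; t->step = -1; }
    t->retries = next;
    return next;
}
```

On a cost curve with a single minimum, the budget walks to the minimum and then oscillates by one step around it. A real tuner would likely damp that oscillation and re-explore when the workload changes.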
[Figure: speedup vs. threads (1 to 8), comparing GCC with (possible) self-tuning. Configurations labelled on the plot: none-giveup-1, aux-giveup-3, wait-giveup-4, wait-stubborn-4, wait-stubborn-4, wait-half-8, wait-half-11, wait-stubborn-11. Annotation: Reinforcement learning (Upper Confidence Bound).]
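Upper Confidence Bound is named here without a formula. Assuming the standard UCB1 variant (the slides do not say which variant is used), each candidate configuration i is treated as an arm, every arm is tried once, and afterwards the arm chosen at each step is:

```latex
a = \arg\max_{i} \left( \bar{x}_i + \sqrt{\frac{2 \ln n}{n_i}} \right)
```

Here \bar{x}_i is the average reward observed for arm i (normalized to [0, 1]), n_i is the number of times arm i has been tried, and n is the total number of trials so far. The first term favours configurations that have performed well, and the second grows for configurations that have rarely been tried.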
application logic
atomic_begin
    fetch atomic block's stats
    Re-optimize?
        yes: profile cycles
        no: fetch last configuration
    Begin Tx procedure (the configuration governs retry management on abort/retry)
execute atomic block
atomic_end
    End Tx procedure
    Re-optimize?
        yes: profile cycles, run grad(), run ucb(), which changes the next configuration
        no: continue program
gcc libitm
requires more work
[Figure: Yada from STAMP. Throughput (1000 txs/sec) over execution time (sec) for GCC, Heuristic, AdaptiveLocks and Tuner. Annotation: benchmark finished.]
[Figure: Intruder from STAMP. Speedup vs. threads (1 to 8). Annotation: "ideal" self-tuning.]