Self-Tuning Intel TSX
Nuno Diegues and Paolo Romano
3rd Euro-TM Workshop on Transactional Memory
to appear at the 11th USENIX ICAC 2014
Using TSX
_xbegin
// your transactional code
_xend
May abort: instructions, capacity
Transparently restarts
Best-effort nature: we cannot rely exclusively on TSX
Not *that* specific to Intel TSX: this partly applies to IBM HTMs too
begin:
    unsigned int status = _xbegin
    if (status == ok) goto code    // fast path
    if (shouldRetry) goto begin    // retry policy
    else acquire(lock)             // fallback
code:
    // your transactional code
    if (status == ok) _xend        // fast path
    else release(lock)             // fallback
Transactions need to be aware of the fallback lock
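The fast path, retry policy, and lock fallback above can be sketched in compilable form. This is only an illustration under stated assumptions: `SimulatedHtm`, `FallbackLock`, `TxStats`, and `atomic_run` are hypothetical names, and the hardware is simulated so the control flow runs on any machine. On a TSX machine, `begin()` and `end()` would wrap `_xbegin()` and `_xend()` from `<immintrin.h>`, and the lock check would call `_xabort()`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the hardware: the next `aborts_left` calls to
// begin() report an abort, every later call reports a started transaction.
struct SimulatedHtm {
    int aborts_left = 0;
    bool begin() {
        if (aborts_left > 0) { --aborts_left; return false; }
        return true;
    }
    void end() {}
};

// Single-threaded placeholder for a real mutual-exclusion lock.
struct FallbackLock { bool held = false; };

struct TxStats { int attempts = 0; int hw_commits = 0; int lock_commits = 0; };

// Runs body() once: up to max_attempts times in (simulated) hardware,
// then under the fallback lock.
void atomic_run(SimulatedHtm& htm, FallbackLock& lock, int max_attempts,
                const std::function<void()>& body, TxStats& stats) {
    assert(max_attempts >= 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {  // retry policy
        ++stats.attempts;
        if (!htm.begin()) continue;        // aborted: try again
        // Fast path. The transaction checks the fallback lock so that it
        // never runs concurrently with a lock holder; a real transaction
        // would call _xabort() here instead of continuing.
        if (lock.held) continue;
        body();
        htm.end();
        ++stats.hw_commits;
        return;
    }
    lock.held = true;                      // fallback: acquire(lock)
    body();
    lock.held = false;                     // release(lock)
    ++stats.lock_commits;
}
```

With two simulated aborts and a budget of five attempts, the body commits in hardware on the third attempt; with more aborts than attempts, it runs once under the lock.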
[Figure: Genome from STAMP suite. Speedup vs. threads (1 to 8); series: GCC and (Possible) Self-Tuning. Configuration labels on the plot: none-giveup-1, aux-giveup-3, wait-giveup-4, wait-stubborn-4, wait-stubborn-4, wait-half-8, wait-half-11, wait-stubborn-11]
[Figure: Kmeans from STAMP. Speedup vs. retries; series: high contention and low contention]
Gradient Descent exploration
tuning the number of attempts
[Figure: performance vs. #attempts; exploration rounds numbered 1 to 7, with a random jump at round 5]
randomly search some direction; explore it while profitable
revert direction when not profitable
threshold for stabilization
random jumps to avoid local minima
recover from unlucky jumps
memorize maxima
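The search steps for the number of attempts could be sketched as below. This is a minimal illustration, not the authors' implementation: `AttemptsTuner`, its fields, the fixed jump period, and the relative form of the stabilization threshold are all assumptions made for the example. Each call to `next` takes the performance measured in the last round and returns the number of attempts to try in the following one.

```cpp
#include <cassert>
#include <random>

// Hypothetical tuner: none of these names come from the slides.
struct AttemptsTuner {
    int lo, hi;               // search bounds for #attempts
    double threshold;         // relative change treated as "stable"
    int jump_period;          // random jump every N rounds (0 disables jumps)
    int attempts;             // configuration measured in the current round
    int direction = 1;        // +1 or -1
    double last_perf = -1.0;  // previous measurement; negative means "none yet"
    int best_attempts;        // memorized maximum ...
    double best_perf = -1.0;  // ... and the performance seen there
    int round = 0;
    bool just_jumped = false;
    std::mt19937 rng;

    AttemptsTuner(int lo_, int hi_, double threshold_, int jump_period_, unsigned seed)
        : lo(lo_), hi(hi_), threshold(threshold_), jump_period(jump_period_),
          attempts(lo_), best_attempts(lo_), rng(seed) {
        assert(lo < hi);
    }

    // Feed the performance (>= 0) measured with `attempts`;
    // returns the number of attempts to try next.
    int next(double perf) {
        ++round;
        if (perf > best_perf) {                        // memorize maxima
            best_perf = perf;
            best_attempts = attempts;
        }
        if (just_jumped) {
            just_jumped = false;
            last_perf = -1.0;                          // restart the local search
            if (perf < best_perf) {                    // recover from an unlucky jump
                attempts = best_attempts;
                return attempts;
            }
        }
        if (jump_period > 0 && round % jump_period == 0) {   // random jump
            attempts = lo + static_cast<int>(rng() % static_cast<unsigned>(hi - lo + 1));
            just_jumped = true;
            return attempts;
        }
        if (last_perf < 0.0) {                         // randomly pick a direction
            direction = (rng() % 2 == 0) ? 1 : -1;
        } else {
            double diff = perf - last_perf;
            double magnitude = diff < 0.0 ? -diff : diff;
            if (magnitude <= threshold * last_perf) {  // stabilized: stay put
                last_perf = perf;
                return attempts;
            }
            if (diff < 0.0) direction = -direction;    // revert when not profitable
        }
        last_perf = perf;
        int candidate = attempts + direction;
        if (candidate < lo || candidate > hi) {        // bounce off the bounds
            direction = -direction;
            candidate = attempts + direction;
        }
        attempts = candidate;
        return attempts;
    }
};
```

In a real runtime the performance fed to `next` would come from profiling the atomic block; here it is whatever non-negative number the caller supplies, with larger meaning better.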
Reinforcement learning
Upper Confidence Bound
tuning the retry policy
[Figure: three levers with unknown payoff: Lever A, Lever B, Lever C]
A trade-off between exploration and benefiting from current knowledge
UCB adapts the strategy to maximize reward
Logarithmic bound on the optimization error
tuning the retry policy
Model the belief about capacity aborts:
Reward: function of processor cycles (RDTSC)
are *not* independent
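The lever selection could be sketched with the textbook UCB1 rule. The slides name UCB but not the variant, so UCB1 is an assumption here, as are the names `UcbChooser` and `reward_from_cycles`. The mapping from cycles to a reward in (0, 1] is likewise hypothetical; the slides only say that the reward is a function of processor cycles.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical UCB1 chooser over a fixed set of levers (retry policies).
struct UcbChooser {
    std::vector<double> total_reward;  // sum of rewards per lever
    std::vector<long> pulls;           // times each lever was chosen
    long total_pulls = 0;

    explicit UcbChooser(std::size_t levers)
        : total_reward(levers, 0.0), pulls(levers, 0) {
        assert(levers > 0);
    }

    // UCB1: try every lever once, then pick the lever with the highest
    // mean reward plus confidence bonus sqrt(2 ln N / n_i).
    std::size_t choose() const {
        std::size_t best = 0;
        double best_score = -std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < pulls.size(); ++i) {
            if (pulls[i] == 0) return i;               // unexplored lever
            double mean = total_reward[i] / static_cast<double>(pulls[i]);
            double bonus = std::sqrt(2.0 * std::log(static_cast<double>(total_pulls)) /
                                     static_cast<double>(pulls[i]));
            if (mean + bonus > best_score) {
                best_score = mean + bonus;
                best = i;
            }
        }
        return best;
    }

    // Record the reward (expected in [0, 1]) observed for a lever.
    void update(std::size_t lever, double reward) {
        assert(lever < pulls.size());
        total_reward[lever] += reward;
        ++pulls[lever];
        ++total_pulls;
    }
};

// Hypothetical reward: fewer cycles gives a higher reward, capped at 1.
double reward_from_cycles(double cycles, double best_cycles_seen) {
    assert(cycles > 0.0 && best_cycles_seen > 0.0);
    double r = best_cycles_seen / cycles;
    return r > 1.0 ? 1.0 : r;
}
```

A caller would alternate `choose()` and `update()` once per optimization round. Levers with poor rewards are still sampled occasionally, which is the exploration side of the trade-off, but most rounds go to the lever with the best observed mean.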
[Figure: flowchart of the tuner inside gcc libitm, wrapped around the application logic]
atomic_begin: fetch atomic block's stats. Re-optimize? yes: profile cycles; no: fetch last configuration. Then Begin Tx procedure.
execute atomic block: the configuration governs retry management (abort, retry).
atomic_end: End Tx procedure. Re-optimize? yes: profile cycles, run grad(), run ucb(), which changes the next configuration; no: continue program.
[Figure: Intruder from STAMP. Speedup vs. threads (1 to 8), including the “ideal” self-tuning]
[Figure: Yada with 8 threads. Throughput (1000 txs/sec) vs. execution time (sec); series: GCC, Heuristic, AdaptiveLocks, Tuner; annotation: benchmark finished]
Questions?