slide-1
SLIDE 1

Self-Tuning Intel TSX

Nuno Diegues and Paolo Romano 3rd Euro-TM Workshop on Transactional Memory

To appear at the 11th USENIX ICAC 2014

slide-5
SLIDE 5

Using TSX

_xbegin
  // your transactional code
_xend

May Abort

  • Data contention
  • Forbidden instructions
  • Hardware buffers’ capacity
  • Signals and faults

Transparently Restarts
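
As a minimal sketch of how the abort causes listed above show up in code: on Intel TSX the value returned by `_xbegin()` after an abort is a bit mask. The bit positions below mirror the `_XABORT_*` flags from `<immintrin.h>`, but they are redefined under hypothetical `ST_*` names so the sketch compiles on machines without RTM support. The `abort_cause` enum and both helper functions are illustrative assumptions, not part of the talk.

```c
/* Bit values mirror the _XABORT_* flags in <immintrin.h>; the ST_* names are
   hypothetical and exist only so this sketch builds without RTM support. */
enum {
    ST_EXPLICIT = 1 << 0,   /* aborted by an explicit _xabort()           */
    ST_RETRY    = 1 << 1,   /* hardware hint: a retry may succeed         */
    ST_CONFLICT = 1 << 2,   /* data contention with another thread        */
    ST_CAPACITY = 1 << 3    /* hardware buffers' capacity exceeded        */
};

typedef enum {
    CAUSE_CONTENTION,
    CAUSE_CAPACITY,
    CAUSE_EXPLICIT,
    CAUSE_OTHER             /* e.g. forbidden instructions, signals, faults */
} abort_cause;

/* Maps an abort status word onto the causes listed on the slide. */
static abort_cause classify_abort(unsigned status)
{
    if (status & ST_CAPACITY) return CAUSE_CAPACITY;
    if (status & ST_CONFLICT) return CAUSE_CONTENTION;
    if (status & ST_EXPLICIT) return CAUSE_EXPLICIT;
    return CAUSE_OTHER;
}

/* Non-zero when the hardware hints that retrying could succeed. */
static int may_succeed_on_retry(unsigned status)
{
    return (status & ST_RETRY) != 0;
}
```

A status with none of these bits set typically corresponds to the remaining causes on the slide, such as forbidden instructions or faults, which is why they fall into `CAUSE_OTHER` here.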

slide-6
SLIDE 6

Using TSX

Best-effort nature: we cannot rely exclusively on TSX

slide-12
SLIDE 12

Best-effort nature

Not *that* specific to Intel TSX; this partly applies to IBM's HTMs too.

begin:
  unsigned int status = _xbegin
  if (status == ok)
    goto code          // fast path
  goto begin
code:
  // your transactional code
  _xend                // fast path

begin:
  unsigned int status = _xbegin
  if (status == ok)
    goto code          // fast path
  if (shouldRetry)     // retry policy
    goto begin
code:
  // your transactional code
  if (shouldRetry)
    _xend              // fast path

begin:
  unsigned int status = _xbegin
  if (status == ok)
    goto code          // fast path
  if (shouldRetry)     // retry policy
    goto begin
  else
    acquire(lock)      // fallback
code:
  // your transactional code
  if (shouldRetry)
    _xend              // fast path
  else
    release(lock)      // fallback

Transactions need to be aware of this (the fallback lock)
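
The last version of the pseudocode can be sketched in C roughly as follows. This is one possible shape, not the authors' implementation: `run_atomic`, `fallback_lock`, `add_one`, and the `TX_*` macros are hypothetical names. When the compiler is not targeting RTM (no `__RTM__` macro), the macros are stubbed so that every hardware attempt counts as an abort and the lock path is taken. Building with RTM enabled still requires checking at run time that the CPU supports it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__RTM__)
#include <immintrin.h>            /* real intrinsics: _xbegin, _xend, _xabort */
#define TX_STARTED  _XBEGIN_STARTED
#define TX_BEGIN()  _xbegin()
#define TX_END()    _xend()
#define TX_ABORT()  _xabort(0xff)
#else
/* Assumption: without RTM support every hardware attempt simply "aborts". */
#define TX_STARTED  (~0u)
#define TX_BEGIN()  (0u)
#define TX_END()    ((void)0)
#define TX_ABORT()  ((void)0)
#endif

static atomic_int fallback_lock;  /* 0 = free, 1 = held */

/* Example body used below: increments the long that arg points to. */
static void add_one(void *arg) { ++*(long *)arg; }

/* Runs body(arg) atomically. Returns 1 if it committed in hardware and
   0 if it ran under the fallback lock. */
static int run_atomic(void (*body)(void *), void *arg, int max_attempts)
{
    assert(body != NULL && max_attempts >= 0);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        /* Wait until the lock is free before starting a transaction. */
        while (atomic_load(&fallback_lock)) { }

        if (TX_BEGIN() == TX_STARTED) {          /* fast path */
            /* Reading the lock inside the transaction makes the transaction
               "aware" of it: a later acquire conflicts and aborts us. */
            if (atomic_load(&fallback_lock)) TX_ABORT();
            body(arg);
            TX_END();
            return 1;
        }
        /* Aborted: fall through and consume one more attempt. */
    }

    /* Fallback: serialize behind a single global lock. */
    int expected = 0;
    while (!atomic_compare_exchange_weak(&fallback_lock, &expected, 1))
        expected = 0;
    body(arg);
    atomic_store(&fallback_lock, 0);
    return 0;
}
```

A caller would write `run_atomic(add_one, &counter, 3)`. The fixed retry budget of 3 is exactly the kind of constant the rest of the talk proposes to tune at run time.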

slide-13
SLIDE 13

Summary of issues

  • Lemming effect
  • Number of attempts
  • Retry policy
  • Management of fall-back
slide-14
SLIDE 14

Summary of issues

[Figure: speedup (1 to 3) vs. threads (1 to 8) for Genome from the STAMP suite. Series: GCC and (Possible) Self-Tuning. Configuration labels on the plot: none-giveup-1, aux-giveup-3, wait-giveup-4, wait-stubborn-4, wait-stubborn-4, wait-half-8, wait-half-11, wait-stubborn-11.]

slide-16
SLIDE 16

Number of attempts

[Figure: speedup vs. retries for Kmeans from STAMP, with one curve for high contention and one for low contention.]

Gradient Descent for exploration
slide-27
SLIDE 27

Gradient Descent

tuning the number of attempts

[Figure: performance as a function of #attempts. Numbered markers 1 to 7 show successive optimization rounds; marker 5 is labelled as a random jump.]

  • randomly search some direction; explore it while profitable
  • revert direction when not profitable
  • threshold for stabilization
  • random jumps to avoid local minima
  • recover from unlucky jumps
  • memorize maxima
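
The steps on this slide can be sketched as a small hill-climbing tuner. This is an illustration under stated assumptions, not the authors' `grad()` routine: the `grad_tuner` struct, the function names, the fixed jump period (the slide only says the jumps are random), and the absolute stabilization threshold are all hypothetical choices.

```c
#include <assert.h>

typedef struct {
    int attempts, max_attempts;   /* current configuration and its upper bound */
    int direction;                /* +1 or -1                                  */
    int best_attempts;            /* memorized maximum                         */
    double best_perf, last_perf;
    int rounds, jumped;
    unsigned rng;                 /* state of a tiny linear congruential RNG   */
} grad_tuner;

static unsigned next_random(grad_tuner *t)
{
    t->rng = t->rng * 1103515245u + 12345u;
    return (t->rng >> 16) & 0x7fffu;
}

static void grad_init(grad_tuner *t, int start, int max_attempts, unsigned seed)
{
    assert(max_attempts >= 2 && start >= 1 && start <= max_attempts);
    t->attempts = t->best_attempts = start;
    t->max_attempts = max_attempts;
    t->best_perf = t->last_perf = 0.0;
    t->rounds = t->jumped = 0;
    t->rng = seed;
    t->direction = (next_random(t) & 1u) ? +1 : -1;  /* random initial direction */
}

/* Feeds the performance measured with t->attempts; returns the next setting. */
static int grad_step(grad_tuner *t, double perf, double threshold, int jump_period)
{
    int first = (t->rounds == 0);
    t->rounds++;

    if (first || perf > t->best_perf) {              /* memorize maxima */
        t->best_perf = perf;
        t->best_attempts = t->attempts;
    }
    if (t->jumped && perf < t->best_perf) {          /* recover from unlucky jump */
        t->jumped = 0;
        t->attempts = t->best_attempts;
        t->last_perf = t->best_perf;
        return t->attempts;
    }
    t->jumped = 0;

    double delta = first ? 0.0 : perf - t->last_perf;
    if (delta < 0.0) t->direction = -t->direction;   /* revert when not profitable */
    t->last_perf = perf;

    if (jump_period > 0 && t->rounds % jump_period == 0) {
        t->attempts = 1 + (int)(next_random(t) % (unsigned)t->max_attempts);
        t->jumped = 1;                               /* jump to a random setting */
        return t->attempts;
    }

    double magnitude = delta < 0.0 ? -delta : delta;
    if (!first && magnitude < threshold)             /* stabilized: stay put */
        return t->attempts;

    int next = t->attempts + t->direction;           /* explore while profitable */
    if (next < 1 || next > t->max_attempts) {
        t->direction = -t->direction;
        next = t->attempts + t->direction;
    }
    t->attempts = next;
    return t->attempts;
}
```

In a real deployment `perf` would be a noisy measurement such as throughput or inverse cycle count, so the threshold and jump period would need care. Here they only show where each bullet on the slide fits.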

slide-28
SLIDE 28

Retry policy

  • Give up on capacity aborts?
  • How should we “consume” the attempts’ budget?
  • How to manage the fall-back?
slide-29
SLIDE 29

Retry policy

Reinforcement learning

Upper Confidence Bound

slide-32
SLIDE 32

UCB

tuning the retry policy

Lever A   Lever B   Lever C
   ?         ?         ?

  • A quest for exploration vs. benefit from current knowledge
  • UCB adapts the strategy to maximize reward
  • Logarithmic bound on the optimization error
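
For reference, the trade-off between exploring levers and exploiting current knowledge can be written as an index per lever. The slides do not say which UCB variant is used; the formula below is the standard UCB1 index, given here as an assumption, with rewards normalized to the interval [0, 1].

```latex
% UCB1 (assumed variant): at round n, pull the lever j* with the largest index
j^{*} = \arg\max_{j \in \{A, B, C\}}
        \left( \bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}} \right)
% \bar{x}_j : mean reward observed so far for lever j
% n_j       : number of times lever j has been pulled
% n         : total number of pulls so far
```

The first term favors levers that have paid off so far. The second term grows for levers that were rarely tried, which is what keeps the expected regret logarithmic in the number of rounds.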

slide-33
SLIDE 33

UCB

tuning the retry policy

Model the belief about capacity aborts:

  • giveup — exhaust attempts
  • half — drops half the attempts
  • stubborn — decrements attempts

Reward: function of processor cycles (RDTSC)
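
A minimal sketch of how the three levers could consume the attempts budget when a capacity abort is observed. The enum and function names are hypothetical, and rounding "half" down with integer division is an assumption; the slide does not specify it. On Intel TSX a capacity abort is reported through the `_XABORT_CAPACITY` bit of the status returned by `_xbegin()`.

```c
#include <assert.h>

typedef enum { POLICY_GIVEUP, POLICY_HALF, POLICY_STUBBORN } capacity_policy;

/* Returns the attempts left after one capacity abort under the given policy. */
static int after_capacity_abort(capacity_policy policy, int remaining)
{
    assert(remaining >= 0);
    switch (policy) {
    case POLICY_GIVEUP:   return 0;                   /* exhaust attempts     */
    case POLICY_HALF:     return remaining / 2;       /* drop half (rounded down) */
    case POLICY_STUBBORN: return remaining > 0 ? remaining - 1 : 0;
    }
    return 0;
}

/* Counts how many consecutive capacity aborts are tolerated before the
   budget reaches zero and the fallback path must be taken. */
static int aborts_until_fallback(capacity_policy policy, int budget)
{
    int aborts = 0;
    while (budget > 0) {
        budget = after_capacity_abort(policy, budget);
        aborts++;
    }
    return aborts;
}
```

With a budget of 8, giveup tolerates 1 capacity abort, half tolerates 4 (8, 4, 2, 1, 0), and stubborn tolerates 8, which shows how differently the three beliefs about capacity aborts spend the same budget.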

slide-35
SLIDE 35

Adaptation of one atomic block in Yada

Optimizers are *not* independent

slide-36
SLIDE 36

Transparency to the User

[Flowchart: the tuning logic lives inside gcc libitm, around the application logic.]

atomic_begin (gcc libitm)

  • fetch atomic block's stats
  • Re-optimize? yes: Profile cycles; no: skip profiling
  • fetch last configuration
  • Begin Tx procedure (govern retry management: abort, retry)

execute atomic block (application logic)

atomic_end (gcc libitm)

  • End Tx Procedure
  • Re-optimize? yes: Profile cycles, Run grad(), Run ucb() (changes next configuration); no: continue program
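
The flow on this slide can be sketched as a pair of hooks around each atomic block. This is not libitm's actual code. The `block_stats` struct, the hook names, the fixed re-optimization period, and `example_tuner` (a stand-in for the grad() and ucb() steps) are all hypothetical, and the cycle counter is passed in by the caller rather than read with RDTSC so that the sketch stays portable.

```c
#include <assert.h>

typedef unsigned long long cycles_t;

typedef struct {
    unsigned long executions;       /* how often this atomic block has run  */
    unsigned long reoptimizations;  /* how often the tuner has been invoked */
    int profiling;                  /* is the current execution profiled?   */
    cycles_t started_at;
    int attempts;                   /* last configuration                   */
} block_stats;

enum { REOPT_PERIOD = 100 };        /* assumed: re-optimize every 100 runs  */

/* Hypothetical stand-in for the grad() and ucb() steps. */
static int example_tuner(int attempts, cycles_t cycles)
{
    (void)cycles;                   /* a real tuner would use the measurement */
    return attempts < 16 ? attempts + 1 : attempts;
}

/* atomic_begin: fetch stats, decide whether to profile, return configuration. */
static int atomic_begin_hook(block_stats *s, cycles_t now)
{
    assert(s != 0);
    s->executions++;
    s->profiling = (s->executions % REOPT_PERIOD == 0);
    if (s->profiling)
        s->started_at = now;
    return s->attempts;             /* governs retry management for this run */
}

/* atomic_end: if this run was profiled, let the tuner pick the next config. */
static void atomic_end_hook(block_stats *s, cycles_t now,
                            int (*tune)(int attempts, cycles_t cycles))
{
    assert(s != 0 && tune != 0);
    if (!s->profiling)
        return;                     /* continue program */
    s->attempts = tune(s->attempts, now - s->started_at);
    s->reoptimizations++;
    s->profiling = 0;
}
```

Keeping the statistics per atomic block, as the flowchart suggests, lets each block settle on its own configuration without any change to the application code between `atomic_begin` and `atomic_end`.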

slide-39
SLIDE 39

Summary of Evaluation

slide-41
SLIDE 41

A peek at the results

[Figure: speedup (1 to 4) vs. threads (1 to 8) for Intruder from STAMP. Series labelled “ideal” and self-tuning.]

slide-42
SLIDE 42

A peek at the results

[Figure: throughput (1000 txs/sec) over execution time (sec) for Yada with 8 threads, comparing GCC, Heuristic, AdaptiveLocks, and Tuner. An annotation marks the point where the benchmark finished.]

slide-43
SLIDE 43

Summary

  • Best-effort HTMs need proper tuning
  • No one size fits all
  • We used lightweight exploration/learning techniques
  • Transparent to the programmer
slide-44
SLIDE 44

Thank you!

Questions?
