Extending Hardware Transactional Memory Capacity via Rollback-Only Transactions and Suspend/Resume

Alexander Matveev, Shady Issa, Pascal Felber, Paolo Romano


  1. Extending Hardware Transactional Memory Capacity via Rollback-Only Transactions and Suspend/Resume • Alexander Matveev, Shady Issa, Pascal Felber, Paolo Romano

  2. Extending Hardware Transactional Memory Capacity via Rollback-Only Transactions and Suspend/Resume • POWER8-TM • Alexander Matveev, Shady Issa, Pascal Felber, Paolo Romano

  3. Transactional Memory • alternative paradigm for parallel programming • easy to use • potential to match the performance of fine-grained locking • example, executed on top of a transactional memory implementation:

         withdraw(account, value) {
           __transaction {
             if (account.balance > value) {
               account.balance -= value;
               return account.balance;
             } else {
               return -1;
             }
           }
         }

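  4. Hardware Transactional Memory • available in Intel and IBM processors • implemented in the cache coherence protocol • conflicts are detected at cache-line granularity • best effort: there is no guarantee that a transaction ever commits • a S/W fallback is therefore needed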
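The need for a software fallback can be illustrated with a minimal sketch of the HTM plus single-global-lock (SGL) pattern. This is a simulation, not real hardware code: `try_in_htm`, `HtmAbort`, `MAX_HTM_RETRIES`, `always_fits`, and `always_overflows` are hypothetical names introduced here, and the lock subscription a real implementation needs inside the hardware transaction is left out.

```python
import threading

MAX_HTM_RETRIES = 5               # hypothetical retry budget
global_lock = threading.Lock()    # the single global lock (SGL)


class HtmAbort(Exception):
    """Stands in for a hardware abort (conflict, capacity, and so on)."""


def run_transaction(body, try_in_htm):
    """Run body() atomically; return (result, path_used)."""
    for _ in range(MAX_HTM_RETRIES):
        try:
            return try_in_htm(body), "htm"
        except HtmAbort:
            continue              # best effort: no commit guarantee, so retry
    with global_lock:             # software fallback path
        return body(), "sgl"


def always_overflows(body):
    """Hypothetical hardware attempt that always hits a capacity abort."""
    raise HtmAbort()


def always_fits(body):
    """Hypothetical hardware attempt that always commits."""
    return body()


print(run_transaction(lambda: 42, always_fits))       # (42, 'htm')
print(run_transaction(lambda: 42, always_overflows))  # (42, 'sgl')
```

A transaction that never fits in hardware ends up serialised behind the lock on every execution, which is the capacity problem the following slides quantify.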

  5. Capacity Limitations • [Figure: throughput (10^6 Tx/s) and abort rate (%) of HTM-SGL versus transaction size; legend: ROT capacity, ROT conflicts, lock aborts, HTM capacity, HTM non-tx, HTM tx]

  6. Capacity Limitations • [Same figure, annotated: capacity aborts]

  7. Capacity Limitations • [Same figure, annotated: capacity aborts; activation of the fallback path]

  8. POWER8-TM • hardware/software co-design • uses two features specific to POWER8: suspend/resume and rollback-only transactions (ROTs) • goal: support the execution of larger transactions

  9. Rollback-only Transaction (ROT) • lightweight transaction type • updates are applied atomically • reads are not tracked • theoretically unbounded read-set • not serialisable

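  10. ROTs • initially X = 0 • Thread 1: Begin ROT • Thread 2: Begin ROT • Thread 1: read X returns 0 • Thread 2: X = 1 • Thread 2: End ROT • Thread 1: read X returns 1 (inconsistent value)

  11. ROTs • initially X = 0 • Thread 1: Begin ROT • Thread 2: Begin ROT • Thread 1: read X returns 0 • Thread 2: X = 1 • Thread 2: End ROT • Thread 1: read X returns 1 (inconsistent value) • the read followed by the write on X is labelled WAR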
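The interleaving on the two slides above can be replayed with a toy model of a ROT. This is only a sketch under simplifying assumptions: the `Rot` class and its methods are hypothetical names, writes are buffered until commit, and reads go straight to shared memory without being recorded anywhere.

```python
class Rot:
    """Toy rollback-only transaction: buffered writes, untracked reads."""

    def __init__(self, memory):
        self.memory = memory
        self.writes = {}

    def read(self, addr):
        # Reads are not tracked, so a later conflicting write goes unnoticed.
        return self.writes.get(addr, self.memory[addr])

    def write(self, addr, value):
        self.writes[addr] = value          # invisible to others until commit

    def commit(self):
        self.memory.update(self.writes)    # updates appear atomically


memory = {"X": 0}
t1, t2 = Rot(memory), Rot(memory)

first = t1.read("X")     # Thread 1 reads X
t2.write("X", 1)         # Thread 2 writes X (WAR with respect to Thread 1)
t2.commit()              # Thread 2 ends its ROT
second = t1.read("X")    # Thread 1 reads X again

print(first, second)     # 0 1
```

Thread 1 observes two different values of X inside one transaction, which is why plain ROTs are not serialisable.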

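  12. ROTs • initially X = 0 • Thread 1: Begin ROT • Thread 2: Begin ROT • Thread 1: read X returns 0 • Thread 2: X = 1 • Thread 2: End ROT • Thread 1: read X

  13. ROTs • initially X = 0 • Thread 1: Begin ROT • Thread 2: Begin ROT • Thread 1: read X returns 0 • Thread 2: X = 1 • Thread 1: read X returns 0 (consistent) • Thread 2: End ROT (the new value can only appear now)

  14. ROTs • initially X = 0 • Thread 1: Begin ROT • Thread 2: Begin ROT • Thread 1: read X returns 0 • Thread 2: X = 1 • Thread 1: read X returns 0 (consistent) • Thread 2: End ROT (the new value can only appear now) • the write followed by the read on X is labelled RAW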

  15. ROTs • initially X = 0 • Thread 1: Begin ROT • Thread 2: Begin ROT • Thread 1: read X • Thread 2: X = 1 • Thread 2: End ROT • Thread 1: read X • annotation: wait for concurrent ROTs non-transactionally

  16. ROTs • [Diagram: Thread 1 and Thread 2 each run a ROT over X and Y, both initially 0; operations shown between Begin ROT and End ROT: read X, X = 1, read Y, Y = 1; both dependencies are labelled WAR]

  17. ROTs • [Same diagram with the value labels X = 0, X = 1, Y = 1, Y = 0 overlaid]

  18. Touch-to-Validate (T2V) • core algorithm of POWER8-TM (P8TM) • makes concurrent execution of ROTs safe and serialisable • basic intuition: convert WAR conflicts into RAW conflicts

  19. T2V • [Diagram: Thread 1 and Thread 2 each run a ROT over X and Y, both initially 0; operations shown between Begin ROT and End ROT: read X, write X, read Y, write Y]

  20. T2V • [Same diagram, with both End ROT markers placed after all reads and writes]

  21. T2V • [Same diagram, with re-read X and re-read Y added before the End ROT markers]

  24. T2V • only the addresses that were read need to be tracked • this tracking must be done in software • how can software outperform hardware?
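The re-read step of T2V can be sketched with a small single-threaded model. Everything here is hypothetical and simplified: `Machine`, `Rot`, and `Aborted` are illustrative names, the model assumes that loading a location written by another in-flight ROT aborts that writer, and the wait for concurrent ROTs shown on an earlier slide is omitted, so the reader is simply made to commit first.

```python
class Aborted(Exception):
    """Raised when an aborted ROT tries to continue."""


class Machine:
    def __init__(self, **initial):
        self.memory = dict(initial)
        self.active = []                 # ROTs currently in flight


class Rot:
    def __init__(self, machine):
        self.machine = machine
        self.writes = {}                 # tracked by the modelled hardware
        self.read_log = []               # addresses only, kept in software
        self.aborted = False
        machine.active.append(self)

    def _check(self):
        if self.aborted:
            raise Aborted()

    def _load(self, addr):
        # Assumption of this model: loading a location written by another
        # in-flight ROT aborts that writer, so its writes are never applied.
        for other in list(self.machine.active):
            if other is not self and addr in other.writes:
                other.aborted = True
                self.machine.active.remove(other)
        return self.writes.get(addr, self.machine.memory[addr])

    def read(self, addr):
        self._check()
        self.read_log.append(addr)       # software read-set tracking
        return self._load(addr)

    def write(self, addr, value):
        self._check()
        self.writes[addr] = value

    def commit(self):
        self._check()
        for addr in self.read_log:       # touch-to-validate: re-read
            self._load(addr)
        self.machine.memory.update(self.writes)
        self.machine.active.remove(self)


machine = Machine(X=0, Y=0)
t1, t2 = Rot(machine), Rot(machine)
t1.read("X")          # returns 0
t2.write("X", 1)      # WAR on X: goes unnoticed by the untracked read
t1.write("Y", 1)
t1.commit()           # re-reading X now hits t2's tracked write (RAW)
print(t2.aborted, machine.memory)   # True {'X': 0, 'Y': 1}
```

In this model the log holds addresses only, never values: validation consists of touching each address again so that the conflict surfaces as a read after a tracked write.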

  25. TMCAM • [Diagram: an HTM transaction (Begin HTM; read A; read B; read C; read D; write E; End HTM) next to a TMCAM with 64 entries, all empty]

  26. TMCAM • [Same diagram: the addresses &A, &B, &C, &D and &E occupy TMCAM entries 1 to 5, one entry per accessed location]

  27. Read-set Tracking • [Diagram: a ROT (Begin ROT; read A; read B; read C; read D; write E; End ROT) next to a structure with 64 entries, all empty]

  28. Read-set Tracking • [Same diagram: each read is followed by a store of its address: read A, store &A; read B, store &B; read C, store &C; read D, store &D; then write E]

  29. Read-set Tracking • [Same diagram: &A, &B, &C and &D share entry 1; &E occupies entry 2]

  30. Read-set Tracking • [Same diagram, annotated: each stored address takes 8 bytes; each entry covers 128 bytes]

  31. Read-set Tracking • [Same diagram, annotated: up to 16x larger read-set]
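The "up to 16x" figure follows from the sizes given on the slides, as the short calculation below shows. The constant names are introduced here for illustration, and the result is an upper bound that ignores the entries taken by the transaction's own writes (such as write E).

```python
TMCAM_ENTRIES = 64         # entries shown on the slides
ENTRY_BYTES = 128          # bytes covered by one entry (one cache line)
ADDRESS_BYTES = 8          # size of one logged address

addresses_per_entry = ENTRY_BYTES // ADDRESS_BYTES
max_logged_addresses = TMCAM_ENTRIES * addresses_per_entry

print(addresses_per_entry)     # 16
print(max_logged_addresses)    # 1024
```

With hardware read tracking each location read consumes a whole entry, so at most 64 distinct lines fit; packing 16 addresses into each entry is where the factor of 16 comes from.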

  32. HTM • some transactions fit in HTM • for those, the extra overheads of ROTs should be avoided • try HTM first; if the transaction overflows, fall back to a ROT • how can HTMs and ROTs run concurrently?

  33. HTM + ROT • initially X = 0, Y = 0 • Thread 1: Begin HTM • Thread 2: Begin ROT • Thread 1: read X • Thread 2: X = 1 • Thread 1: Y = 1 • Thread 1: End HTM • Thread 2: End ROT

  34. HTM + ROT • same timeline as slide 33 • annotation: HTM is protected by H/W

  35. HTM + ROT • initially X = 0, Y = 0 • Thread 1: Begin HTM • Thread 2: Begin ROT • Thread 1: read X • Thread 1: Y = 1 • Thread 1: End HTM • Thread 2: End ROT • annotation: HTM is protected by H/W

  36. HTM + ROT • initially X = 0, Y = 0 • Thread 1: Begin HTM • Thread 2: Begin ROT • Thread 1: read X • Thread 2: read Y • Thread 1: Y = 1 • Thread 1: End HTM • Thread 2: read Y • Thread 2: End ROT • annotation: HTM is protected by H/W

  37. HTM + ROT • initially X = 0, Y = 0 • Thread 1: Begin HTM • Thread 2: Begin ROT • Thread 1: read X • Thread 2: read Y returns 0 • Thread 1: Y = 1 • Thread 1: End HTM • Thread 2: read Y returns 1 (inconsistent value) • Thread 2: End ROT • annotation: HTM is protected by H/W

  38. HTM + ROT • same timeline as slide 37, with T2V added before Thread 2's End ROT • annotation: HTM is protected by H/W

  39. HTM + ROT • initially X = 0, Y = 0 • Thread 1: Begin HTM • Thread 2: Begin ROT • Thread 1: read X • Thread 2: read Y returns 0 • Thread 1: Y = 1 • Thread 2: read Y returns 0 (consistent value) • Thread 1: End HTM • Thread 2: T2V • Thread 2: End ROT • annotations: using S/R (suspend/resume); HTM is protected by H/W

  40. Uninstrumented Read-only (URO) • read-only transactions run without any instrumentation • outside the context of HTM or ROT • no bound on transaction size • HTMs and ROTs must wait for UROs

  41. POWER8-TM • [Diagram: a transaction is either a read-only Tx, which runs w/o instrumentation, or an update Tx, which runs as HTM, ROT, or GL]
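The overview diagram can be read as a path-selection policy, sketched below. This is an illustration rather than the authors' implementation: `Path`, `run`, and the `attempt` callback are hypothetical names, retry budgets are left out, and treating GL as the last resort after HTM and ROT is an assumption based on the order given on the earlier HTM slide.

```python
from enum import Enum


class Path(Enum):
    URO = "uninstrumented read-only"
    HTM = "hardware transaction"
    ROT = "rollback-only transaction"
    GL = "global lock"


def run(read_only, attempt):
    """Pick an execution path; attempt(path) returns True on commit."""
    if read_only:
        attempt(Path.URO)              # no instrumentation, no size bound
        return Path.URO
    for path in (Path.HTM, Path.ROT):  # try HTM first, then fall back to ROT
        if attempt(path):
            return path
    attempt(Path.GL)                   # assumed last resort
    return Path.GL


def overflows_htm(path):
    """Hypothetical update transaction that is too large for HTM only."""
    return path is not Path.HTM


print(run(True, lambda path: True).name)               # URO
print(run(False, lambda path: True).name)              # HTM
print(run(False, overflows_htm).name)                  # ROT
print(run(False, lambda path: path is Path.GL).name)   # GL
```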
