Hardware Transactional Memory on Haswell EP



  1. Hardware Transactional Memory on Haswell EP
     Viktor Leis, Technische Universität München

  2. Introduction
     ◮ Intel’s new mid-level server platform: Haswell EP
     ◮ up to 18 cores per socket (up to 72 hardware threads with 2 sockets)
     ◮ supports hardware transactional memory (TSX)

  3. Experimental Setup
     ◮ global fallback lock
       ◮ built-in Hardware Lock Elision (HLE)
       ◮ lock elision implemented using RTM, restarts and re-speculation
     ◮ workload
       ◮ Adaptive Radix Tree (trie, fanout 2-256), designed for main-memory database systems
       ◮ random lookups in tree with 64M entries
       ◮ 64M random inserts into (initially empty) tree
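The slides name the RTM-based variant but do not show its code. The following is a minimal sketch of lock elision with a restart budget, not the implementation used in the talk. The names `ElidedLock`, `run`, `tx_begin`, `tx_end` and `tx_abort` are hypothetical; only `_xbegin`, `_xend`, `_xabort` and `_XBEGIN_STARTED` are Intel's RTM intrinsics. "Re-speculation" is interpreted here as waiting for the fallback lock to become free and then starting a new transaction, rather than queueing for the lock. When the code is built without `-mrtm`, the wrappers degrade to stubs and every call takes the fallback lock.

```cpp
#include <atomic>

#if defined(__RTM__)  // set by GCC/Clang when compiling with -mrtm
#include <immintrin.h>
// Thin wrappers (hypothetical names) around Intel's RTM intrinsics.
inline bool tx_begin() { return _xbegin() == _XBEGIN_STARTED; }
inline void tx_end()   { _xend(); }
inline void tx_abort() { _xabort(0xff); }
#else
// Assumption: without RTM every speculative attempt counts as an abort,
// so callers always end up on the fallback lock.
inline bool tx_begin() { return false; }
inline void tx_end()   {}
inline void tx_abort() {}
#endif

class ElidedLock {
public:
    // Runs critical_section either inside a hardware transaction or,
    // after max_restarts failed restarts, under the global fallback lock.
    template <typename F>
    void run(F&& critical_section, int max_restarts) {
        for (int attempt = 0; attempt <= max_restarts; ++attempt) {
            // Re-speculation: wait until the fallback lock is free, then
            // try another transaction instead of acquiring the lock.
            while (locked_.load(std::memory_order_acquire)) { /* spin */ }
            if (tx_begin()) {
                // Reading the lock word adds it to the read set, so a thread
                // that takes the lock later aborts this transaction.
                if (locked_.load(std::memory_order_relaxed)) tx_abort();
                critical_section();
                tx_end();
                return;
            }
            // The transaction aborted; the loop decides whether to restart.
        }
        // Restart budget exhausted: acquire the fallback lock for real.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            while (locked_.load(std::memory_order_relaxed)) { /* spin */ }
        }
        critical_section();
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};  // the single global fallback lock
};
```

A tree operation would then be wrapped as `lock.run([&] { /* lookup or insert */ }, 20);`, where 20 is the restart count the conclusions recommend. The critical section must be safe to re-execute, since an aborted transaction discards its writes and the body runs again.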

  4. Intel Xeon E5-2697 v3
     ◮ 14 cores (28 threads), 2.6GHz-3.6GHz, 35MB LLC
     ◮ 2 sockets
     [Figure: die layout with cores 0-13 and their L3 slices, two memory controllers, an internal link (to other ring), and the QPI interconnect (to other socket)]

  5. Lookups with Locking
     [Figure: throughput in M ops/s (axis 0-75) over threads (1, 14, 28, 42, 56); series: no sync, atomic, rw_spin_lock]

  6. Lookups with HTM
     [Figure: throughput in M ops/s (axis 0-75) over threads (1, 14, 28, 42, 56); series: no sync, built-in HLE, 0 restarts, 1 restart, 2 restarts, 3 restarts, 7 or more restarts]

  7. Random Inserts with HTM
     [Figure: throughput in M ops/s (axis 0-120) over threads (1, 14, 28, 42, 56); series with their annotated factors: pre-allocate + memset (26.0x), pre-allocate (16.1x), tcmalloc (12.2x), malloc (0.8x)]

  8. HTM and NUMA
     ◮ lookup:
       lookup       1 thread   7 threads   speedup
       1 cluster    9.2        53.0        5.8×
       1 socket     5.4        36.0        6.7×
       2 sockets    3.6        24.5        6.8×
     ◮ insert:
       insert       1 thread   7 threads   speedup
       1 cluster    5.3        30.6        5.8×
       1 socket     4.3        26.8        6.2×
       2 sockets    3.0        20.2        6.7×

  9. Conclusions
     ◮ Intel’s HTM implementation can scale to NUMA systems with many cores
     ◮ pitfalls at higher thread counts:
       ◮ built-in HLE does not scale
       ◮ lock elision with 20 restarts and re-speculation should be used instead
       ◮ even infrequent kernel traps or system calls can be a problem at higher thread counts (Amdahl’s Law)
