Exploiting Semantic Commutativity in Hardware Speculation

Guowei Zhang, Virginia Chiu, Daniel Sanchez. MICRO 2016.


  1. Exploiting Semantic Commutativity in Hardware Speculation. Guowei Zhang, Virginia Chiu, Daniel Sanchez. MICRO 2016.

  2. Executive summary
     - Exploiting commutativity benefits update-heavy apps
       - Software techniques that exploit commutativity incur high run-time overheads (STM is 2-6x slower than HTM)
       - Prior hardware exploits only single-instruction commutative operations (e.g., addition)
     - CommTM exploits multi-instruction commutativity
       - Extends the coherence protocol to perform commutative operations locally and concurrently
       - Leverages HTM to support multi-instruction updates
       - Benefits speculative execution by reducing conflicts
       - Accelerates full applications by up to 3.4x at 128 cores

  3. Commutativity
     - Commutative operations produce equivalent results when reordered
       - No true data dependence → no need for communication
       - Software exploits commutativity but incurs high run-time overheads
     - Single-instruction commutativity (ADD, MIN, OR): Coup [Zhang et al., MICRO 2015]
     - Multi-instruction commutativity (top-K insertion, set insertion, ordered put): CommTM

  4. Commutativity (continued)
     - Multi-instruction example: set (linked-list) insertion
       - Starting from an empty list (head → null), insert(a); insert(b); and insert(b); insert(a); link nodes a and b in opposite orders
       - The resulting states are different but semantically equivalent
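The set-insertion example can be checked in plain software. The following is a minimal sketch, assuming a head-insertion linked list used as a set of ints; the names `Node`, `insert`, `contains`, `size`, and `same_set` are illustrative and do not come from the slides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Singly linked list used as an unordered set of ints (hypothetical type). */
typedef struct Node {
    int key;
    struct Node* next;
} Node;

/* Returns true if key is present in the list. */
bool contains(const Node* head, int key) {
    for (const Node* n = head; n != NULL; n = n->next)
        if (n->key == key) return true;
    return false;
}

/* Inserts key at the head unless it is already present; returns the new head. */
Node* insert(Node* head, int key) {
    if (contains(head, key)) return head;
    Node* n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->key = key;
    n->next = head;
    return n;
}

/* Number of elements in the list. */
size_t size(const Node* head) {
    size_t count = 0;
    for (const Node* n = head; n != NULL; n = n->next) count++;
    return count;
}

/* Two duplicate-free lists hold the same set if they have the same size
   and every element of x also appears in y. */
bool same_set(const Node* x, const Node* y) {
    if (size(x) != size(y)) return false;
    for (const Node* n = x; n != NULL; n = n->next)
        if (!contains(y, n->key)) return false;
    return true;
}
```

Building one list with `insert(insert(NULL, 'a'), 'b')` and another with the two keys swapped gives lists whose head nodes differ, yet `same_set` reports them equal: the memory layouts differ while the abstract set is the same.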

  5. Example: addition in conventional HTM

         void add(int* counter, int delta) {
           tx_begin();
           int v = load(counter);
           int nv = v + delta;
           store(counter, nv);
           tx_end();
         }

  7. Example: addition in conventional HTM (setup): the shared counter A holds 20, and Core 0 and Core 1 each run add(A, 1); add(A, 1);

  8. Example: addition in conventional HTM (step 1): Txn 0 on Core 0 and Txn 1 on Core 1 both execute load A, so both cores have read A.

  9. Example: addition in conventional HTM (step 2): one transaction executes store A while the other has already read A. Conflict!

  10. Example: addition in conventional HTM (step 3): the conflict forces one of the two transactions to abort.

  11. Example: addition in conventional HTM (full timeline): every conflicting add(A, 1) aborts, restarts, and reloads A before it can store and commit. The timeline shows three aborts before all four transactions (Txn 0 to Txn 3) commit, leaving A at 24.

  12. Example: addition in conventional HTM (costs): the same timeline incurs traffic, serialization, and wasted transactional work.
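The abort-and-retry pattern above can be imitated in software. The sketch below is a simplified, single-threaded model and not how HTM hardware detects conflicts: it assumes commit-time validation against a version number, and a fixed lockstep interleaving in which both cores read A before either commits. `Cell`, `Stats`, `try_commit`, and `run_lockstep` are hypothetical names. Because the interleaving is an assumption, the abort count it produces need not match the slide's timeline.

```c
/* Shared counter plus a version that is bumped on every committed store. */
typedef struct {
    int value;
    unsigned version;
} Cell;

typedef struct {
    int commits;
    int aborts;
} Stats;

/* Commits a read-modify-write that observed seen_version.
   Returns 0 (abort) if another transaction committed in between. */
static int try_commit(Cell* c, unsigned seen_version, int new_value) {
    if (c->version != seen_version) return 0;
    c->value = new_value;
    c->version++;
    return 1;
}

/* Two cores each perform adds_per_core increments. In every round both
   cores read the counter first, so the second committer always aborts
   and must redo its work. */
Stats run_lockstep(Cell* c, int adds_per_core, int delta) {
    Stats s = {0, 0};
    for (int round = 0; round < adds_per_core; round++) {
        /* Both transactions load A. */
        unsigned ver0 = c->version; int v0 = c->value;
        unsigned ver1 = c->version; int v1 = c->value;

        /* Core 1 stores and commits first. */
        if (try_commit(c, ver1, v1 + delta)) s.commits++; else s.aborts++;

        /* Core 0 read a stale value: its commit fails and it restarts. */
        if (try_commit(c, ver0, v0 + delta)) {
            s.commits++;
        } else {
            s.aborts++;
            ver0 = c->version; v0 = c->value;   /* restart: reload A */
            if (try_commit(c, ver0, v0 + delta)) s.commits++;
        }
    }
    return s;
}
```

Starting from a cell holding 20 and running two adds of 1 per core, the model ends with the value 24 after four commits and two aborts. Every abort is a read-modify-write that had to be thrown away and repeated, which is the wasted work and serialization the slide points at.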

  13. Example: addition in CommTM (setup): same add() code and workload as in the conventional HTM example. A holds 20, and Core 0 and Core 1 each run add(A, 1); add(A, 1);

  14. Example: addition in CommTM (labeled code): the load and store are tagged with the ADD label.

          void add(int* counter, int delta) {
            tx_begin();
            int v = load[ADD](counter);
            int nv = v + delta;
            store[ADD](counter, nv);
            tx_end();
          }

  15. Example: addition in CommTM (step 1): both cores hold their own copy of A under the ADD label. One copy holds 20 and the other holds 0.

  16. Example: addition in CommTM (step 2): Txn 0 and Txn 1 each execute load[ADD] A, reading only their own core's copy.

  17. Example: addition in CommTM (step 3): both transactions execute store[ADD] A on their own copy. Each core has now read and written A, and no conflict is raised.

  18. Example: addition in CommTM (step 4): Txn 0 and Txn 1 both commit. The two copies of A now hold 21 and 1.

  19. Example: addition in CommTM (step 5): Txn 2 and Txn 3 run the same load[ADD] A, store[ADD] A sequence and both commit. The two copies of A now hold 22 and 2.

  20. Example: addition in CommTM (step 6): an ordinary, unlabeled load A is issued while the copies still hold 22 and 2.

  21. Example: addition in CommTM (step 7): the load A triggers a user-defined reduction that merges the two copies into a single value, A = 24.

  22. Example: addition in CommTM (benefits)
      - Less traffic
      - Concurrent updates
      - Less wasted transactional work
      - Less run-time/memory overheads than STM
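The walkthrough above can be mirrored in ordinary software as per-core partial values that are merged on a plain read. This is only an analogy under stated assumptions: in CommTM the copies live in private caches and the coherence protocol drives the merge, whereas here `CommCounter`, `comm_init`, `comm_add`, and `comm_read` are hypothetical helpers operating on an array. The sketch also assumes the extra copy starts at 0, the identity for addition, as in the slide's A: 20 / A: 0 state.

```c
#define NUM_CORES 2

/* One private partial value per core (stand-in for per-cache copies). */
typedef struct {
    int partial[NUM_CORES];
} CommCounter;

/* Copy 0 takes the current value; every other copy starts at 0,
   the identity for addition. */
void comm_init(CommCounter* c, int initial) {
    c->partial[0] = initial;
    for (int i = 1; i < NUM_CORES; i++) c->partial[i] = 0;
}

/* Commutative update: touches only the calling core's copy,
   so updates from different cores never conflict. */
void comm_add(CommCounter* c, int core, int delta) {
    c->partial[core] += delta;
}

/* Ordinary read: reduces all partial values into copy 0 and
   resets the others to the identity. */
int comm_read(CommCounter* c) {
    for (int i = 1; i < NUM_CORES; i++) {
        c->partial[0] += c->partial[i];
        c->partial[i] = 0;
    }
    return c->partial[0];
}
```

Initializing with 20 and applying two `comm_add(..., 1)` calls per core leaves the partial values at 22 and 2; the first `comm_read` then returns 24, matching the slide's final state. The design choice is that the cost of combining values is paid once per read rather than once per update.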

  23. CommTM

  24. Programming interface
      - Transactional update, written with labeled loads/stores:

            void add(int* counter, int delta) {
              tx_begin();
              int v = load[ADD](counter);
              int nv = v + delta;
              store[ADD](counter, nv);
              tx_end();
            }

      - Non-transactional reduction handler:

            void reduce[ADD](int* counter, int delta) {
              int v = load[ADD](counter);
              int nv = v + delta;
              store[ADD](counter, nv);
            }

      - Example: reduce[ADD] merges delta = 20 into counter = 16, leaving counter = 36
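One way to picture a reduction handler per label is as a pair of an identity value and a merge function. The sketch below assumes that framing; the `Label` struct, `LABEL_ADD`, `LABEL_MIN`, and `reduce_all` are hypothetical and are not CommTM's interface. MIN is included only because the deck lists it among commutative operations.

```c
#include <limits.h>

/* A label bundles the identity value with the function that merges
   an incoming partial value into the current one (assumed framing). */
typedef struct {
    int identity;
    int (*reduce)(int current, int incoming);
} Label;

static int reduce_add(int current, int incoming) { return current + incoming; }
static int reduce_min(int current, int incoming) {
    return current < incoming ? current : incoming;
}

const Label LABEL_ADD = {0, reduce_add};
const Label LABEL_MIN = {INT_MAX, reduce_min};

/* Folds n partial values into one, starting from the label's identity. */
int reduce_all(const Label* label, const int* partials, int n) {
    int acc = label->identity;
    for (int i = 0; i < n; i++) acc = label->reduce(acc, partials[i]);
    return acc;
}
```

With partial values 16 and 20, `reduce_all(&LABEL_ADD, ...)` yields 36, the same result as the slide's reduce[ADD] example. Starting the fold from the identity means a copy that received no updates contributes nothing to the result.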

  25. Handling arbitrary object sizes
      - For objects smaller than a cache line, assume lines are full of aligned elements and reduce all of them:

            void reduce[ADD](int* counterLine, int* deltas) {
              for (int i = 0; i < intsPerCacheLine; i++) {
                int v = load[ADD](&counterLine[i]);
                int nv = v + deltas[i];
                store[ADD](&counterLine[i], nv);
              }
            }
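The line-granularity handler can be written as plain C for experimentation. This sketch assumes a 64-byte cache line and replaces the labeled loads and stores with ordinary memory accesses; `CACHE_LINE_BYTES`, `INTS_PER_LINE`, and `reduce_add_line` are illustrative names, not part of CommTM.

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 64   /* assumed line size */
#define INTS_PER_LINE (CACHE_LINE_BYTES / sizeof(int))

/* Merges a whole line of partial values into the destination line.
   Slots that were never updated hold 0 in deltas, so adding them
   leaves the corresponding destination slots unchanged. */
void reduce_add_line(int* counter_line, const int* deltas) {
    for (size_t i = 0; i < INTS_PER_LINE; i++)
        counter_line[i] += deltas[i];
}
```

Reducing every slot, rather than tracking which ones were touched, works because untouched slots carry the identity value. The handler therefore needs no per-element metadata.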
