Exploiting Semantic Commutativity in Hardware Speculation

Guowei Zhang, Virginia Chiu, Daniel Sanchez
MICRO 2016

Executive summary

- Exploiting commutativity benefits update-heavy apps
  - Software techniques that exploit commutativity incur high run-time overheads (STM is 2-6x slower than HTM)
  - Prior hardware exploits only single-instruction commutative operations (e.g., addition)
- CommTM exploits multi-instruction commutativity
  - Extends the coherence protocol to perform commutative operations locally and concurrently
  - Leverages HTM to support multi-instruction updates
  - Benefits speculative execution by reducing conflicts
  - Accelerates full applications by up to 3.4x at 128 cores

Commutativity

- Commutative operations produce equivalent results when reordered
  - No true data dependence → no need for communication
  - Software exploits commutativity but incurs high run-time overheads

[Figure: single-instruction commutativity (ADD, MIN, OR) is covered by Coup [Zhang et al., MICRO 2015]; multi-instruction commutativity (top-K insertion, set insertion, ordered put) is covered by CommTM.]

- Multi-instruction example: set (linked-list) insertion
  - Starting from an empty list (head → null), insert(a); insert(b); and insert(b); insert(a); leave the nodes a and b linked in different orders
  - Different but semantically equivalent states
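The set-insertion example can be checked in plain software. The sketch below is only an illustration, not code from the talk: the names `Node`, `set_insert`, `set_contains`, and `set_equal` are hypothetical, and keys are assumed to be ints. It inserts at the head of a singly linked list, so the two insertion orders produce different layouts that still hold the same set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Singly linked list used as an unordered set of ints (hypothetical types). */
typedef struct Node {
    int key;
    struct Node* next;
} Node;

/* Returns true if key is present in the list. */
static bool set_contains(const Node* head, int key) {
    for (const Node* n = head; n != NULL; n = n->next)
        if (n->key == key) return true;
    return false;
}

/* Inserts key at the head unless it is already present; returns the new head. */
static Node* set_insert(Node* head, int key) {
    if (set_contains(head, key)) return head;
    Node* n = malloc(sizeof *n);
    assert(n != NULL);  /* sketch only: no real out-of-memory handling */
    n->key = key;
    n->next = head;
    return n;
}

/* Two lists are semantically equal if they hold the same keys in any order. */
static bool set_equal(const Node* a, const Node* b) {
    for (const Node* n = a; n != NULL; n = n->next)
        if (!set_contains(b, n->key)) return false;
    for (const Node* n = b; n != NULL; n = n->next)
        if (!set_contains(a, n->key)) return false;
    return true;
}
```

Calling `set_insert(set_insert(NULL, 1), 2)` and `set_insert(set_insert(NULL, 2), 1)` gives lists whose head nodes differ, yet `set_equal` reports them as the same set.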

Example: addition in conventional HTM

    void add(int* counter, int delta) {
        tx_begin();
        int v = load(counter);
        int nv = v + delta;
        store(counter, nv);
        tx_end();
    }

Execution on two cores, starting from A: 20, with each core running add(A, 1); add(A, 1);

- Txn 0 and Txn 1 both load A; when one of them stores A, the read and the write conflict
- One transaction commits; the other aborts, restarts, and reloads A
- Txn 2 and Txn 3 repeat the pattern; the timeline shows three aborts before all four updates commit and A reaches 24
- Costs: traffic, serialization, wasted transactional work
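The abort-and-restart pattern can be mimicked with a small deterministic model. This is a software analogy under stated assumptions, not how the hardware detects conflicts: the types `Shared` and `Txn` and the functions below are hypothetical, conflicts are detected at commit time through a version number, and the interleaving is fixed so that both cores always read before either commits. Under this simplified interleaving the model sees one abort per round, fewer than the three in the slide's timeline.

```c
#include <assert.h>
#include <stdbool.h>

/* Shared counter plus a version number bumped on every committed write. */
typedef struct { int value; unsigned version; } Shared;

/* A transaction's buffered write and the version it observed when reading. */
typedef struct { int new_value; unsigned seen_version; } Txn;

/* Models tx_begin + load + add: read speculatively and buffer the update. */
static Txn txn_read_and_add(const Shared* s, int delta) {
    Txn t = { s->value + delta, s->version };
    return t;
}

/* Models store + tx_end: commit only if nobody committed since our read. */
static bool txn_try_commit(Shared* s, const Txn* t) {
    if (s->version != t->seen_version) return false;  /* conflict: abort */
    s->value = t->new_value;
    s->version++;
    return true;
}

/* Runs rounds in which both cores' transactions overlap; returns the abort count. */
static int run_two_cores(Shared* s, int rounds) {
    assert(rounds >= 0);
    int aborts = 0;
    for (int r = 0; r < rounds; r++) {
        Txn t0 = txn_read_and_add(s, 1);   /* core 0 loads A */
        Txn t1 = txn_read_and_add(s, 1);   /* core 1 loads A concurrently */
        txn_try_commit(s, &t0);            /* core 0 stores and commits */
        while (!txn_try_commit(s, &t1)) {  /* core 1 aborts and restarts */
            aborts++;
            t1 = txn_read_and_add(s, 1);
        }
    }
    return aborts;
}
```

Starting from a counter of 20, two rounds leave the value at 24 with two aborts in this model: every overlapping pair of updates is serialized, and one of the two does its work twice.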

Example: addition in CommTM

Same two-core workload (A: 20, each core runs add(A, 1); add(A, 1);), with the load and store replaced by ADD-labeled versions:

    void add(int* counter, int delta) {
        tx_begin();
        int v = load[ADD](counter);
        int nv = v + delta;
        store[ADD](counter, nv);
        tx_end();
    }

Execution on two cores:

- Both cores hold A in the ADD state; one private copy holds 20 and the other holds 0
- Txn 0 and Txn 1 run load[ADD] A and store[ADD] A on their local copies and both commit without conflict (copies: 21 and 1)
- Txn 2 and Txn 3 do the same (copies: 22 and 2)
- A later plain load A triggers a user-defined reduction that merges the copies into A: 24
- Benefits: less traffic, concurrent updates, less wasted transactional work, and lower run-time/memory overheads than STM
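The values in this timeline (20 and 0, then 21 and 1, then 22 and 2, then 24) can be reproduced with a toy model of per-core partial copies. The sketch below is an assumption-laden illustration, not the coherence protocol itself: `AddLine`, `addline_init`, `add_local`, and `load_with_reduction` are hypothetical names, the number of cores is fixed at two, and the merged value is simply placed back in the first copy.

```c
#include <assert.h>

#define NUM_CORES 2

/* Per-core private copies of one counter held in the ADD state (toy model). */
typedef struct {
    int partial[NUM_CORES];
} AddLine;

/* The first copy keeps the current value; the others start at the identity, 0. */
static void addline_init(AddLine* line, int initial) {
    line->partial[0] = initial;
    for (int c = 1; c < NUM_CORES; c++) line->partial[c] = 0;
}

/* Models a transaction doing load[ADD]/store[ADD]: only the local copy changes. */
static void add_local(AddLine* line, int core, int delta) {
    assert(core >= 0 && core < NUM_CORES);
    line->partial[core] += delta;
}

/* Models a plain load: merges all partial copies into one value (the reduction). */
static int load_with_reduction(AddLine* line) {
    int total = 0;
    for (int c = 0; c < NUM_CORES; c++) total += line->partial[c];
    addline_init(line, total);  /* a single copy now holds the full value */
    return total;
}
```

Because each update touches only its own core's copy, no update ever invalidates another core's data; the cores communicate only when a non-commutative access forces the reduction.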

CommTM

Programming interface

Transactional update, using labeled loads/stores:

    void add(int* counter, int delta) {
        tx_begin();
        int v = load[ADD](counter);
        int nv = v + delta;
        store[ADD](counter, nv);
        tx_end();
    }

Non-transactional reduction handler:

    void reduce[ADD](int* counter, int delta) {
        int v = load[ADD](counter);
        int nv = v + delta;
        store[ADD](counter, nv);
    }

[Figure: reduce[ADD] combines counter = 16 with delta = 20, leaving counter = 36.]
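One way to picture per-label reduction handlers in ordinary C is a table of function pointers indexed by label. This is a software analogy only; the slides do not describe how handlers are registered or dispatched, and `Label`, `ReduceFn`, `reduce_handlers`, and `reduce` are hypothetical names. ADD and MIN are taken from the operations named earlier in the talk.

```c
#include <assert.h>

/* Hypothetical label identifiers; ADD and MIN appear in the slides. */
typedef enum { LABEL_ADD, LABEL_MIN, NUM_LABELS } Label;

/* A reduction handler folds one partial value (delta) into the counter. */
typedef void (*ReduceFn)(int* counter, int delta);

/* ADD: partial values are summed. */
static void reduce_add(int* counter, int delta) { *counter += delta; }

/* MIN: the smaller of the two values survives. */
static void reduce_min(int* counter, int delta) {
    if (delta < *counter) *counter = delta;
}

/* One handler per label, in the same order as the Label enum. */
static const ReduceFn reduce_handlers[NUM_LABELS] = { reduce_add, reduce_min };

/* Dispatches to the handler registered for the given label. */
static void reduce(Label label, int* counter, int delta) {
    assert((int)label >= 0 && (int)label < NUM_LABELS);
    reduce_handlers[label](counter, delta);
}
```

With a counter of 16 and a delta of 20, `reduce(LABEL_ADD, &counter, 20)` leaves 36, matching the figure on the slide.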

Handling arbitrary object sizes

- For objects smaller than a cache line, assume lines are full of aligned elements and reduce all of them

    void reduce[ADD](int* counterLine, int* deltas) {
        for (int i = 0; i < intsPerCacheLine; i++) {
            int v = load[ADD](&counterLine[i]);
            int nv = v + deltas[i];
            store[ADD](&counterLine[i], nv);
        }
    }
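A plain-C version of the whole-line reduction might look like the sketch below. The 64-byte line size is an assumption (the slides do not state one), labeled loads and stores are replaced by ordinary memory accesses, and `reduce_add_line` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 64-byte cache line; the slides do not give a line size. */
#define CACHE_LINE_BYTES 64
#define INTS_PER_CACHE_LINE (CACHE_LINE_BYTES / sizeof(int))

/* Folds every int-sized slot of a delta line into the matching counter slot. */
static void reduce_add_line(int* counter_line, const int* deltas) {
    assert(counter_line != NULL && deltas != NULL);
    for (size_t i = 0; i < INTS_PER_CACHE_LINE; i++)
        counter_line[i] += deltas[i];
}
```

Reducing every slot is harmless for slots that were never updated: if their partial value stays at the ADD identity, 0, adding it leaves the counter unchanged.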
