Exploiting Commutativity to Reduce the Cost of Updates to Shared Data in Cache-Coherent Systems
Guowei Zhang, Webb Horn, Daniel Sanchez
MICRO 2015

Executive summary
- Updates to shared data limit parallelism in current systems
- Insight: many updates are commutative
- Coup extends cache coherence protocols to make commutative updates as cheap as reads
- Maintains coherence and consistency
- Accelerates update-heavy applications significantly

Updates are expensive
[Figure: Core 0 and Core 1, each with a private cache (Core/$ 0, Core/$ 1), above a shared cache. One core issues add(A, 1) and the other add(A, 2), followed by read(A). A goes from 20 to 21 to 23. The timeline is annotated with traffic and serialization.]

Updates are expensive, even with RMOs
[Figure: the same two cores and shared cache, now with an ALU at the shared cache. The updates add(A, 1) and add(A, 2) are sent to the shared cache as +1 and +2, taking A from 20 to 21 to 23 before read(A). The timeline is annotated with traffic, serialization, and "complicates consistency".]

Coup: exploiting commutativity
[Figure: the same two cores, shared cache, and ALU. The private caches hold update-only copies shown as A: +0, A: +1 and A: +0, A: +2 while the cores issue add(A, 1) and add(A, 2); the shared cache shows A: 20, A: 23, A: 29, and read(A) follows the updates.]
- Low traffic
- Concurrent updates
- Simple consistency
- Less general than RMOs
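The behavior pictured on this slide can be imitated in software. The following is only a toy model, not the hardware protocol: the class name `ToyCoup` and its methods are hypothetical, and the choice of three adds per core is an assumption made so that the total matches the slide's final A: 29. Each core accumulates its updates in a private delta that starts at the identity element (0 for addition), and a read folds all deltas into the shared value.

```python
class ToyCoup:
    """Toy model: one shared value per line plus per-core update-only deltas."""

    def __init__(self, num_cores, initial):
        self.value = dict(initial)                         # line -> shared-cache value
        self.deltas = [dict() for _ in range(num_cores)]   # per-core private deltas

    def comm_add(self, core, line, v):
        # Purely local: start from the identity (0) and accumulate,
        # without touching the shared value or other cores' deltas.
        self.deltas[core][line] = self.deltas[core].get(line, 0) + v

    def read(self, line):
        # A read triggers a reduction: fold every private delta into the
        # shared value and discard the update-only copies.
        for private in self.deltas:
            self.value[line] += private.pop(line, 0)
        return self.value[line]


system = ToyCoup(num_cores=2, initial={"A": 20})
for _ in range(3):
    system.comm_add(0, "A", 1)   # core 0: add(A, 1)
    system.comm_add(1, "A", 2)   # core 1: add(A, 2)
result = system.read("A")
print(result)  # 29
```

Because addition commutes, the order in which the two cores interleave their updates does not change the value returned by the read.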

Commutative updates are common
- Operations
- Applications
  - Reduction variables
  - Iterative algorithms
  - Graph traversal
  - Reference counting

Software privatization vs. Coup
[Figure: privatization splits X into multiple thread-private, update-only copies (X.0, X.1, …, X.N); reduction combines them into one read-only copy (X).]
Software privatization                           | Coup
Needs to amortize privatization/reduction costs  | No overheads
Wastes shared cache & memory capacity            | No wasted capacity
Must apply selectively                           | Apply to any update that might commute
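For comparison, software privatization might look like the sketch below. This is an illustrative example, not code from the talk: the function name `privatized_histogram` and the modulo binning are assumptions. Each thread updates only its own copy, and a final reduction pass combines the copies.

```python
import threading


def privatized_histogram(data, num_bins, num_threads):
    # One private histogram per thread (the X.0 ... X.N copies).
    # Memory footprint grows as num_threads * num_bins.
    private = [[0] * num_bins for _ in range(num_threads)]

    def worker(tid):
        # Each thread touches only its own copy, so no locking is needed here.
        for item in data[tid::num_threads]:
            private[tid][item % num_bins] += 1

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Reduction step: its cost must be amortized over enough updates to pay off.
    return [sum(copy[b] for copy in private) for b in range(num_bins)]


hist = privatized_histogram(list(range(1000)), num_bins=8, num_threads=4)
print(hist)  # [125, 125, 125, 125, 125, 125, 125, 125]
```

The extra copies and the explicit reduction pass are the costs the table above attributes to software privatization.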

Outline
- Introduction
- Coup
- Evaluation

Structural changes
[Figure: Core 0 … Core N-1, each with a private cache (Private Cache 0 … Private Cache N-1), above a shared cache/directory.]
- Shared cache/dir: reduction unit
- Private caches: coherence states M, S, I plus U
- ISA: load(&x), store(&x, v), … plus comm_add(&x, v), comm_or(&x, v), …
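One way to picture how the new instructions and the reduction unit fit together is a table that gives each commutative operation an identity element and a combining function. The sketch below is an assumption-laden illustration, not the hardware design: `COMM_OPS`, `local_update`, and `reduction_unit` are hypothetical names, and only the two operations named on the slide are modeled.

```python
import operator
from functools import reduce

# Hypothetical table: instruction name -> (identity element, combining function).
COMM_OPS = {
    "comm_add": (0, operator.add),
    "comm_or": (0, operator.or_),
}


def local_update(op, partial, v):
    """Apply one commutative update to a private partial result.

    partial is None when the private cache holds no copy yet; the copy
    then starts from the operation's identity element.
    """
    identity, combine = COMM_OPS[op]
    return combine(identity if partial is None else partial, v)


def reduction_unit(op, shared_value, partial_updates):
    """Fold the private caches' partial results into the shared value."""
    _identity, combine = COMM_OPS[op]
    return reduce(combine, partial_updates, shared_value)


# Two cores set flag bits with comm_or; the shared word starts at 0b1000.
p0 = local_update("comm_or", None, 0b0001)
p0 = local_update("comm_or", p0, 0b0100)
p1 = local_update("comm_or", None, 0b0010)
flags = reduction_unit("comm_or", 0b1000, [p0, p1])
print(bin(flags))  # 0b1111
```

Starting each private copy from the identity element is what lets a core begin updating without first fetching the current value.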

Example: extending MSI
[Figure: state-transition diagrams for MSI (states M, S, I) and MUSI (states M, U, S, I), with edges labeled R, W, and C.]
Legend
- Transitions: initiated by own core (gain permissions); initiated by others (lose permissions)
- States: M (Modified), S (Shared, read-only), I (Invalid), U (Update-only)
- Requests: R (Read), W (Write), C (Commutative update)
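The diagram itself does not survive in this transcript, so the sketch below reconstructs the stable-state transitions from the legend alone. It is an assumption, not the paper's exact protocol: the helper names are hypothetical, transient states are ignored, and in particular the M-to-S and M-to-U downgrades on a remote read or commutative update are inferred from the permissions each state grants.

```python
# Permissions each stable state grants to the local core
# (R = read, W = write, C = commutative update).
PERMS = {"M": {"R", "W", "C"}, "S": {"R"}, "U": {"C"}, "I": set()}

# Weakest state that satisfies each request type.
NEEDED = {"R": "S", "W": "M", "C": "U"}


def on_own_request(state, req):
    """Local core issues req: stay put if already permitted, else gain permissions."""
    return state if req in PERMS[state] else NEEDED[req]


def on_remote_request(state, req):
    """Another core issues req: lose whatever permissions conflict with it."""
    if req == "W" or state == "I":
        return "I"                                   # remote write invalidates all copies
    if req == "R":
        return "S" if state in ("M", "S") else "I"   # update-only copies cannot serve reads
    return "U" if state in ("M", "U") else "I"       # req == "C": read-only copies must go


print(on_own_request("I", "C"))     # U
print(on_remote_request("U", "R"))  # I
```

Under this reading, S and U are symmetric: many caches may hold a line in S (all reading) or in U (all updating), but never a mix of the two.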

Coherence and consistency
- Coherence is maintained
- Consistency is not affected
- See paper for proofs

Implementation and verification
[Figure: full protocol state diagrams. State labels visible in the figure: M, E, N, S, I, IM, IS, IN, SM, NM, NN, ISI, xMS, xMN, xMI, xNI, WB, WBI.]
Legend
- States: stable, transient, split, race
- Transitions initiated by: own request (R, W, C, wback); response to own request; inval/downgrade request
No extra stable states
Easy to verify

Evaluation Methodology
[Figure: simulated system, showing the processor chip organization and the connection of processor chips to L4 chips.]
- Processor chip: Core 0 … Core 15, each with L1I and L1D caches and its own L2 (L2 0 … L2 15); shared L3 and chip directory; links to L4 chips
- System: 1-8 processor and L4 chips; each L4 chip holds an L4 cache and global directory
- 8 sockets × 16 cores/socket = 128 cores

Coup vs. Atomic Operations
[Figure: speedup vs. number of cores (1 to 128) for MESI and COUP, one panel per benchmark: histogram, pagerank, bfs, fluidanimate, spmv.]
Fraction of commutative instructions (in the order shown on the slide): 1.0%, 2.4%, 4.9%, 0.40%, 0.96%
[Figure: normalized AMAT for MESI and COUP on histogram, spmv, pagerank, bfs, fluidanimate.]

Modifying algorithms to exploit Coup
Delayed deallocation reference counting
Scheme               | Data structure
Refcache [1]         | Hash table
Coup implementation  | Hierarchical bit vectors + comm add/or
[Figure: performance of Refcache vs. Coup; y-axis from 0 to 2.5.]
[1] Clements et al., EuroSys 2013

Conclusions
- Coup allows concurrent commutative updates
  - Maintains coherence and consistency
- Coup implementation accelerates single-word updates
  - Minor hardware overhead
  - Accelerates update-heavy applications by up to 2.4x
- Coup opens exciting research avenues
  - Commutativity-aware hardware transactional memory
  - Support arbitrary update functions, semantic commutativity

Thanks for your attention! Questions are welcome!
