Nexus: A New Approach to Replication in Distributed Shared Caches
Po-An Tsai, Nathan Beckmann, and Daniel Sanchez
Executive summary
2
Data replication reduces the access latency of non-uniform caches (NUCA)
But replicating too aggressively leads to more cache misses
Prior adaptive techniques focus on which data to replicate at each core
Data that is not replicated locally still incurs high latency
Nexus instead focuses on how much to replicate across the system
Chooses the best number of replicas for the whole read-only working set
Lets cores access replicas beyond their local bank
Outperforms a state-of-the-art replication technique by up to 90%
The last-level cache (LLC) has become distributed and non-uniform (NUCA)
3
[Diagram: tiled chip; each tile has a core with L1I/L1D caches and an LLC bank; a thread's LLC data may sit in a near bank or a far bank]
The key problem is what data to place on chip, and where to place it
Static NUCA (S-NUCA) spreads data using a fixed line-to-bank mapping
4
[Diagram: threads' LLC data spread evenly across all banks]
Simple, but the average distance is large: some data is near, most is far
Replication reduces the distance to read-only data
5
Cache replicated read-only lines locally and check the local bank first. Upon a miss in the local bank, check the directory (at the line's original location).
[Diagram: read-only lines A, B, C, D replicated into each thread's local bank]
A, B, C, D are local, but replicated lines compete for cache capacity with other data
Replicating too aggressively causes more cache misses than no replication
Adaptive replication in directory-based dynamic NUCAs
6
(ASR [Beckmann, MICRO 2006], SP-NUCA [Dybdahl, HPCA 2007], ECC [Herrero, ISCA 2010], Locality-aware replication [Kurian, HPCA 2014])
Be selective about which lines to replicate, but always replicate them in the core's local bank.
[Diagram: only line A is replicated into each local bank]
A is nearby, but B, C, D are still far away
Read-only data that is not replicated still causes high latency
Nexus spreads replicas across nearby banks to replicate more
7
Threads share a read-only data replica within a core-group cluster
[Diagram: each 4-bank cluster holds one replica of A, B, C, D, one line per bank]
A is local; B, C, D are just one hop away
All threads enjoy fast access to all read-only data by replicating beyond their local bank
An experiment to show why and when Nexus is better
8
A multithreaded workload that regularly scans over shared read-only data
[Plot: access latency (lower is better) vs. footprint; average latency is high, and rises further once data no longer fits in the LLC]
Previous replication techniques are ineffective when the read-only data does not fit in the local bank
9
[Plot: access latency (lower is better); replication helps only while data fits in the local bank — beyond that, some data is replicated in the local bank, but most data stays remote]
Nexus allows replication even when read-only data cannot fit in the local bank
10
[Plot: access latency (lower is better) for several replication degrees: data fits in the local bank (each thread owns 1 replica); 1 replica shared by every 4 neighbors; 1 replica shared by every 16 neighbors; 1 replica shared by all threads, i.e., the same as S-NUCA]
A significant latency reduction over prior work!
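The tradeoff behind these curves can be illustrated with a toy cost model. This is purely illustrative and not the paper's model: every parameter (`hop_cycles`, `bank_cycles`, `mem_cycles`, and the sizes) is made up. More replicas shorten the distance to the nearest copy, but once the replicated footprint outgrows the LLC, accesses spill to memory.

```python
import math

# Toy model (illustrative only, not from the paper): average LLC access
# latency as a function of replication degree d on a chip with n_banks.
def avg_latency(d, n_banks=64, bank_kb=512, footprint_kb=8192,
                hop_cycles=2, bank_cycles=10, mem_cycles=120):
    cluster = n_banks // d                 # banks that share one replica
    hops = math.sqrt(cluster) - 1          # rough average distance in a cluster
    total_kb = n_banks * bank_kb
    # Fraction of the replicated footprint that actually fits in the LLC;
    # the remainder is modeled as a memory access.
    fit = min(1.0, total_kb / (d * footprint_kb))
    return fit * (bank_cycles + hops * hop_cycles) + (1 - fit) * mem_cycles

for d in (1, 4, 16, 64):
    print(f"degree {d}: {avg_latency(d):.1f} cycles")
```

Under these made-up parameters an intermediate degree minimizes latency, mirroring the slide: degree 1 pays extra network hops, while full per-bank replication pays extra misses.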
Recent directory-less dynamic NUCAs enable replication beyond the local bank
11
Data placement is controlled using the virtual memory system and does not require a global directory
Data can be dynamically mapped to nearby banks and shared by arbitrary cores
[Diagram: a core's TLB maps lines X, Y, Z to nearby LLC banks]
The number of replicas (replication degree) is important
12
[Diagram: threads, 4 MB of read-only data, 16 MB LLC capacity]
Replicating 4 times works best (4 x 4 MB read-only = 16 MB)
Choosing how much to replicate is more important than choosing which lines to replicate
The number of replicas (replication degree) is important
13
[Diagram: threads, 1 MB of read-only data, 8 MB of other data, 16 MB LLC capacity]
Replicating 8 times works best (8 x 1 MB read-only + 8 MB other = 16 MB)
Too few replicas cause extra network traversals, while too many cause unnecessary cache misses
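The arithmetic on these two slides can be captured in a few lines. This is a sketch with our own naming (`best_fitting_degree` and the candidate list are not from the paper): pick the largest degree whose replicas, together with the other data, still fit in the LLC.

```python
# Sketch (our naming, not the paper's): largest replication degree whose
# replicas still fit in the LLC alongside the other data.
def best_fitting_degree(llc_mb, readonly_mb, other_mb, candidates):
    feasible = [d for d in candidates if d * readonly_mb + other_mb <= llc_mb]
    return max(feasible) if feasible else 1

# 4 MB read-only, nothing else, 16 MB LLC -> 4 replicas (4 x 4 MB = 16 MB)
print(best_fitting_degree(16, 4, 0, [1, 2, 4, 8, 16]))   # 4
# 1 MB read-only + 8 MB other data, 16 MB LLC -> 8 replicas (8 x 1 + 8 = 16 MB)
print(best_fitting_degree(16, 1, 8, [1, 2, 4, 8, 16]))   # 8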
No adaptive replication in directory-less D-NUCAs
14
Reactive-NUCA (R-NUCA) [Hardavellas, ISCA 2009] always replicates instructions (read-only) statically, once every 4 cores.
Other directory-less D-NUCAs do not replicate data
Workloads have different preferences for replication degrees
15
Study read-only-data-intensive workloads running on a 144-core system
Apply different replication degrees for all read-only data
Observation 1: Applications prefer different degrees, requiring an adaptive approach.
Observation 2: A few replication degrees suffice.
Nexus: enabling adaptive replication degrees in NUCA
16
Builds on top of directory-less D-NUCAs
Read-only data's on-chip location and coherence are tracked via the virtual memory system
Cores access and share the closest replicas without directory overheads
Nexus-R builds on R-NUCA [Hardavellas, ISCA'09] (the focus of this talk)
Supports flexible replication degrees for all read-only data
Leverages set-sampling to choose the best replication degree
Nexus-J builds on Jigsaw [PACT'13, HPCA'15]
Extends Jigsaw's configuration algorithm to select the best replication degree
Outperforms Nexus-R in multi-program workloads
Nexus-R: Applying Nexus to R-NUCA
17
Nexus uses the virtual memory system to classify pages into three types: Thread Private, Shared Read-only, and Shared Read-write.
Similar to R-NUCA, but differentiates all read-only data (not just instructions)
Pages start as Unknown; the first TLB miss makes a page Thread Private
A read TLB miss from another thread makes it Shared Read-only (Nexus-R replicates this)
A write TLB miss from another thread makes it Shared Read-write
[Example: Thread 0 reads X → Thread Private; Threads 0 and 1 both read Y → Shared Read-only; Thread 0 writes Z and Thread 1 reads it → Shared Read-write]
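The classification above can be sketched as a small state machine. This is a simplification of the slide's rules: the real mechanism acts on TLB misses, and we additionally track whether a page was ever written so that a page like Z (written by Thread 0 before Thread 1 reads it) ends up Shared Read-write.

```python
UNKNOWN, PRIVATE, SHARED_RO, SHARED_RW = "Unknown", "Private", "Shared-RO", "Shared-RW"

class Page:
    """Per-page classification state, updated on (TLB-miss-triggering) accesses."""
    def __init__(self):
        self.state = UNKNOWN
        self.owner = None       # thread that first touched the page
        self.written = False    # has the page ever been written?

    def on_access(self, thread, is_write=False):
        self.written = self.written or is_write
        if self.state == UNKNOWN:
            self.state, self.owner = PRIVATE, thread      # first TLB miss
        elif self.state == PRIVATE and thread != self.owner:
            # Access from another thread: read-only sharing only if the
            # page was never written; Nexus-R replicates Shared-RO pages.
            self.state = SHARED_RW if self.written else SHARED_RO
        elif self.state == SHARED_RO and is_write:
            self.state = SHARED_RW
        return self.state

# The slide's example: X private, Y shared read-only, Z shared read-write.
x, y, z = Page(), Page(), Page()
x.on_access(0)                   # Thread 0 reads X -> Private
y.on_access(0); y.on_access(1)   # Threads 0 and 1 read Y -> Shared-RO
z.on_access(0, is_write=True)    # Thread 0 writes Z
z.on_access(1)                   # Thread 1 reads Z -> Shared-RW
```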
Nexus-R: Applying Nexus to R-NUCA
18
Supports flexible replication degrees via flexible cluster sizes
R-NUCA always uses a cluster size of 4; Nexus-R supports reconfigurable sizes
Private data: always local
Shared read-write data: always like S-NUCA
Shared read-only data: replicated clusters
Replication degree of 9 on 36 cores → clusters of size 4 (36 divided by 9)
Replication degree of 4 → clusters of size 9
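The degree-to-cluster mapping is simple division; the helper below sketches it. The contiguous-cluster layout and the way lines spread within a cluster are our simplification for illustration (R-NUCA's actual placement uses rotational interleaving).

```python
def cluster_size(num_cores, degree):
    """Banks per replica cluster: degree d on N cores -> clusters of N/d banks."""
    assert num_cores % degree == 0, "degree must divide the core count"
    return num_cores // degree

def home_bank(core, line, num_cores, degree):
    """Bank holding `line`'s replica for `core`, assuming contiguous clusters
    (a simplification; R-NUCA actually uses rotational interleaving)."""
    size = cluster_size(num_cores, degree)
    cluster_base = (core // size) * size
    return cluster_base + line % size     # lines spread within the cluster

print(cluster_size(36, 9))   # 4: degree 9 on 36 cores -> 4-bank clusters
print(cluster_size(36, 4))   # 9: degree 4 -> 9-bank clusters
```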
Nexus-R leverages set-sampling to select the best degree
19
Enhances set-sampling to monitor the performance of different degrees
Compares the cumulative access latency of each degree from sampled sets
[Diagram: core with L1s, MSHRs, address-to-bank/set lookup logic, and latency counters labeled 1/4, 1/9, 1/36, 4/9, 4/36, 9/36]
1. L1 miss
2. Sampled access for a degree of 4
3. Sampled access returns with latency X
4. Update counters (+X / -X): counters record the latency difference between degrees
5. Vote for the best degree
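The counter-and-vote step can be sketched as follows. This is a hypothetical reconstruction: we read the 1/4, 4/9, ... labels as one signed counter per pair of candidate degrees, accumulating the latency difference observed on sampled sets; the function names and the vote rule are ours.

```python
from collections import defaultdict
from itertools import combinations

DEGREES = (1, 4, 9, 36)        # candidate replication degrees in the example
# One signed counter per degree pair, e.g. (1, 4) for the "1/4" counter:
# it accumulates latency(degree=lo) - latency(degree=hi) from sampled sets.
counters = defaultdict(int)

def record_sampled_access(degree, latency):
    """A sampled access mapped with `degree` returned after `latency` cycles."""
    for lo, hi in combinations(DEGREES, 2):
        if degree == lo:
            counters[(lo, hi)] += latency
        elif degree == hi:
            counters[(lo, hi)] -= latency

def vote():
    """Pick the degree that wins the most pairwise latency comparisons."""
    wins = {d: 0 for d in DEGREES}
    for (lo, hi), diff in counters.items():
        wins[hi if diff > 0 else lo] += 1   # positive diff: hi was faster
    return max(DEGREES, key=lambda d: wins[d])
```

For example, if sampled sets for degree 4 consistently return the lowest latencies, `vote()` settles on 4.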
Nexus-R leverages set-sampling to select the best degree
20
Sampled sets are spread across several banks
Threads share sampled sets if they share a read-only replica cluster
[Diagram: LLC banks holding sampling sets for cluster sizes 1, 4, 9, and 16; another thread in the system uses the sampled sets of its own clusters]
Nexus-R makes coordinated decisions across threads
21
Uncoordinated decisions work poorly
"Tragedy of the commons": each thread wants to replicate more itself, but wants others to replicate less
Nexus-R makes the whole process agree on the best replication degree by using the per-process total latency for each degree
The OS reads the latency counters periodically and sets the best degree for the process
Nexus-R adds small overheads over R-NUCA
22
Overheads of applying Nexus to R-NUCA:
1.5% of the LLC used for set-sampling
~100 bits per core for hardware counters
Tens of instructions per context switch for the OS support
Nexus-J: Applying Nexus to Jigsaw
23
Jigsaw groups partitions from different banks to create virtual caches (VCs)
[Diagram: 4x4 mesh NUCA LLC; each tile has a core with L1I/L1D and an LLC bank; the TLB and NoC route accesses to partitions grouped into VC1, VC2, VC3, ...]
Jigsaw manages capacity among applications and data types, outperforming many D-NUCA techniques
Nexus-J: Applying Nexus to Jigsaw
24
Jigsaw outperforms R-NUCA's simple heuristics with better data placement
Especially in multi-programmed workloads
But Jigsaw never replicates data!
Nexus-J implements adaptive replication degrees on Jigsaw
Combines the ability to allocate capacity between apps with adaptive replication
Enhances Jigsaw's software runtime to select the best replication degree
See the paper for implementation details
Evaluation
25
Modeled system:
144 Silvermont-like OOO cores, 12x12 mesh, 32KB L1I/D caches, 72MB LLC (0.5MB per core)
Multithreaded workloads:
Scientific workloads: SPECOMP2012, PARSEC, SPLASH2, BioParallel
Server workloads: TailBench [Kasture, IISWC'16]
With various input sizes
[Diagram: tiled CMP architecture; each tile has a core, L1I/L1D caches, an LLC bank, and a NoC router; memory/IO controllers sit on the chip edges]
Evaluation
26
Compared 6 schemes:
S-NUCA: no replication (baseline)
R-NUCA: replicates instructions at a fixed degree
Jigsaw: allocates capacity across processes; no replication
Locality-aware replication [Kurian, HPCA'14]: state-of-the-art directory-based D-NUCA; selectively replicates cache lines in the local bank
Nexus-R: Nexus on R-NUCA
Nexus-J: Nexus on Jigsaw
Nexus outperforms prior selective replication techniques
27
Single-program workloads running with 144 threads
Workloads with a small read-only footprint → Nexus matches prior work
Workloads with a medium read-only footprint → Nexus outperforms prior work
Workloads with a large read-only footprint → Nexus does not hurt performance
Nexus-J performs best with multi-programmed workloads
28
Workload mixes with 4 different apps, each running with 36 threads
Replication-sensitive mixes → Nexus-R and Nexus-J are better
Capacity-sensitive mixes → Jigsaw and Nexus-J are better
Mixes sensitive to both → Nexus-J performs the best
See paper for more results
29
Performance of 60 apps: Nexus-R vs. Locality-aware replication
Dynamic replication degree vs. static degrees
Results of 20 multi-program workloads
Sensitivity studies: system sizes, different cache hierarchies
Dynamic data reclassification
Conclusion
30
Data replication can improve the performance of NUCA systems
Replication requires balancing the latency/capacity tradeoff in NUCA
We propose Nexus, a new approach to adaptive replication
Unlike prior work, Nexus focuses on how much to replicate the read-only data
We present two implementations of Nexus: Nexus-R and Nexus-J
Nexus outperforms the state-of-the-art adaptive scheme
By up to 90%, and by 23% on average, for replication-sensitive workloads
Thanks! Questions?
31