

SLIDE 1

Thread Cluster Memory Scheduling:
Exploiting Differences in Memory Access Behavior

Yoongu Kim, Michael Papamichael, Onur Mutlu, Mor Harchol-Balter

SLIDE 2

Motivation

  • Memory is a shared resource
  • Threads’ requests contend for memory
    – Degradation in single-thread performance
    – Can even lead to starvation
  • How to schedule memory requests to increase both system throughput and fairness?

[Figure: multiple cores sharing one memory]

SLIDE 3

Previous Scheduling Algorithms are Biased

[Figure: maximum slowdown vs. weighted speedup for FRFCFS, STFM, PAR-BS, and ATLAS; higher weighted speedup means better system throughput, lower maximum slowdown means better fairness. Each prior algorithm shows either a system throughput bias or a fairness bias.]

No previous memory scheduling algorithm provides both the best fairness and system throughput.

SLIDE 4

Why do Previous Algorithms Fail?

Throughput-biased approach: prioritize less memory-intensive threads
  • Good for throughput
  • But the deprioritized intensive threads can starve → unfairness

Fairness-biased approach: threads take turns accessing memory
  • Does not starve any thread
  • But less intensive threads are not prioritized → reduced throughput

A single policy for all threads is insufficient.

SLIDE 5

Insight: Achieving Best of Both Worlds

For Throughput
  • Prioritize memory-non-intensive threads (higher priority)

For Fairness
  • Unfairness is caused by memory-intensive threads being prioritized over each other → Shuffle threads
  • Memory-intensive threads have different vulnerability to interference → Shuffle asymmetrically

SLIDE 6

Outline

  • Motivation & Insights
  • Overview
  • Algorithm
  • Bringing it All Together
  • Evaluation
  • Conclusion

SLIDE 7

Overview: Thread Cluster Memory Scheduling

  1. Group threads into two clusters
  2. Prioritize the non-intensive cluster
  3. Use different policies for each cluster

[Figure: threads in the system split into a prioritized non-intensive cluster (memory-non-intensive threads, optimized for throughput) and an intensive cluster (memory-intensive threads, optimized for fairness)]

SLIDE 8

Outline

  • Motivation & Insights
  • Overview
  • Algorithm
  • Bringing it All Together
  • Evaluation
  • Conclusion

SLIDE 9

TCM Outline

  1. Clustering
SLIDE 10

Clustering Threads

Step 1: Sort threads by MPKI (misses per kilo-instruction)
Step 2: Memory bandwidth usage αT divides the clusters
  • T = total memory bandwidth usage
  • α < 10% (the ClusterThreshold)
  • The least intensive threads (lowest MPKI) whose combined bandwidth usage fits within αT form the non-intensive cluster; the remaining, higher-MPKI threads form the intensive cluster
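The two steps can be sketched in Python (a simplified software model of the clustering step, not the controller hardware; the dictionary inputs stand in for per-thread counters gathered over the previous quantum):

```python
def cluster_threads(mpki, bandwidth, cluster_threshold=0.10):
    """Split threads into a non-intensive and an intensive cluster.

    mpki:      thread id -> misses per kilo-instruction (last quantum)
    bandwidth: thread id -> memory bandwidth used (last quantum)
    The least intensive threads whose combined bandwidth fits within
    alpha * T form the non-intensive cluster; all others are intensive.
    """
    total = sum(bandwidth.values())           # T
    budget = cluster_threshold * total        # alpha * T
    ordered = sorted(mpki, key=mpki.get)      # lowest MPKI first
    non_intensive, used = [], 0.0
    for i, tid in enumerate(ordered):
        if used + bandwidth[tid] > budget:    # threshold crossed:
            return non_intensive, ordered[i:]  # the rest are intensive
        non_intensive.append(tid)
        used += bandwidth[tid]
    return non_intensive, []                  # everyone fit under alpha*T
```

With α = 10%, threads are admitted to the non-intensive cluster in MPKI order until their summed bandwidth would exceed αT; everything after that point is intensive.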

SLIDE 11

TCM Outline

  1. Clustering
  2. Between Clusters

SLIDE 12

Prioritization Between Clusters

Prioritize the non-intensive cluster (non-intensive cluster > intensive cluster in priority)
  • Increases system throughput
    – Non-intensive threads have greater potential for making progress
  • Does not degrade fairness
    – Non-intensive threads are “light”
    – Rarely interfere with intensive threads

SLIDE 13

TCM Outline

  1. Clustering
  2. Between Clusters
  3. Non-Intensive Cluster → Throughput

SLIDE 14

Non-Intensive Cluster

Prioritize threads according to MPKI (lowest MPKI → highest priority)
  • Increases system throughput
    – The least intensive thread has the greatest potential for making progress in the processor

SLIDE 15

TCM Outline

  1. Clustering
  2. Between Clusters
  3. Non-Intensive Cluster → Throughput
  4. Intensive Cluster → Fairness

SLIDE 16

Intensive Cluster

Periodically shuffle the priority of threads
  • Increases fairness
  • Is treating all threads equally good enough?
  • BUT: Equal turns ≠ Same slowdown

SLIDE 17

Case Study: A Tale of Two Threads

Two intensive threads contending:
  1. random-access
  2. streaming

Which is slowed down more easily?

[Figure: slowdown of each thread under the two prioritizations. The prioritized thread sees a ~1x slowdown in both cases, but the deprioritized streaming thread slows down ~7x while the deprioritized random-access thread slows down ~11x.]

The random-access thread is more easily slowed down.

SLIDE 18

Why are Threads Different?

[Figure: four memory banks (Bank 1–4), each with memory rows]

SLIDE 19

Why are Threads Different?

random-access thread:
  • All requests → parallel
  • High bank-level parallelism

[Figure: one request outstanding at each bank’s activated row]

SLIDE 20

Why are Threads Different?

random-access thread:
  • All requests → parallel
  • High bank-level parallelism

streaming thread:
  • All requests → same row
  • High row-buffer locality

[Figure: the streaming thread’s requests all queue at one bank’s activated row]

SLIDE 21

Why are Threads Different?

random-access: all requests parallel → high bank-level parallelism
streaming: all requests to the same row → high row-buffer locality

[Figure: when the two contend, the streaming thread’s row hits keep a bank busy and the random-access thread’s request at that bank gets stuck → the random-access thread is vulnerable to interference]

SLIDE 22

TCM Outline

  1. Clustering
  2. Between Clusters
  3. Non-Intensive Cluster → Throughput
  4. Intensive Cluster → Fairness

SLIDE 23

Niceness

How to quantify the difference between threads?

  • High bank-level parallelism → vulnerable to interference → raises niceness (+)
  • High row-buffer locality → causes interference → lowers niceness (−)

A thread’s niceness is high when it has high bank-level parallelism and low row-buffer locality, and low in the opposite case.
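The slide gives the qualitative relation; a small sketch using the rank-based form of niceness from the TCM paper (niceness = rank by bank-level parallelism minus rank by row-buffer locality) makes it concrete:

```python
def niceness(blp, rbl):
    """Rank-based niceness per thread:
    niceness_i = (rank of i by bank-level parallelism)
               - (rank of i by row-buffer locality).
    High BLP (vulnerable to interference) raises niceness;
    high RBL (causes interference) lowers it.
    """
    ids = list(blp)
    blp_rank = {t: r for r, t in enumerate(sorted(ids, key=lambda t: blp[t]))}
    rbl_rank = {t: r for r, t in enumerate(sorted(ids, key=lambda t: rbl[t]))}
    return {t: blp_rank[t] - rbl_rank[t] for t in ids}
```

For the slide-17 case study, a random-access thread (high BLP, low RBL) comes out nicer than a streaming thread (low BLP, high RBL).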

SLIDE 24

Shuffling: Round-Robin vs. Niceness-Aware

  1. Round-Robin shuffling ← What can go wrong?
  2. Niceness-Aware shuffling

SLIDE 25

Shuffling: Round-Robin vs. Niceness-Aware

  1. Round-Robin shuffling ← What can go wrong?
  2. Niceness-Aware shuffling

[Figure: priority order over time under round-robin shuffling, rotated every ShuffleInterval, from most prioritized down to least; threads range from nice to least nice]

GOOD: Each thread is prioritized once

SLIDE 26

Shuffling: Round-Robin vs. Niceness-Aware

  1. Round-Robin shuffling ← What can go wrong?
  2. Niceness-Aware shuffling

[Figure: same round-robin rotation; the least nice thread repeatedly rises above nice threads]

GOOD: Each thread is prioritized once
BAD: Nice threads receive lots of interference

SLIDE 27

Shuffling: Round-Robin vs. Niceness-Aware

  1. Round-Robin shuffling
  2. Niceness-Aware shuffling

SLIDE 28

Shuffling: Round-Robin vs. Niceness-Aware

  1. Round-Robin shuffling
  2. Niceness-Aware shuffling

[Figure: priority order over time under niceness-aware shuffling, shuffled every ShuffleInterval; nicer threads spend more time near the top]

GOOD: Each thread is prioritized once

SLIDE 29

Shuffling: Round-Robin vs. Niceness-Aware

  1. Round-Robin shuffling
  2. Niceness-Aware shuffling

[Figure: same niceness-aware shuffle over time]

GOOD: Each thread is prioritized once
GOOD: The least nice thread stays mostly deprioritized
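The contrast between the two schemes can be sketched in Python. This is a simplified model: the paper’s actual niceness-aware insertion shuffle is more involved, so `niceness_aware_round` below is illustrative, capturing only the property that the least nice thread leads once but otherwise stays at the bottom:

```python
def round_robin_round(order):
    """One full round of round-robin shuffling: rotate the priority
    order every ShuffleInterval so each thread leads exactly once."""
    rounds, cur = [], list(order)
    for _ in range(len(order)):
        rounds.append(list(cur))
        cur = cur[1:] + cur[:1]   # everyone climbs toward the top
    return rounds

def niceness_aware_round(nicest_first):
    """One full round of a niceness-aware shuffle (sketch): each thread
    still leads exactly once, but outside its own turn the least nice
    thread stays at the bottom because the rest of the order remains
    nicest-first."""
    return [[t] + [x for x in nicest_first if x != t] for t in nicest_first]
```

With threads `['A', 'B', 'C', 'D']` ordered nicest first, round-robin puts the least nice thread `D` above nice threads in most intervals, while the niceness-aware round keeps `D` at the bottom except during its single turn.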

SLIDE 30

TCM Outline

  1. Clustering
  2. Between Clusters
  3. Non-Intensive Cluster → Throughput
  4. Intensive Cluster → Fairness

SLIDE 31

Outline

  • Motivation & Insights
  • Overview
  • Algorithm
  • Bringing it All Together
  • Evaluation
  • Conclusion

SLIDE 32

Quantum-Based Operation

During a quantum (~1M cycles), monitor thread behavior:
  1. Memory intensity
  2. Bank-level parallelism
  3. Row-buffer locality

At the beginning of each quantum:
  • Perform clustering
  • Compute niceness of intensive threads

Within the current quantum, the intensive cluster is reshuffled every shuffle interval (~1K cycles).
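The quantum structure can be sketched as a loop. This is a software model of hardware behavior; `cluster`, `shuffle`, and `monitor` are hypothetical callbacks standing in for the controller mechanisms described on the surrounding slides:

```python
QUANTUM_CYCLES = 1_000_000    # ~1M cycles per quantum
SHUFFLE_CYCLES = 1_000        # ~1K cycles per shuffle interval

def run_quantum(threads, cluster, shuffle, monitor):
    """One TCM quantum (sketch).

    Clustering (and niceness) is computed once at the quantum boundary,
    using statistics monitored during the previous quantum; within the
    quantum, the intensive cluster is reshuffled every ShuffleInterval
    while monitoring continues for the next quantum.
    """
    non_intensive, intensive = cluster(threads)   # beginning of quantum
    for _ in range(QUANTUM_CYCLES // SHUFFLE_CYCLES):
        shuffle(intensive)                        # periodic priority shuffle
        monitor(threads)                          # stats for the next quantum
    return non_intensive, intensive
```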

SLIDE 33

TCM Scheduling Algorithm

  1. Highest-rank: requests from higher-ranked threads are prioritized
     • Non-intensive cluster > intensive cluster
     • Non-intensive cluster: lower intensity → higher rank
     • Intensive cluster: rank shuffling
  2. Row-hit: row-buffer hit requests are prioritized
  3. Oldest: older requests are prioritized
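The three prioritization rules map naturally onto a lexicographic sort key. A sketch follows; the `Request` fields are illustrative stand-ins, not the controller’s actual state:

```python
from dataclasses import dataclass

@dataclass
class Request:
    thread_rank: int   # higher = higher-ranked thread
                       # (non-intensive cluster ranks above intensive)
    row_hit: bool      # request hits the currently open row buffer
    arrival: int       # arrival cycle (smaller = older)

def tcm_priority_key(req):
    """Three-level TCM policy as a sort key: highest thread rank first,
    then row-buffer hits, then the oldest request."""
    return (-req.thread_rank, not req.row_hit, req.arrival)

def next_request(queue):
    """The request the memory controller services next."""
    return min(queue, key=tcm_priority_key)
```

Rank dominates: a row-buffer hit from a lower-ranked thread never overtakes a higher-ranked thread’s request; row hits and age only break ties.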

SLIDE 34

Implementation Costs

Required storage at the memory controller (24 cores):

  Thread memory behavior   | Storage
  MPKI                     | ~0.2 kbit
  Bank-level parallelism   | ~0.6 kbit
  Row-buffer locality      | ~2.9 kbit
  Total                    | < 4 kbits

  • No computation is on the critical path

SLIDE 35

Outline

  • Motivation & Insights
  • Overview
  • Algorithm
  • Bringing it All Together
  • Evaluation
  • Conclusion

SLIDE 36

Metrics & Methodology

  • Metrics
    – System throughput: Weighted Speedup = Σ_i (IPC_i^shared / IPC_i^alone)
    – Unfairness: Maximum Slowdown = max_i (IPC_i^alone / IPC_i^shared)

  • Methodology
    – Core model
      • 4 GHz processor, 128-entry instruction window
      • 512 KB/core L2 cache
    – Memory model: DDR2
    – 96 multiprogrammed SPEC CPU2006 workloads
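Both metrics are simple functions of each thread’s IPC when running alone versus shared; a small Python sketch:

```python
def weighted_speedup(ipc_alone, ipc_shared):
    """System throughput: sum over threads of IPC_shared / IPC_alone."""
    return sum(ipc_shared[t] / ipc_alone[t] for t in ipc_alone)

def maximum_slowdown(ipc_alone, ipc_shared):
    """Unfairness: the largest per-thread slowdown IPC_alone / IPC_shared."""
    return max(ipc_alone[t] / ipc_shared[t] for t in ipc_alone)
```

A higher weighted speedup means better throughput; a lower maximum slowdown means better fairness, which is why the result plots put the best algorithms toward high weighted speedup and low maximum slowdown.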

SLIDE 37

Previous Work

  • FRFCFS [Rixner et al., ISCA’00]: prioritizes row-buffer hits
    – Thread-oblivious → low throughput & low fairness
  • STFM [Mutlu et al., MICRO’07]: equalizes thread slowdowns
    – Non-intensive threads not prioritized → low throughput
  • PAR-BS [Mutlu et al., ISCA’08]: prioritizes the oldest batch of requests while preserving bank-level parallelism
    – Non-intensive threads not always prioritized → low throughput
  • ATLAS [Kim et al., HPCA’10]: prioritizes threads with less attained memory service
    – Most intensive thread starves → low fairness

SLIDE 38

Results: Fairness vs. Throughput

[Figure: maximum slowdown vs. weighted speedup for FRFCFS, STFM, PAR-BS, ATLAS, and TCM, averaged over 96 workloads. TCM improves throughput by 5% and fairness by 39% over ATLAS, and by 8% and 5% over PAR-BS.]

TCM provides the best fairness and system throughput.

SLIDE 39

Results: Fairness-Throughput Tradeoff

When each algorithm’s configuration parameter is varied…

[Figure: maximum slowdown vs. weighted speedup curves for FRFCFS, STFM, PAR-BS, ATLAS, and TCM; TCM’s curve, swept by adjusting ClusterThreshold, dominates the others]

TCM allows a robust fairness-throughput tradeoff.

SLIDE 40

Operating System Support

  • ClusterThreshold is a tunable knob
    – The OS can trade off between fairness and throughput
  • Enforcing thread weights
    – The OS assigns weights to threads
    – TCM enforces thread weights within each cluster

SLIDE 41

Outline

  • Motivation & Insights
  • Overview
  • Algorithm
  • Bringing it All Together
  • Evaluation
  • Conclusion

SLIDE 42

Conclusion

  • No previous memory scheduling algorithm provides both high system throughput and fairness
    – Problem: they use a single policy for all threads
  • TCM groups threads into two clusters
    1. Prioritize the non-intensive cluster → throughput
    2. Shuffle priorities in the intensive cluster → fairness
    3. Shuffling should favor nice threads → fairness
  • TCM provides the best system throughput and fairness
SLIDE 43

THANK YOU

SLIDE 44

Thread Cluster Memory Scheduling:
Exploiting Differences in Memory Access Behavior

Yoongu Kim, Michael Papamichael, Onur Mutlu, Mor Harchol-Balter

SLIDE 45

Thread Weight Support

  • Even if the heaviest-weighted thread happens to be the most intensive thread…
    – It is not prioritized over the least intensive thread

SLIDE 46

Harmonic Speedup

[Figure: fairness vs. throughput results with harmonic speedup as the throughput metric]

SLIDE 47

Shuffling Algorithm Comparison

  • Niceness-Aware shuffling
    – Average of maximum slowdown is lower
    – Variance of maximum slowdown is lower

  Shuffling Algorithm     | Round-Robin | Niceness-Aware
  E(Maximum Slowdown)     | 5.58        | 4.84
  VAR(Maximum Slowdown)   | 1.61        | 0.85

SLIDE 48

Sensitivity Results

  ShuffleInterval (cycles) | 500  | 600  | 700  | 800
  System Throughput        | 14.2 | 14.3 | 14.2 | 14.7
  Maximum Slowdown         | 6.0  | 5.4  | 5.9  | 5.5

  Number of Cores                        | 4  | 8   | 16  | 24  | 32
  System Throughput (vs. ATLAS)          | 0% | 3%  | 2%  | 1%  | 1%
  Maximum Slowdown (vs. ATLAS)           | 4% | 30% | 29% | 30% | 41%