

SLIDE 1


The Virtual Write Queue: Coordinating DRAM and Last-Level Cache Policies

Jeffrey Stuecheli (1,2), Dimitris Kaseridis (1), David Daly (3), Hillery C. Hunter (3) & Lizy K. John (1)

(1) ECE Department, The University of Texas at Austin  (2) IBM Corp., Austin  (3) IBM Thomas J. Watson Research Center

ISCA 2010

SLIDE 2


Memory terminology

Target System: Multi-Core CMP
– 8-16 cores (and up)
– Shared cache and memory subsystem
Terminology:
– Channel / Rank / Chip / Bank
Area of focus: improving the scheduling of the memory interface in light of many cores combined with DRAM technology challenges

Background

SLIDE 3


Memory Wall (Labyrinth)

Traditional concern is read latency
– Fixed at ~26 ns
Beyond latency, many parameters limit efficient utilization
Data bus frequency doubles with each DDRx generation
– DDR 200-400, DDR2 400-1066, DDR3 800-1600
– But internal latency is ~constant
Fixed latencies:
– Bank precharge (50 ns, ~7 operations @ 1066 MHz)
– Write→Read turnaround (7.5 ns, ~2 operations @ 1066 MHz)
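As a quick sanity check on those figures (assuming DDR3-1066 with burst length 8): one burst occupies the data bus for 8 transfers / 1066 MT/s ≈ 7.5 ns, so a 50 ns precharge window spans roughly 50 / 7.5 ≈ 7 back-to-back bus operations, which is where the "~7 operations" above comes from.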

Motivation

SLIDE 4

Implications

Scheduling efficiency
– Reads: critical path to execution
– Writes: decoupled queuing
We need more write buffering (to make the most of each opportunity to execute writes)
– Not read buffering, due to the latency criticality of loads

Motivation

SLIDE 5

The Virtual Write Queue

Grow effective write reordering by an order of magnitude through a two-level structure
– Writes can only execute out of the physical write queue
– Keep the physical queue full with a good mix of operations
– The physical write queue becomes a staging ground, covering the latency to pull data from the LLC (a C sketch of the two levels follows the figure)

[Figure: the Virtual Write Queue spans the MRU-to-LRU ways of each last-level cache set; the Cache Cleaner pulls dirty lines from the LRU ways into the Physical Write Queue, which feeds the DRAM scheduler.]
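To make the two-level idea concrete, here is a minimal C sketch; the sizes, field names, and the per-set dirty counter are illustrative assumptions, not parameters from the paper:

    #include <stdbool.h>
    #include <stdint.h>

    #define PWQ_ENTRIES  32        /* level 1: small physical write queue in the scheduler */
    #define LLC_SETS     8192
    #define VWQ_LRU_WAYS 4         /* level 2: LRU ways of each set act as the virtual queue */

    typedef struct {
        uint64_t addr;             /* line address; carries rank/bank/page/column bits */
        bool     valid;
    } PwqEntry;

    typedef struct {
        PwqEntry pwq[PWQ_ENTRIES];    /* writes execute only from here */
        uint8_t  dirty_lru[LLC_SETS]; /* per-set count of dirty lines staged in the
                                         VWQ_LRU_WAYS LRU ways, awaiting the cleaner */
    } VirtualWriteQueue;

The point of the split is that the small physical queue only has to hide the latency of pulling a line from the LLC, while the reordering freedom comes from the much larger population of dirty lines still sitting in the cache.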

SLIDE 6

VWQ Details

SLIDE 7


Cache→Memory Writeback Evolution

– Forced Writeback: the traditional approach to writeback.
– Eager Writeback: decouple cache fill from writeback with early "eager" writeback of dirty data (Lee, MICRO 2000).
– Scheduled Writeback: our proposal; place writeback under the control of the memory scheduler.

VWQ Details

SLIDE 8

Filling the Physical Write Queue

Key concept:

– Relatively few classes of writes:

  • Rank classification: which rank?
  • Page mode: quality level
  • Bank conflicts: avoid writes to the same bank but a different page

– Physical write queue content (see the selection sketch after this list):

  • Maintain high-quality writes in the structure
  • Keep writes to each rank available
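A hypothetical selection test in the same C sketch, building on the struct above (rank_of, bank_of, and page_of are assumed address-field helpers; one possible definition appears under Address Mapping on the next slide):

    unsigned rank_of(uint64_t a);   /* assumed helpers: extract DRAM address fields */
    unsigned bank_of(uint64_t a);
    unsigned page_of(uint64_t a);

    /* Is 'cand' a good line to pull into the physical write queue?
     * It must refill the rank we are short on, and must not conflict
     * with a queued write to the same bank but a different DRAM page. */
    static bool is_good_write(const VirtualWriteQueue *q, uint64_t cand, unsigned want_rank)
    {
        if (rank_of(cand) != want_rank)
            return false;                      /* keep writes available for every rank */
        for (int i = 0; i < PWQ_ENTRIES; i++) {
            const PwqEntry *e = &q->pwq[i];
            if (e->valid && bank_of(e->addr) == bank_of(cand)
                         && page_of(e->addr) != page_of(cand))
                return false;                  /* bank conflict: same bank, different page */
        }
        return true;                           /* same-page candidates also give page-mode hits */
    }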

VWQ Details

SLIDE 9

Address Mapping

The set address of the cache contains:
– All rank-selection bits
– All bank-selection bits
– Some number of column bits (the address within a DRAM page)

VWQ Details
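For concreteness, one possible DDR3-style bit layout consistent with these bullets; the exact field positions are assumptions, not taken from the paper:

    #include <stdint.h>

    /* Assumed layout of a physical line address (64-byte lines):
     *   offset[5:0] | column-low[12:6] | bank[15:13] | rank[16] | row[63:17]
     * With 8192 sets, the set index is bits [18:6]: it contains the rank
     * bit, all three bank bits, and the low column bits (plus two low
     * row bits), as the bullets above require. */
    unsigned rank_of(uint64_t a) { return (unsigned)(a >> 16) & 0x1;    }
    unsigned bank_of(uint64_t a) { return (unsigned)(a >> 13) & 0x7;    }
    unsigned page_of(uint64_t a) { return (unsigned)(a >> 17);          }  /* DRAM row == page */
    unsigned set_of (uint64_t a) { return (unsigned)(a >> 6) & 0x1FFF;  }

Because the set index embeds the rank and bank bits, the cleaner can tell which DRAM resources a set's dirty lines target without reading the tags.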

SLIDE 10


The Cache Cleaner

Goal: fast/efficient search of the large LLC directory
Based around the Set State Vector (SSV)
The SSV enables:
– Efficient communication of which dirty lines should be cleaned
– The cleaner selects lines based on the current physical write queue contents
– Keeps the queue full with a uniform mix of operations to each DRAM resource (a scan sketch follows the figure)

[Figure: as in the earlier diagram, with the Set State Vector alongside the LLC summarizing which cache sets hold dirty lines in their LRU ways for the Cache Cleaner to scan.]
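A minimal sketch of what an SSV scan could look like, assuming a one-bit-per-set encoding (the representation and the rank_bits_of_set helper are assumptions):

    #include <stdint.h>

    #define LLC_SETS 8192
    unsigned rank_bits_of_set(int set);   /* assumed: rank field embedded in the set index */

    /* One bit per LLC set: set when dirty lines sit in the set's LRU ways. */
    typedef struct { uint64_t bits[LLC_SETS / 64]; } SetStateVector;

    /* Scan the compact SSV (1 KB here) instead of the full directory to find
     * the next set with cleanable lines targeting the rank we want to refill. */
    int ssv_next_dirty_set(const SetStateVector *ssv, int start, unsigned want_rank)
    {
        for (int s = start; s < LLC_SETS; s++) {
            if (((ssv->bits[s / 64] >> (s % 64)) & 1) == 0)
                continue;                          /* nothing dirty staged in this set */
            if (rank_bits_of_set(s) == want_rank)
                return s;
        }
        return -1;                                 /* no candidate for this rank */
    }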

VWQ Details

SLIDE 11

Read/Write Priority in the Scheduler

Goal: Defer write operations as long as possible

– Forced Writeback: queuing depth is quite limited.
– Eager Writeback: the write queue is always full; how do we know when we must execute writes?
– Virtual Write Queue: monitor overall fullness on a per-rank basis, giving a much larger effective buffering capability (sketched below).
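One plausible way to act on per-rank fullness, with illustrative watermark values (the thresholds and hysteresis are assumptions, not the paper's policy):

    /* Reads keep priority until a rank's share of the Virtual Write Queue
     * fills past a high watermark; then its writes are drained until a low
     * watermark is reached.  Hysteresis avoids flipping modes every cycle. */
    #define HIGH_WATERMARK 0.80
    #define LOW_WATERMARK  0.50

    typedef enum { PRIORITIZE_READS, DRAIN_WRITES } SchedMode;

    SchedMode next_mode(SchedMode cur, double rank_fullness)
    {
        if (cur == PRIORITIZE_READS && rank_fullness > HIGH_WATERMARK)
            return DRAIN_WRITES;       /* writes can no longer be deferred */
        if (cur == DRAIN_WRITES && rank_fullness < LOW_WATERMARK)
            return PRIORITIZE_READS;   /* enough headroom to defer again */
        return cur;
    }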

VWQ Details

SLIDE 12

Evaluation/Results

SLIDE 13

Bandwidth Improvement Example

From SPEC mcf workload

[Figure: DRAM bus utilization (0.1-0.9) over 350 million instructions of the SPEC mcf workload, Baseline vs. VWQ.]

Evaluation/Results

SLIDE 14


Virtual Write Queue IPC Gains

Each experiment consists of 8 copies of the same benchmark
– IPC was observed to be uniform across cores (the symmetrical system was fair)
Improvements in 1-, 2-, and 4-rank systems
– Largest improvement with 1 rank, due to the exposed "write-to-read same rank" penalty

Evaluation/Results

[Figure: IPC improvement (0-25%) for bzip2, bwaves, cactusADM, dealII, gcc, GemsFDTD, hmmer, leslie3d, libquantum, mcf, omnetpp, soplex, and the average, on 1-, 2-, and 4-rank systems.]

SLIDE 15


Power Reduction Due to Increased Write Page Mode Access

Overall DRAM power reduction is shown

Evaluation/Results

SLIDE 16

Conclusion

Memory scheduling is critical to CMP design
We must leverage all state in the SoC/CMP

SLIDE 17


Thank You, Questions?

Laboratory for Computer Architecture, The University of Texas at Austin; IBM Austin; IBM T. J. Watson Research Center