SLIDE 1

Prefetching in Hybrid Main Memory Systems

Subisha V†, Varun Gohil†, Nisarg Ujjainkar†, Manu Awasthi*

†IIT Gandhinagar  *Ashoka University

HotStorage 2020

SLIDE 2

Outline of the Presentation

  • Background
  • Insights
  • Prefetcher Design
  • Evaluation
  • Future Work

SLIDE 4

DRAM Scaling Challenge

Source: "Solving the DRAM Scaling Challenge", Samira Khan, ARM Research Summit 2018

  • DRAM density scaling is slowing down: doubling time has stretched from 2X every 1.5 years to 2X every 3 years
  • Emerging workloads (genomics, neural nets, virtual reality, in-memory frameworks) require ever-higher memory capacity

SLIDE 6

Emerging Memory Technologies

and many more ...

  • Better density
  • Energy efficient
  • Longer access latencies
  • Finite write endurance

SLIDE 9

Hybrid Main Memory

  • Use DRAM and NVM synergistically
  • Two variants: Single Address Space, and DRAM as a Cache

SLIDE 12

Alloy Cache

  • State-of-the-art DRAM cache design
  • Acts as a direct-mapped cache in front of NVM
  • Fetches data at cacheline granularity
  • Cacheline size is 72B (tag and data stored together)
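The direct-mapped organization above can be sketched in a few lines. The 1 GB capacity matches the evaluation setup later in the deck; splitting the 72B line into 64B of data plus 8B of tag metadata follows the original Alloy Cache design and is an assumption here, not stated on the slide.

```python
# Sketch of direct-mapped DRAM-cache indexing (assumptions: 1 GB cache,
# 64 B of data per 72 B line -- the remaining 8 B hold tag metadata).
LINE_DATA_BYTES = 64
CACHE_BYTES = 1 << 30                        # 1 GB DRAM cache
NUM_SETS = CACHE_BYTES // LINE_DATA_BYTES    # direct-mapped: one line per set

def cache_lookup(addr):
    """Map a physical address to its (set index, tag) in the DRAM cache."""
    line_number = addr // LINE_DATA_BYTES
    return line_number % NUM_SETS, line_number // NUM_SETS
```

Because the cache is direct-mapped, a lookup needs only one tag comparison at the computed set, which is what lets Alloy Cache fetch tag and data together in a single access.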

SLIDE 15

Alloy Cache Page

  • 4KB contiguous chunk of memory
  • A page comprises 64 cachelines; cachelines not yet fetched remain empty

SLIDE 18

Outline of the Presentation

  • Background
  • Insights
  • Prefetcher Design
  • Evaluation
  • Future Work

SLIDE 19

Insights

Setup: 1 GB Alloy Cache, 64 GB PCM, PARSEC benchmarks

SLIDE 22

Insights

Workloads exhibit page-level spatial locality in NVM

SLIDE 24

Insights

92% of DRAM Cache pages are completely empty!

SLIDE 25

Insights

A large portion of DRAM Cache is unallocated

SLIDE 26

Outline of the Presentation

  • Background
  • Insights
  • Prefetcher Design
  • Evaluation
  • Future Work

SLIDE 27

Prefetcher Design

  • Page-level spatial locality in NVM ⇒ prefetch at page granularity
  • DRAM Cache is largely unallocated ⇒ place prefetched pages in the DRAM Cache

SLIDE 30

Prefetcher Design

  • When to prefetch?
  • Where to place prefetched data in the DRAM Cache?
  • How to identify the type of data at a DRAM Cache location?
  • How to check if data is in a prefetched page?

SLIDE 35

When to Prefetch?

Prefetch a page if:
  ⇒ #cacheline accesses ≥ Access Threshold (AT)
  ⇒ #unique cacheline accesses ≥ Unique Access Threshold (UAT)
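A minimal sketch of this trigger, assuming the two conditions are checked together (as a conjunction) and with purely illustrative threshold values, not the paper's tuning:

```python
# Hedged sketch of the prefetch trigger: a page qualifies once both its
# total and its unique cacheline-access counts cross their thresholds.
AT = 8    # Access Threshold (illustrative value)
UAT = 4   # Unique Access Threshold (illustrative value)

def should_prefetch(num_accesses, num_unique_accesses):
    """Prefetch a page only when BOTH threshold conditions hold."""
    return num_accesses >= AT and num_unique_accesses >= UAT
```

Requiring unique accesses as well as total accesses filters out pages that are hot only because a single cacheline is re-referenced repeatedly, which would not benefit from page-granularity prefetch.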

SLIDE 36

When to Prefetch?

NVM Page Classifier (NPC)
  ⇒ Stores the cacheline access history of recently used pages

SLIDE 37

NVM Page Classifier Entry

N: Max number of pages that can be present in NVM
AT: Access Threshold
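The NPC can be sketched as a small recency-ordered table. The entry layout here (an access counter plus a 64-bit touched-lines vector, since a 4KB page holds 64 cachelines of 64B) and the LRU-style eviction are assumptions for illustration; the paper's exact entry format may differ.

```python
from collections import OrderedDict

class NVMPageClassifier:
    """Sketch of the NPC: per-page cacheline access history for recently
    used NVM pages. A 4 KB page has 64 cachelines of 64 B, so a 64-bit
    vector marks which lines have been touched."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()  # page_number -> [accesses, touched_bits]

    def record_access(self, page_number, line_index):
        entry = self.entries.setdefault(page_number, [0, 0])
        self.entries.move_to_end(page_number)        # maintain recency order
        entry[0] += 1                                # total accesses
        entry[1] |= 1 << line_index                  # mark this line as touched
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)         # evict least-recent page
        return entry[0], bin(entry[1]).count("1")    # (#accesses, #unique)
```

The two counts this structure returns are exactly the inputs to the AT/UAT trigger on the "When to Prefetch?" slide.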

SLIDE 44

Where to place Prefetched Page?

Last Unallocated DRAM Cache page

SLIDE 45

Where to place Prefetched Page?

Empty Page Classifier (EPC)
  ⇒ Stores the locations of unallocated DRAM Cache pages

SLIDE 52

Empty Page Classifier (EPC)

Page Number = (4096 × Level 1 index) + (64 × Level 2 index) + Level 3 index
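The formula splits a page number into three 64-ary levels (64 × 64 = 4096), consistent with a three-level walk over the EPC. The index arithmetic can be computed and inverted as:

```python
# The EPC's three-level index arithmetic from the formula above.
# Each level spans 64 entries (64 * 64 = 4096), so a page number splits
# into (level-1, level-2, level-3) indices and reassembles losslessly.
def epc_split(page_number):
    level1, rest = divmod(page_number, 4096)
    level2, level3 = divmod(rest, 64)
    return level1, level2, level3

def epc_join(level1, level2, level3):
    return 4096 * level1 + 64 * level2 + level3
```

With 64 entries per level, each level maps naturally onto a 64-bit word in hardware, so finding an unallocated page is a few bit scans rather than a linear search.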

SLIDE 57

Identifying type of data in DRAM Cache

A DRAM Cache location might hold:
  ⇒ a Prefetched page
  ⇒ an Alloy Cache page
  ⇒ nothing (Empty)

We need to distinguish these cases to ensure correctness.

SLIDE 58

Identifying type of data in DRAM Cache

State 0: Empty Location
State 1: Clean Prefetched Page
State 2: Alloy Cache Page
State 3: Dirty Prefetched Page
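Since there are four states, 2 bits per DRAM Cache page suffice. A sketch with an assumed flat-array layout (a byte per page here, purely for readability):

```python
# Sketch of the four per-page states; a real implementation would pack
# these into 2 bits per DRAM-cache page.
EMPTY, CLEAN_PREFETCH, ALLOY_PAGE, DIRTY_PREFETCH = 0, 1, 2, 3

class TypeClassifier:
    def __init__(self, num_pages):
        self.state = bytearray(num_pages)   # one state per DRAM-cache page

    def set_state(self, page, state):
        self.state[page] = state

    def mark_dirty(self, page):
        """A write to a clean prefetched page makes it dirty."""
        if self.state[page] == CLEAN_PREFETCH:
            self.state[page] = DIRTY_PREFETCH

    def is_prefetched(self, page):
        """True for both clean and dirty prefetched pages."""
        return self.state[page] in (CLEAN_PREFETCH, DIRTY_PREFETCH)
```

Distinguishing clean from dirty prefetched pages matters because only modified data has to be written back to NVM, and NVM writes are the expensive, endurance-limited operation.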

SLIDE 63

Identifying type of data in DRAM Cache

Type Classifier (TC)
  ⇒ Stores the state of each DRAM Cache location

SLIDE 64

Type Classifier Entry

SLIDE 68

Checking if data is in a prefetched page

Page Redirection Table (PRT)
  ⇒ A hash table storing the tags of prefetched data
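A sketch of the PRT, with a Python dict standing in for the hardware hash table; the entry fields and method names are assumptions for illustration:

```python
# Sketch of the Page Redirection Table: keyed by an NVM page's tag,
# pointing to the DRAM-cache page holding its prefetched copy.
class PageRedirectionTable:
    def __init__(self):
        self.table = {}  # nvm_page_tag -> dram_cache_page

    def insert(self, nvm_page_tag, dram_cache_page):
        self.table[nvm_page_tag] = dram_cache_page

    def lookup(self, nvm_page_tag):
        """DRAM-cache page if the page was prefetched, else None."""
        return self.table.get(nvm_page_tag)

    def remove(self, nvm_page_tag):
        self.table.pop(nvm_page_tag, None)
```

On a memory access, a PRT hit redirects the request to the prefetched copy in the DRAM Cache; a miss falls through to the normal Alloy Cache / NVM path.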

SLIDE 69

Page Redirection Table Entry

D: Max number of pages that can be present in the DRAM Cache

SLIDE 73

Outline of the Presentation

  • Background
  • Insights
  • Prefetcher Design
  • Evaluation
  • Future Work

SLIDE 74

Evaluation

ZSim + NVMain simulation infrastructure
  ⇒ 1 GB Alloy Cache, 64 GB Phase Change Memory
  ⇒ 8-core, 2.6 GHz processor
  ⇒ CACTI used to model the access latencies of the added structures
  ⇒ PARSEC benchmarks

SLIDE 76

Evaluation

Sequential access behavior

SLIDE 77

Evaluation

1.5×-4× improvement

SLIDE 79

Evaluation

7✕ speedup

SLIDE 80

Evaluation

16-40% higher IPC

SLIDE 81

Outline of the Presentation

  • Background
  • Insights
  • Prefetcher Design
  • Evaluation
  • Future Work

SLIDE 82

Future Work

  ⇒ Evaluate our prefetcher on memory-intensive SPEC workloads
  ⇒ Evaluate on graph workloads with irregular memory access patterns
  ⇒ Compare against similar recent works

SLIDE 83

Key Takeaways

  • Prefetch at page granularity to exploit page-level spatial locality
  • Place prefetched pages in the DRAM Cache to improve its utilization
  • We observe a 16-40% increase in IPC on PARSEC

Contact Us: gohil.varun@iitgn.ac.in, manu.awasthi@ashoka.edu.in
Link to Paper: