SLIDE 1

Prefetching in Hybrid Main Memory Systems

Subisha V†, Varun Gohil†, Nisarg Ujjainkar†, Manu Awasthi*

†IIT Gandhinagar  *Ashoka University

HotStorage 2020

SLIDE 2

Outline of the Presentation

  • Background
  • Insights
  • Prefetcher Design
  • Evaluation
  • Future Work

SLIDE 4

DRAM Scaling Challenge

Source: "Solving the DRAM Scaling Challenge", Samira Khan, ARM Research Summit 2018

  • DRAM density scaling is slowing down: doubling time has stretched from 2X every 1.5 years to 2X every 3 years
  • Emerging workloads (genomics, neural nets, virtual reality, in-memory frameworks) require ever-higher memory capacity

SLIDE 6

Emerging Memory Technologies

and many more ...

  • Better density
  • Energy efficient
  • Longer access latencies
  • Finite write endurance

SLIDE 9

Hybrid Main Memory

  • Use DRAM and NVM synergistically
  • Two variants: Single Address Space, and DRAM as a Cache

SLIDE 12

Alloy Cache

  • State-of-the-art DRAM cache design
  • Acts as a direct-mapped cache in front of NVM
  • Fetches data at cacheline granularity
  • Cacheline size is 72B (tag and data stored together)
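The direct-mapped organization above can be sketched in a few lines. The 1 GB capacity matches the evaluation setup later in the deck; splitting the 72B line into 64B of data plus 8B of tag metadata follows the original Alloy Cache design and is an assumption here, not stated on the slide.

```python
# Sketch of direct-mapped DRAM-cache indexing (assumptions: 1 GB cache,
# 64 B of data per 72 B line -- the remaining 8 B hold tag metadata).
LINE_DATA_BYTES = 64
CACHE_BYTES = 1 << 30                        # 1 GB DRAM cache
NUM_SETS = CACHE_BYTES // LINE_DATA_BYTES    # direct-mapped: one line per set

def cache_lookup(addr):
    """Map a physical address to its (set index, tag) in the DRAM cache."""
    line_number = addr // LINE_DATA_BYTES
    return line_number % NUM_SETS, line_number // NUM_SETS
```

Because the cache is direct-mapped, a lookup needs only one tag comparison at the computed set, which is what lets Alloy Cache fetch tag and data together in a single access.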

SLIDE 15

Alloy Cache Page

  • 4KB contiguous chunk of memory
  • A page comprises 64 cachelines; cachelines not yet fetched remain empty

SLIDE 18

Outline of the Presentation

  • Background
  • Insights
  • Prefetcher Design
  • Evaluation
  • Future Work

SLIDE 19

Insights

Setup: 1 GB Alloy Cache, 64 GB PCM, PARSEC benchmarks

SLIDE 22

Insights

Workloads exhibit page-level spatial locality in NVM

SLIDE 24

Insights

92% of DRAM Cache pages are completely empty!

SLIDE 25

Insights

A large portion of DRAM Cache is unallocated

SLIDE 26

Outline of the Presentation

  • Background
  • Insights
  • Prefetcher Design
  • Evaluation
  • Future Work

SLIDE 27

Prefetcher Design

  • Page-level spatial locality in NVM ⇒ prefetch at page granularity
  • DRAM Cache is largely unallocated ⇒ place prefetched pages in the DRAM Cache

SLIDE 30

Prefetcher Design

  • When to prefetch?
  • Where to place prefetched data in the DRAM Cache?
  • How to identify the type of data at a DRAM Cache location?
  • How to check if data is in a prefetched page?

SLIDE 35

When to Prefetch?

Prefetch a page if:
  ⇒ #cacheline accesses ≥ Access Threshold (AT)
  ⇒ #unique cacheline accesses ≥ Unique Access Threshold (UAT)
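A minimal sketch of this trigger, assuming the two conditions are checked together (as a conjunction) and with purely illustrative threshold values, not the paper's tuning:

```python
# Hedged sketch of the prefetch trigger: a page qualifies once both its
# total and its unique cacheline-access counts cross their thresholds.
AT = 8    # Access Threshold (illustrative value)
UAT = 4   # Unique Access Threshold (illustrative value)

def should_prefetch(num_accesses, num_unique_accesses):
    """Prefetch a page only when BOTH threshold conditions hold."""
    return num_accesses >= AT and num_unique_accesses >= UAT
```

Requiring unique accesses as well as total accesses filters out pages that are hot only because a single cacheline is re-referenced repeatedly, which would not benefit from page-granularity prefetch.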

SLIDE 36

When to Prefetch?

NVM Page Classifier (NPC)
  ⇒ Stores the cacheline access history of recently used pages

SLIDE 37

NVM Page Classifier Entry

N: Max number of pages that can be present in NVM
AT: Access Threshold
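The NPC can be sketched as a small recency-ordered table. The entry layout here (an access counter plus a 64-bit touched-lines vector, since a 4KB page holds 64 cachelines of 64B) and the LRU-style eviction are assumptions for illustration; the paper's exact entry format may differ.

```python
from collections import OrderedDict

class NVMPageClassifier:
    """Sketch of the NPC: per-page cacheline access history for recently
    used NVM pages. A 4 KB page has 64 cachelines of 64 B, so a 64-bit
    vector marks which lines have been touched."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()  # page_number -> [accesses, touched_bits]

    def record_access(self, page_number, line_index):
        entry = self.entries.setdefault(page_number, [0, 0])
        self.entries.move_to_end(page_number)        # maintain recency order
        entry[0] += 1                                # total accesses
        entry[1] |= 1 << line_index                  # mark this line as touched
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)         # evict least-recent page
        return entry[0], bin(entry[1]).count("1")    # (#accesses, #unique)
```

The two counts this structure returns are exactly the inputs to the AT/UAT trigger on the "When to Prefetch?" slide.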

SLIDE 44

Where to place Prefetched Page?

Last Unallocated DRAM Cache page

SLIDE 45

Where to place Prefetched Page?

Empty Page Classifier (EPC)
  ⇒ Stores the locations of unallocated DRAM Cache pages

SLIDE 52

Empty Page Classifier (EPC)

Page Number = (4096 × Level 1 index) + (64 × Level 2 index) + Level 3 index
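The formula splits a page number into three 64-ary levels (64 × 64 = 4096), consistent with a three-level walk over the EPC. The index arithmetic can be computed and inverted as:

```python
# The EPC's three-level index arithmetic from the formula above.
# Each level spans 64 entries (64 * 64 = 4096), so a page number splits
# into (level-1, level-2, level-3) indices and reassembles losslessly.
def epc_split(page_number):
    level1, rest = divmod(page_number, 4096)
    level2, level3 = divmod(rest, 64)
    return level1, level2, level3

def epc_join(level1, level2, level3):
    return 4096 * level1 + 64 * level2 + level3
```

With 64 entries per level, each level maps naturally onto a 64-bit word in hardware, so finding an unallocated page is a few bit scans rather than a linear search.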

SLIDE 57

Identifying type of data in DRAM Cache

A DRAM Cache location might hold:
  ⇒ a Prefetched page
  ⇒ an Alloy Cache page
  ⇒ nothing (Empty)

We need to distinguish these cases to ensure correctness.

SLIDE 58

Identifying type of data in DRAM Cache

State 0: Empty Location
State 1: Clean Prefetched Page
State 2: Alloy Cache Page
State 3: Dirty Prefetched Page
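Since there are four states, 2 bits per DRAM Cache page suffice. A sketch with an assumed flat-array layout (a byte per page here, purely for readability):

```python
# Sketch of the four per-page states; a real implementation would pack
# these into 2 bits per DRAM-cache page.
EMPTY, CLEAN_PREFETCH, ALLOY_PAGE, DIRTY_PREFETCH = 0, 1, 2, 3

class TypeClassifier:
    def __init__(self, num_pages):
        self.state = bytearray(num_pages)   # one state per DRAM-cache page

    def set_state(self, page, state):
        self.state[page] = state

    def mark_dirty(self, page):
        """A write to a clean prefetched page makes it dirty."""
        if self.state[page] == CLEAN_PREFETCH:
            self.state[page] = DIRTY_PREFETCH

    def is_prefetched(self, page):
        """True for both clean and dirty prefetched pages."""
        return self.state[page] in (CLEAN_PREFETCH, DIRTY_PREFETCH)
```

Distinguishing clean from dirty prefetched pages matters because only modified data has to be written back to NVM, and NVM writes are the expensive, endurance-limited operation.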

SLIDE 63

Identifying type of data in DRAM Cache

Type Classifier (TC)
  ⇒ Stores the state of each DRAM Cache location

SLIDE 64

Type Classifier Entry

SLIDE 68

Checking if data is in a prefetched page

Page Redirection Table (PRT)
  ⇒ A hash table storing the tags of prefetched data
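A sketch of the PRT, with a Python dict standing in for the hardware hash table; the entry fields and method names are assumptions for illustration:

```python
# Sketch of the Page Redirection Table: keyed by an NVM page's tag,
# pointing to the DRAM-cache page holding its prefetched copy.
class PageRedirectionTable:
    def __init__(self):
        self.table = {}  # nvm_page_tag -> dram_cache_page

    def insert(self, nvm_page_tag, dram_cache_page):
        self.table[nvm_page_tag] = dram_cache_page

    def lookup(self, nvm_page_tag):
        """DRAM-cache page if the page was prefetched, else None."""
        return self.table.get(nvm_page_tag)

    def remove(self, nvm_page_tag):
        self.table.pop(nvm_page_tag, None)
```

On a memory access, a PRT hit redirects the request to the prefetched copy in the DRAM Cache; a miss falls through to the normal Alloy Cache / NVM path.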

SLIDE 69

Page Redirection Table Entry

D: Max number of pages that can be present in the DRAM Cache

SLIDE 73

Outline of the Presentation

  • Background
  • Insights
  • Prefetcher Design
  • Evaluation
  • Future Work

SLIDE 74

Evaluation

ZSim + NVMain simulation infrastructure
  ⇒ 1 GB Alloy Cache, 64 GB Phase Change Memory
  ⇒ 8-core, 2.6 GHz processor
  ⇒ CACTI used to model the access latencies of the added structures
  ⇒ PARSEC benchmarks

SLIDE 76

Evaluation

Sequential access behavior

SLIDE 77

Evaluation

1.5×-4× improvement

SLIDE 79

Evaluation

7✕ speedup

SLIDE 80

Evaluation

16-40% higher IPC

SLIDE 81

Outline of the Presentation

  • Background
  • Insights
  • Prefetcher Design
  • Evaluation
  • Future Work

SLIDE 82

Future Work

  ⇒ Evaluate our prefetcher on memory-intensive SPEC workloads
  ⇒ Evaluate on graph workloads with irregular memory access patterns
  ⇒ Compare against similar recent works

SLIDE 83

Key Takeaways

  • Prefetch at page granularity to exploit page-level spatial locality
  • Place prefetched pages in the DRAM Cache to improve its utilization
  • We observe a 16-40% increase in IPC on PARSEC

Contact Us: gohil.varun@iitgn.ac.in, manu.awasthi@ashoka.edu.in
Link to Paper: