Storage Efficient Hardware Prefetching using Delta Correlating Prediction Tables


SLIDE 1

Storage Efficient Hardware Prefetching using Delta Correlating Prediction Tables

Marius Grannæs, Magnus Jahre, Lasse Natvig

Feb 14th 2008

www.ntnu.no

  • M. Grannæs et al., Storage Efficient Hardware Prefetching using Delta Correlating Prediction Tables

SLIDE 2

Delta Correlating Prefetch Tables

- Perez et al. did a comparative survey of hardware prefetchers in 2004.
  • Reference Prediction Tables and PC/DC using a Global History Buffer
- Delta Correlating Prediction Tables combines these two approaches and adds extra control for avoiding duplicate prefetching.
- Perez et al. also found that you can make anything look good given the right benchmarks and parameters.

SLIDE 9

Prefetcher Overview

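[Table: the prefetchers Sequential, RPT, PC/DC and DCPT compared on Sequential, Constant Stride and Repeating Pattern accesses and on Complexity, Delay and Storage Efficiency; the cell values are not recoverable.]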

SLIDE 10

Outline

- Motivation
- Reference Prediction Tables
  • Properties of RPT prefetching
- PC/DC Prefetching
  • Global History Buffer
  • Delta Correlation
  • Properties of PC/DC prefetching
- Delta Correlating Prefetch Tables
  • DCPT Properties
- Results
- Concluding Remarks

SLIDE 11

Reference Prediction Tables

[Figure: an RPT entry with the fields PC, Last Addr., State and Delta, updated on each cache miss. Entry states: Initial, Training, Prefetch.]

SLIDE 15

Reference Prediction Tables

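[Figure: worked example for the load at PC 100 with cache misses at addresses 1, 3 and 5. The entry is in state Init after the first miss, in state Train with delta 2 after the second, and in state Prefetch once the delta of 2 repeats.]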

SLIDE 16

Properties of RPT prefetching

- Very high accuracy
- Relatively low cost: table lookup, comparator and subtraction
- Small memory footprint
- Only able to capture constant strides
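
A minimal Python sketch of the RPT behaviour summarised above, written as an illustration rather than as the implementation evaluated in the talk. The names `RPTEntry`, `ReferencePredictionTable` and `on_miss` are assumptions, the table is an unbounded dictionary instead of a fixed-size hardware table, and the state machine is reduced to the three states named on the slide.

```python
from dataclasses import dataclass


@dataclass
class RPTEntry:
    last_addr: int
    delta: int = 0
    state: str = "init"  # init -> train -> prefetch


class ReferencePredictionTable:
    """Simplified RPT: one entry per load PC (hypothetical names)."""

    def __init__(self):
        self.table = {}  # PC -> RPTEntry; real hardware would be fixed-size

    def on_miss(self, pc, addr):
        """Update the entry for pc; return a prefetch address or None."""
        entry = self.table.get(pc)
        if entry is None:
            # First miss from this PC: only the address is recorded.
            self.table[pc] = RPTEntry(last_addr=addr)
            return None
        delta = addr - entry.last_addr  # the single subtraction per miss
        if entry.state != "init" and delta == entry.delta:
            entry.state = "prefetch"    # the stride repeated
        else:
            entry.state = "train"       # learn (or relearn) the stride
            entry.delta = delta
        entry.last_addr = addr
        return addr + entry.delta if entry.state == "prefetch" else None


rpt = ReferencePredictionTable()
issued = [rpt.on_miss(100, a) for a in (1, 3, 5)]
print(issued)  # [None, None, 7]
```

A single stored delta per entry is what keeps the cost low, and it is also why only constant strides can be captured: any change of stride sends the entry back to training.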

SLIDE 24

Global History Buffer

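[Figure: an Index Table (fields PC, Ptr) whose entry for PC 100 points into the Global History Buffer (fields Address, Ptr), which holds the miss addresses 1, 3 and 5. The Delta Buffer holds the deltas 2 and 2 computed from them.]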

SLIDE 33

Delta Correlation

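[Figure: the miss addresses 10, 11, 13, 16, 17, 19, 22 give the deltas 1, 2, 3, 1, 2, 3. The two most recent deltas (2, 3) match an earlier pair, so the deltas that followed that pair (1, 2) are replayed from address 22 to predict 23 and 25.]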
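
The delta correlation step can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function name `delta_correlate` is an assumption, and it replays every delta that follows the matched pair, which yields 28 in addition to the 23 and 25 shown on the slide.

```python
def delta_correlate(addresses):
    """Predict future addresses from one load's miss history."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if len(deltas) < 3:
        return []  # need the latest pair plus at least one earlier delta
    last_pair = (deltas[-2], deltas[-1])
    # Search backwards for an earlier occurrence of the latest delta pair.
    for i in range(len(deltas) - 3, -1, -1):
        if (deltas[i], deltas[i + 1]) == last_pair:
            predictions = []
            addr = addresses[-1]
            # Replay the deltas that followed the earlier occurrence.
            for d in deltas[i + 2:]:
                addr += d
                predictions.append(addr)
            return predictions
    return []


print(delta_correlate([10, 11, 13, 16, 17, 19, 22]))  # [23, 25, 28]
```

A constant stride is just the special case where every pair matches, so `delta_correlate([0, 2, 4, 6])` returns `[8]`.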

SLIDE 34

Properties of PC/DC prefetching

- Can capture a very wide range of patterns
- High accuracy and performance
- The global history must be very large to capture relevant data
- Pointer chasing
- The deltas are recalculated every time
- The number of deltas can vary

SLIDE 35

Delta Correlating Prefetch Tables

[Figure: one DCPT entry with the fields PC, Last Addr., Last Pref., six delta slots (D) and Ptr.]

SLIDE 43

Delta Correlating Prefetch Tables

[Figure: the entry for PC 100 after misses at addresses 10, 11, 13, 16, 17, 19 and 22. Last Addr. is 22 and the delta slots hold 1, 2, 3, 1, 2, 3.]

SLIDE 44

DCPT Properties

- Able to capture the same patterns as PC/DC
- Only stores deltas
  • Uses less memory to store the same data
  • No need to recalculate the deltas
  • Fixed number of deltas - Fixed timeliness
- Constant delay
- Tracks issued prefetches to avoid overlap
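
The properties above can be illustrated with a small Python model of a DCPT entry. It is a sketch under stated assumptions, not the hardware design from the talk: the names `DCPT`, `DCPTEntry`, `on_access` and `NUM_DELTAS` are hypothetical, the table is an unbounded dictionary, a `deque` with `maxlen` stands in for the delta slots and Ptr, and the overlap filter simply drops candidates up to the last prefetch already issued.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

NUM_DELTAS = 6  # assumed; matches the six D slots drawn on the slide


@dataclass
class DCPTEntry:
    last_addr: int
    last_prefetch: Optional[int] = None
    deltas: deque = field(default_factory=lambda: deque(maxlen=NUM_DELTAS))


class DCPT:
    def __init__(self):
        self.table = {}  # PC -> DCPTEntry; real hardware would be fixed-size

    def on_access(self, pc, addr):
        """Record one observed address for pc; return addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = DCPTEntry(last_addr=addr)
            return []
        delta = addr - entry.last_addr
        if delta == 0:
            return []
        entry.deltas.append(delta)  # the oldest delta drops out when full
        entry.last_addr = addr
        return self._filter(entry, self._correlate(entry))

    @staticmethod
    def _correlate(entry):
        d = list(entry.deltas)
        if len(d) < 3:
            return []
        # Bounded search over a fixed number of stored deltas.
        for i in range(len(d) - 3, -1, -1):
            if (d[i], d[i + 1]) == (d[-2], d[-1]):
                out, addr = [], entry.last_addr
                for step in d[i + 2:]:
                    addr += step
                    out.append(addr)
                return out
        return []

    @staticmethod
    def _filter(entry, candidates):
        # Skip everything up to and including the last issued prefetch.
        if entry.last_prefetch in candidates:
            cut = candidates.index(entry.last_prefetch) + 1
            candidates = candidates[cut:]
        if candidates:
            entry.last_prefetch = candidates[-1]
        return candidates


dcpt = DCPT()
out = [dcpt.on_access(100, a) for a in (10, 11, 13, 16, 17, 19, 22)]
print(out[5], out[6])  # [22, 23, 25] [28]
```

On the last access the correlation step again proposes 23 and 25, but both were already issued one step earlier, so only 28 is returned. That is the overlap tracking listed above.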

SLIDE 46

Number of bits used to represent a delta

[Plot: Coverage (axis 0.3 to 1) and Speedup (axis 1.2 to 1.4) against the number of bits per delta (5 to 35).]

SLIDE 47

Deltas per table entry

[Plot: Speedup (axis 1 to 1.4) against Deltas per Entry (5 to 35).]

SLIDE 48

Number of table entries

[Plot: Speedup (axis 1.26 to 1.38) against Number of Entries (10 to 1000).]
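
The three parameters swept on these slides (bits per delta, deltas per entry, number of entries) together set the storage cost. The Python sketch below shows the arithmetic only. The function names `fits` and `table_bits` and every field width are assumptions chosen for illustration, not the configuration evaluated in the talk.

```python
def fits(delta, bits):
    """True if delta is representable as a signed two's-complement value."""
    return -(1 << (bits - 1)) <= delta < (1 << (bits - 1))


def table_bits(entries, deltas_per_entry, delta_bits,
               pc_bits=32, addr_bits=32, ptr_bits=None):
    """Total storage in bits for a DCPT-like table (all widths assumed)."""
    if ptr_bits is None:
        # Enough bits to index every delta slot.
        ptr_bits = max(1, (deltas_per_entry - 1).bit_length())
    per_entry = (pc_bits                           # PC
                 + 2 * addr_bits                   # Last Addr. and Last Pref.
                 + deltas_per_entry * delta_bits   # delta slots
                 + ptr_bits)                       # Ptr
    return entries * per_entry


# Hypothetical comparison: 12-bit deltas versus six full 32-bit values.
print(table_bits(100, 6, 12))  # 17100
print(table_bits(100, 6, 32))  # 29100
print(fits(127, 8), fits(128, 8))  # True False
```

Narrow deltas shrink each entry, at the price that a delta too large for the field cannot be stored exactly. That is the coverage side of the trade-off plotted against bits per delta.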

SLIDE 49

Results I

[Bar chart: Speedup (axis 1 to 5) of Sequential, RPT, PC/DC and DCPT on the benchmarks libquantum, milc, leslie3d, GemsFDTD, lbm and sphinx3.]

SLIDE 50

Results II

[Bar chart: Speedup (axis 0.7 to 1.6) of Sequential, RPT, PC/DC and DCPT on the benchmarks bzip2, mcf, hmmer, h264ref, omnetpp, astar, xalancbmk, bwaves, zeusmp, gromacs, cactusADM, soplex, calculix and wrf, plus the geometric mean (G-mean).]
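
The G-mean bar is a geometric mean of per-benchmark speedups. A short Python sketch of that aggregation follows; the function name is an assumption and the input values are made up for illustration, not read from the chart.

```python
import math


def geometric_mean(speedups):
    """Geometric mean via logarithms; every speedup must be positive."""
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical values: one benchmark doubles in speed, one halves.
sample = [2.0, 0.5]
print(sum(sample) / len(sample))  # arithmetic mean: 1.25
print(round(geometric_mean(sample), 6))  # 1.0
```

For ratios such as speedup, the geometric mean treats a 2x gain and a 2x loss as cancelling out, whereas the arithmetic mean would report a net gain.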

SLIDE 51

Concluding Remarks

- Delta Correlating Prediction Tables is a hybrid of RPT and PC/DC prefetching.
- Combines the table-based design of RPT and the pattern-matching techniques of PC/DC.
- Compact representation
- Calculation in constant time

SLIDE 52

Thank you for listening

Are there any questions?
