Instruction Prefetcher: Ali Ansari (Sharif), Fatemeh Golshan (Sharif) (PowerPoint presentation)



SLIDE 1

MANA: Microarchitecting an Instruction Prefetcher

Ali Ansari (Sharif), Fatemeh Golshan (Sharif), Pejman Lotfi-Kamran (IPM), Hamid Sarbazi-Azad (Sharif, IPM)

SLIDE 2

Instruction Cache Misses

  • Server applications
  • Multi-megabyte instruction footprint
  • 25% increase in size per year [Kanev, ISCA’15]
  • Limited capacity L1 instruction cache
  • 512 blocks, 32 KB

Frequent L1i misses hurt performance!

SLIDE 3

Prior Work

Significant storage cost or uncovered potential!

SLIDE 4

Contributions

  • Storage cost is important
  • Unlimited storage results in high speedup
  • Prefetching records
  • A few distinct records
  • Low storage demand per record
  • MANA
  • 4 K distinct prefetching records, on average
  • Each record ≈ 4 bytes
  • 24% and 26.6% speedup with 16.3 KB and 122 KB of storage, respectively

MANA offers considerable speedup with limited storage!
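The storage figures on this slide can be sanity-checked with a quick calculation. The sketch below only multiplies the two numbers quoted in the bullets; the helper name `table_size_kb` is made up for illustration, and the slide does not say how the remaining gap to the reported 16.3 KB is spent.

```python
# Back-of-the-envelope estimate using the two figures quoted on the slide.
RECORDS = 4 * 1024       # "4 K distinct prefetching records, on average"
BYTES_PER_RECORD = 4     # "Each record ≈ 4 bytes" (approximate)


def table_size_kb(records: int, bytes_per_record: float) -> float:
    """Return the raw storage needed for the records, in KB."""
    return records * bytes_per_record / 1024


estimate_kb = table_size_kb(RECORDS, BYTES_PER_RECORD)
print(f"{estimate_kb:.1f} KB")
```

The estimate lands close to the smaller of the two budgets quoted above, which is consistent with the claim that a few thousand compact records are enough.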

SLIDE 5

Outline

  • Introduction
  • Motivation
  • Our Proposal, MANA Prefetcher
  • Methodology
  • Evaluation
  • Conclusion

SLIDE 6

Motivation

  • Spatial region
  • Trigger address + a footprint
  • Advantages
  • Covering a large address space
  • Few distinct prefetching records
  • Easily detectable
  • Simple design
  • Widely used in prior work
  • PIF [Ferdman, MICRO’11]
  • RDIP [Kolli, MICRO’13]
  • Shotgun [Kumar, ASPLOS’18]

Spatial region is a good prefetching record!
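A spatial region, as described above, can be modeled as a trigger block address plus a bit-vector footprint. The sketch below is illustrative only: the 64-byte block size follows from the 32 KB / 512-block cache on slide 2, while the footprint width of 8 blocks and all class and function names are assumptions that do not come from the slides.

```python
from dataclasses import dataclass
from typing import List

BLOCK_BITS = 6       # 64-byte blocks (32 KB / 512 blocks, per slide 2)
REGION_BLOCKS = 8    # assumed footprint width; not stated on the slides


def block_address(pc: int) -> int:
    """Drop the block-offset bits of an instruction address."""
    return pc >> BLOCK_BITS


@dataclass
class SpatialRegion:
    trigger: int        # block address of the first access to the region
    footprint: int = 0  # bit i set => block (trigger + 1 + i) was accessed

    def record(self, block: int) -> bool:
        """Mark `block` in the footprint; return False if it lies outside the region."""
        delta = block - self.trigger
        if delta == 0:
            return True
        if 1 <= delta <= REGION_BLOCKS:
            self.footprint |= 1 << (delta - 1)
            return True
        return False

    def blocks(self) -> List[int]:
        """Block addresses to prefetch when this region is replayed."""
        out = [self.trigger]
        for i in range(REGION_BLOCKS):
            if (self.footprint >> i) & 1:
                out.append(self.trigger + 1 + i)
        return out


region = SpatialRegion(trigger=block_address(0x4000))
for pc in (0x4040, 0x40C0, 0x4100):
    region.record(block_address(pc))
```

One record here covers up to nine blocks with a single address and a few footprint bits, which is the "large address space, few records" advantage listed above.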

SLIDE 7

Motivation (cont.)

  • Spatial region’s challenges:
  • Finding the successor, why?
  • Prefetching the trigger block
  • Timeliness
  • Storage cost
  • Trigger address = block address!
  • Prior work cannot solve these challenges effectively
  • MANA offers simple solutions for them

MANA microarchitects the use of spatial regions!

SLIDE 8

MANA

  • Spatial region is the main prefetching record
  • No association with other events
  • MANA_Table
  • A set-associative table to hold spatial regions
  • Looked up by trigger addresses
  • Finding the successor
  • The sequence of spatial regions is repetitive (PIF)
  • Use a pointer to the successor spatial region
  • Chase the pointers to discover successor spatial regions

MANA: (Spatial region + a pointer) in a set-associative table!
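The idea of storing each spatial region with a pointer to its successor, and chasing those pointers at replay time, can be sketched as follows. This is a toy model rather than the authors' design: the set count, associativity, LRU replacement, the use of the successor's trigger address as the "pointer", and the `lookahead` parameter are all assumptions chosen to keep the example short.

```python
from collections import OrderedDict

NUM_SETS = 4   # assumed; tiny on purpose
WAYS = 2       # assumed associativity


class ManaTableSketch:
    """Toy set-associative table: trigger -> [footprint, successor trigger]."""

    def __init__(self):
        # One OrderedDict per set; insertion order doubles as LRU order.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]
        self.last = None  # trigger of the most recently recorded region

    def _set(self, trigger):
        return self.sets[trigger % NUM_SETS]

    def record(self, trigger, footprint):
        s = self._set(trigger)
        if trigger in s:
            s[trigger][0] = footprint
            s.move_to_end(trigger)
        else:
            if len(s) >= WAYS:
                s.popitem(last=False)  # evict the least recently used way
            s[trigger] = [footprint, None]
        # Link the previously recorded region to this one, if it still exists.
        if self.last is not None and self.last != trigger:
            prev = self._set(self.last).get(self.last)
            if prev is not None:
                prev[1] = trigger
        self.last = trigger

    def replay(self, trigger, lookahead=3):
        """Chase successor pointers, returning up to `lookahead` regions."""
        out = []
        cur = trigger
        while cur is not None and len(out) < lookahead:
            entry = self._set(cur).get(cur)
            if entry is None:  # table miss or stale pointer: stop the chain
                break
            out.append((cur, entry[0]))
            cur = entry[1]
        return out


table = ManaTableSketch()
for trig, fp in [(0x100, 0b0101), (0x205, 0b0001), (0x30A, 0b1000)]:
    table.record(trig, fp)
chain = table.replay(0x100)
```

Bounding the chase with `lookahead` keeps a cyclic chain from looping forever, and a missing entry simply ends the chain, so a stale pointer costs coverage rather than correctness.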

SLIDE 9

MANA: High-Order Bit Patterns

[Figure: an instruction address divided into Tag, Set Number, and Block Offset fields.]

SLIDE 10

MANA: High-Order Bit Patterns

[Figure: an instruction address divided into HOBP, Partial Tag, Set Number, and Block Offset fields.]

SLIDE 11

MANA: High-Order Bit Patterns

[Figure: an instruction address divided into HOBP, Partial Tag, Set Number, and Block Offset fields. Example labels: partial tag b’01, HOBP index 100, and a HOBPs’ Table in which entry 100 holds 0xffa358f12b.]
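The figure on this slide suggests that the high-order bits of the tag are stored once in a separate table and that each record keeps only a partial tag and an index into that table. A minimal sketch of such a split is shown below. The field widths, the unbounded `HobpTable` with no replacement policy, and every name in the code are assumptions for illustration; the slides do not give these details.

```python
BLOCK_BITS = 6         # 64-byte blocks
SET_BITS = 10          # assumed number of set-index bits
PARTIAL_TAG_BITS = 2   # assumed; the figure shows a two-bit example


class HobpTable:
    """Stores each distinct high-order bit pattern once (no capacity limit here)."""

    def __init__(self):
        self.patterns = []

    def index_of(self, hobp):
        if hobp not in self.patterns:
            self.patterns.append(hobp)
        return self.patterns.index(hobp)


def split(addr):
    """Split an instruction address into (HOBP, partial tag, set number)."""
    block = addr >> BLOCK_BITS
    set_no = block & ((1 << SET_BITS) - 1)
    tag = block >> SET_BITS
    partial = tag & ((1 << PARTIAL_TAG_BITS) - 1)
    hobp = tag >> PARTIAL_TAG_BITS
    return hobp, partial, set_no


def compress(addr, table):
    """Replace the wide HOBP with a small index into the shared table."""
    hobp, partial, set_no = split(addr)
    return table.index_of(hobp), partial, set_no


def rebuild(idx, partial, set_no, table):
    """Reconstruct the block-aligned address from the compressed fields."""
    tag = (table.patterns[idx] << PARTIAL_TAG_BITS) | partial
    return ((tag << SET_BITS) | set_no) << BLOCK_BITS


hobps = HobpTable()
a = 0x7F0012345640
b = 0x7F0012349A80
rec_a = compress(a, hobps)
rec_b = compress(b, hobps)
```

Both addresses share the same high-order bits, so they map to the same table entry and each record only carries the small index, which is where the storage saving comes from.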

SLIDE 12

MANA: Recording

SLIDE 13

MANA: Replaying

SLIDE 14

Methodology

  • ChampSim Simulator
  • Default parameters
  • 32 KB, 8-way, L1 instruction cache
  • 50 public traces
  • Warmup: 50 M instructions
  • Evaluation: 50 M instructions
  • Competitors: RDIP, Shotgun, and PIF

SLIDE 15

Evaluation

[Figure: speedup (y-axis, 1.00 to 1.30) of RDIP, Shotgun, PIF, and MANA at storage budgets of 8 KB, 16 KB, and 128 KB.]

Better performance at every given storage budget!

SLIDE 16

Evaluation (cont.)

MANA can effectively prefetch for small cache sizes!

[Figure: speedup (y-axis, 1.0 to 1.8) for the 8 KB, 16 KB, and 32 KB series on client 2, client 7, server 1, server 9, server 12, server 16, server 29, server 36, spec gcc-3, and spec x264-1, plus the average of these 10 and the average of all traces.]

SLIDE 17

Conclusion

  • MANA uses spatial regions
  • Spatial regions are chained with pointers to each other
  • HOBP is used to reduce the storage cost
  • 24% speedup with only 16.3 KB
  • Significant gap with prior work
  • More practical design
  • 26.6% speedup with 122 KB

SLIDE 18

Thank You!

Any Questions?