SLIDE 1

SEESAW: Set Enhanced Superpage Aware caching

Mayank Parasar∑, Abhishek BhattacharjeeΩ, Tushar Krishna∑

∑School of Electrical and Computer Engineering

Georgia Institute of Technology

ΩDepartment of Computer Science

Rutgers University

mparasar3@gatech.edu

Set Associativity

http://synergy.ece.gatech.edu/

SLIDE 2

Outline

• Motivation
• SEESAW: Concept
• SEESAW: Micro-architecture
• Evaluation Methodology
• Results
• Conclusion

Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech


6/26/18

SLIDE 3

L1 Cache Characteristics

• Fast lookup
• High hit-rate
• Energy efficiency

SLIDE 4

Virtually Indexed Physically Tagged [VIPT] Cache

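[Figure: VIPT cache lookup. The virtual address (VA) is split into the VPN and the page offset; the set index and block offset lie inside the page offset, so they select a set (set-1 to set-N) of a 4-way cache (Way-1 to Way-4) while the TLB translates the VPN into the PPN. The PPN of the physical address (PA) is compared (=) with the tags of the selected set to produce a cache HIT/MISS.]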

SLIDE 5

Virtually Indexed Physically Tagged [VIPT] Cache

VIPT caches require: (set-index bits + block-offset bits) ≤ page-offset bits
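The constraint can be turned into a quick sizing calculation. This is a minimal sketch assuming power-of-two sizes and 64-byte cache lines (the slides do not state the line size); the function names are illustrative. Because the set index must fit in the page offset, the set count is capped at page size / line size, and any extra capacity has to come from more ways.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a positive power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def vipt_geometry(cache_bytes: int, line_bytes: int, page_bytes: int) -> tuple[int, int]:
    """Return (sets, ways) for a VIPT cache whose set index fits in the page offset."""
    # (set-index bits + block-offset bits) <= page-offset bits
    max_index_bits = log2_exact(page_bytes) - log2_exact(line_bytes)
    total_lines = cache_bytes // line_bytes
    sets = min(1 << max_index_bits, total_lines)
    ways = total_lines // sets  # remaining capacity must come from associativity
    return sets, ways


if __name__ == "__main__":
    for kb in (32, 64):
        sets, ways = vipt_geometry(kb * 1024, 64, 4 * 1024)
        print(f"{kb} KB cache, 4 KB pages: {sets} sets x {ways} ways")
    # 32 KB cache, 4 KB pages: 64 sets x 8 ways
    # 64 KB cache, 4 KB pages: 64 sets x 16 ways
```

The minimum associativity works out to cache size divided by page size, independent of the line size, so growing a 4 KB-page VIPT L1 grows its associativity rather than its set count.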

SLIDE 6

Impact of Associativity on Cache Access Latency and Energy


[Figure: two panels, cache access latency and cache access energy versus associativity.]

SLIDE 7

Effect of associativity on cache MPKI (misses per kilo-instruction)


High associativity hurts latency and energy without a commensurate improvement in hit rate
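MPKI is the metric plotted against associativity on this slide. For reference, a minimal definition in Python; the counts in the usage line are hypothetical and are not taken from the evaluation.

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instruction (MPKI): cache misses per 1,000 executed instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return misses * 1000 / instructions


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(mpki(25_000, 10_000_000))  # 2.5
```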

SLIDE 8

Revisiting L1 Cache Characteristics for VIPT Cache

• Fast lookup (limited by virtual memory!)
• High hit-rate
• Energy efficiency (limited by virtual memory!)

SLIDE 9

Opportunity: Superpage

Is it possible to relax the constraints of a traditional VIPT cache?

Yes

How?

• Baseline page: 4 KB, 12 offset bits
• Superpage: 2 MB, 21 offset bits
• Superpage: 1 GB, 30 offset bits

More page-offset bits for superpages!

Modern processors provide hardware support, and modern OSes provide software support, for superpages.
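The offset-bit counts above follow directly from the page sizes. A small sketch that also shows how many set-index bits each page size could in principle leave for a VIPT cache, assuming 64-byte lines (an assumption; the slides do not give the line size):

```python
PAGE_SIZES = {"4 KB": 4 * 1024, "2 MB": 2 * 1024 * 1024, "1 GB": 1024 ** 3}
LINE_BYTES = 64  # assumed cache-line size


def offset_bits(size_bytes: int) -> int:
    """Number of offset bits for a power-of-two size."""
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError(f"{size_bytes} is not a power of two")
    return size_bytes.bit_length() - 1


def index_bits_available(page_bytes: int, line_bytes: int = LINE_BYTES) -> int:
    """Set-index bits a VIPT cache could take from this page's offset."""
    return offset_bits(page_bytes) - offset_bits(line_bytes)


if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        print(f"{name}: {offset_bits(size)} offset bits, "
              f"{index_bits_available(size)} set-index bits")
    # 4 KB: 12 offset bits, 6 set-index bits
    # 2 MB: 21 offset bits, 15 set-index bits
    # 1 GB: 30 offset bits, 24 set-index bits
```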

SLIDE 10

Prevalence of superpages in modern OSes under memory fragmentation

Measured on a 32-core Sandy Bridge system with 32 GB of RAM. Memhog is used to fragment memory; a higher percentage indicates more fragmentation.

SLIDE 11

Outline (next section: SEESAW: Concept)

SLIDE 12

SEESAW: Concept

[Figure: the same cache seen two ways. Base-page view: fewer sets with more associativity (Set:1 to Set:3, several ways each). Superpage view: more sets with less associativity (Set:1 to Set:9), which is faster and more energy-efficient to access.]
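To make the two views concrete, here is a small calculation under stated assumptions: 64-byte lines, 4 KB base pages, and 4-way partitions. The partition width is an assumption for illustration; the slides name the 32kB and 64kB cache sizes but not this parameter. The basepage view keeps the set count the VIPT constraint allows, while the superpage view uses extra bits from the superpage offset to split each set into partitions, so each lookup searches fewer ways.

```python
LINE_BYTES = 64         # assumed line size
BASE_PAGE_BYTES = 4096  # 4 KB base page
PARTITION_WAYS = 4      # assumed ways per partition


def seesaw_views(cache_bytes: int) -> dict[str, tuple[int, int]]:
    """Return (sets, ways searched per lookup) for the basepage and superpage views."""
    total_lines = cache_bytes // LINE_BYTES
    base_sets = BASE_PAGE_BYTES // LINE_BYTES  # capped by the 4 KB page offset
    base_ways = total_lines // base_sets
    # Extra index bits from the superpage offset select one partition per set.
    partitions = base_ways // PARTITION_WAYS
    return {
        "basepage": (base_sets, base_ways),
        "superpage": (base_sets * partitions, PARTITION_WAYS),
    }


if __name__ == "__main__":
    print(seesaw_views(32 * 1024))  # {'basepage': (64, 8), 'superpage': (128, 4)}
```

Capacity is unchanged in both views; only the split between sets and ways differs.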

SLIDE 13

Outline (next section: SEESAW: Micro-architecture)

SLIDE 14

SEESAW: Micro-architecture

[Figure: SEESAW micro-architecture. The VA is split into the VPN and the basepage offset (set index and block offset). Each cache set (set-1 to set-N) is divided into Partition-0 (Way-1, Way-2) and Partition-1 (Way-3, Way-4). The Translation Filter Table (TFT) predicts whether the page is a superpage; the partition decoder decodes the partition index from the partition bit, which lies in the superpage offset beyond the basepage offset. The TLB translates the VPN into the PPN used for the tag comparison.]

SLIDE 15

SEESAW: Micro-architecture


SLIDE 16

SEESAW: Superpage access

[Figure: superpage access. The TFT identifies the page as a superpage, so the partition decoder uses the partition bit from the superpage offset to select one partition, and only that partition's ways are compared (=) with the PPN tag to produce HIT/MISS.]

SLIDE 17

SEESAW: Basepage access

[Figure: basepage access. The TFT reports "Not a Super Page", so the partition index is not taken from the superpage offset; the tags are compared (=) with the PPN to produce HIT/MISS.]
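The superpage and basepage lookup paths can be sketched as a function that returns which ways of the indexed set are compared. This is a minimal model, not the authors' hardware: 64-byte lines, 4 KB base pages, 4-way partitions, partition bits taken from VA[12] (32kB) or VA[13:12] (64kB), and a basepage lookup that compares every way of the set are all assumptions. `tft_says_superpage` is a hypothetical stand-in for the TFT prediction.

```python
LINE_BYTES = 64      # assumed cache-line size
BASE_PAGE_BITS = 12  # 4 KB base page
PARTITION_WAYS = 4   # assumed ways per partition


def partition_bits(cache_bytes: int) -> int:
    """Partition-index bits decoded from VA bits just above the 4 KB offset."""
    total_ways = cache_bytes >> BASE_PAGE_BITS  # VIPT minimum associativity
    partitions = total_ways // PARTITION_WAYS
    return partitions.bit_length() - 1          # 1 for 32 KB, 2 for 64 KB


def ways_to_probe(va: int, cache_bytes: int, tft_says_superpage: bool) -> tuple[int, list[int]]:
    """Return (set index, ways to compare) for one L1 lookup."""
    block_bits = LINE_BYTES.bit_length() - 1
    set_index = (va >> block_bits) & ((1 << (BASE_PAGE_BITS - block_bits)) - 1)
    total_ways = cache_bytes >> BASE_PAGE_BITS
    if not tft_says_superpage:
        # Basepage (or TFT false negative): the partition bit is not known
        # before translation, so every way of the set is compared.
        return set_index, list(range(total_ways))
    # Superpage: the partition bits lie inside the 2 MB offset, where VA == PA.
    pbits = partition_bits(cache_bytes)
    partition = (va >> BASE_PAGE_BITS) & ((1 << pbits) - 1)
    first = partition * PARTITION_WAYS
    return set_index, list(range(first, first + PARTITION_WAYS))


if __name__ == "__main__":
    print(ways_to_probe(0x1040, 32 * 1024, True))   # (1, [4, 5, 6, 7])
    print(ways_to_probe(0x1040, 32 * 1024, False))  # (1, [0, 1, 2, 3, 4, 5, 6, 7])
```

Under these assumptions a predicted superpage access compares 4 tags instead of 8 (32kB) or 16 (64kB).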

SLIDE 18

SEESAW: TFT and Partition Decoder

[Figure: the TFT, tagged with VA[63:21], answers "Super page?" and feeds the partition decoder.]

Translation Filter Table
• TFT lookup
  • Direct mapped
  • False negatives due to its limited size
• TFT update
  • On a VA misprediction
  • On a 2MB L1-TLB fill
  • On a 2MB L1-TLB invalidation

Partition Decoder
• For a 32kB cache
• For a 64kB cache
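The TFT behavior listed above can be sketched as a small direct-mapped table. This is an illustrative model, not the authors' design: the class and method names are hypothetical, the default of 16 entries matches the TFT size analyzed later in the deck, and the full VA[63:21] is stored as the tag for simplicity. A lookup miss is safe because the access simply proceeds as a basepage access.

```python
from typing import Optional

SUPERPAGE_SHIFT = 21  # 2 MB superpage: tag is VA[63:21]


class TranslationFilterTable:
    """Direct-mapped filter recording 2 MB virtual pages known to be superpages."""

    def __init__(self, entries: int = 16):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.tags: list[Optional[int]] = [None] * entries

    def _slot(self, va: int) -> tuple[int, int]:
        vpn_2m = (va & (2**64 - 1)) >> SUPERPAGE_SHIFT  # VA[63:21] of a 64-bit VA
        return vpn_2m & self.mask, vpn_2m

    def is_superpage(self, va: int) -> bool:
        """Lookup. A miss means 'treat as basepage' (a safe false negative)."""
        idx, tag = self._slot(va)
        return self.tags[idx] == tag

    def fill(self, va: int) -> None:
        """Update on a 2 MB L1-TLB fill, or after a superpage was mispredicted as a basepage."""
        idx, tag = self._slot(va)
        self.tags[idx] = tag  # may evict a conflicting entry

    def invalidate(self, va: int) -> None:
        """Update on a 2 MB L1-TLB invalidation."""
        idx, tag = self._slot(va)
        if self.tags[idx] == tag:
            self.tags[idx] = None
```

Because the table is direct mapped, two superpages whose VA[63:21] values collide in the index evict each other, which is one source of the false negatives noted on the slide.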

SLIDE 19

SEESAW: Cache line insertion policy

[Figure: the SEESAW micro-architecture diagram (TLB, TFT, partition decoder, and cache sets split into Partition-0 and Partition-1).]

Which partition should a cache line be inserted into?

SLIDE 20

SEESAW: Cache line insertion policy

• 4way-8way
  • Superpage miss: victim chosen within the partition
  • Basepage miss: victim chosen within the set
• 4way
  • Uses LRU within the associated partition
  • Avoids installing the same line twice
  • Saves energy
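The two policies differ only in which ways are candidates for eviction. A minimal sketch with LRU timestamps; the function and parameter names are hypothetical. It assumes that under "4way" a basepage line's partition comes from its physical-address bits, which is one reading of "avoid installing the same line twice" rather than something the slide states.

```python
def choose_victim(lru_stamp: list[int], policy: str, superpage: bool,
                  partition: int, partition_ways: int = 4) -> int:
    """Pick the way to evict from one set on a miss.

    lru_stamp[w] is the last-use time of way w (smaller = less recently used).
    partition is the missing line's partition index (from VA bits for a
    superpage; assumed to come from PA bits for a basepage under "4way").
    """
    if policy not in ("4way-8way", "4way"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "4way-8way" and not superpage:
        candidates = range(len(lru_stamp))  # basepage miss: LRU within the set
    else:
        first = partition * partition_ways  # LRU within the partition
        candidates = range(first, first + partition_ways)
    return min(candidates, key=lambda w: lru_stamp[w])


if __name__ == "__main__":
    stamps = [5, 1, 7, 3, 2, 8, 6, 4]  # hypothetical LRU state of an 8-way set
    print(choose_victim(stamps, "4way-8way", False, 1))  # 1
    print(choose_victim(stamps, "4way", False, 1))       # 4
```

With "4way-8way", a basepage line can land in a partition that does not match its address bits; a later superpage lookup that probes only the matching partition would then miss and could install the line a second time.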


SLIDE 21

SEESAW: System Level Optimization

• Cache coherence
  • Coherence lookups use the physical address
  • Energy benefits are higher with snoopy coherence than with directory-based coherence
• Page table modifications
  • A superpage is splintered into multiple basepages
  • Multiple basepages are promoted to a superpage
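Because coherence requests arrive with a physical address, the partition can be computed directly instead of predicted. A hedged sketch: assuming the "4way" insertion policy places every line in the partition given by its physical-address bits just above the 4 KB offset (an interpretation, not stated on the slide), a snoop in an 8-way set compares 4 ways instead of 8. The function name and bit positions are illustrative assumptions.

```python
def snoop_ways(pa: int, total_ways: int, partition_ways: int = 4,
               base_page_bits: int = 12) -> list[int]:
    """Ways a physically addressed coherence snoop must compare in one set.

    Assumes every line sits in the partition selected by the PA bits just
    above the 4 KB offset, and that the partition count is a power of two.
    """
    partitions = total_ways // partition_ways
    partition = (pa >> base_page_bits) & (partitions - 1)
    first = partition * partition_ways
    return list(range(first, first + partition_ways))


if __name__ == "__main__":
    print(snoop_ways(0x1000, 8))   # [4, 5, 6, 7]
    print(snoop_ways(0x3000, 16))  # [12, 13, 14, 15]
```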


SLIDE 22

Outline (next section: Evaluation Methodology)

SLIDE 23

SEESAW: Simulated system

SLIDE 24

SEESAW: Workloads

• SPEC
• PARSEC
• CloudSuite
  • Tunkrank
• BioBench
  • MUMmer
  • Tigr
• MongoDB
• Server workload
  • graph500
  • Nutch Hadoop
• Social-event web service
  • Olio
• Key-value store
  • Redis

SLIDE 25

Outline (next section: Results)

SLIDE 26

SEESAW: Performance improvement

SEESAW achieves 3-10% better runtime than the baseline

SLIDE 27

SEESAW: Performance improvement

[Figure: two panels, out-of-order CPU and in-order CPU.]

~10% performance improvement for a 64kB cache in out-of-order (OoO) CPUs

SLIDE 28

SEESAW: Energy savings

10-20% energy savings relative to CPUs using baseline VIPT caches!

  • Approximately one-third of the energy savings comes from coherence

SLIDE 29

SEESAW: TFT analysis and Way-Prediction

[Figure: two panels, TFT analysis and SEESAW + way-prediction.]

A 16-entry TFT drives its miss rate below 10%.

SEESAW combined with way-prediction (SEESAW+WP) shows symbiotic behavior.

SLIDE 30

Outline (next section: Conclusion)

SLIDE 31

Revisiting L1 Cache Characteristics

• Fast lookup
• High hit-rate
• Energy efficiency

SLIDE 32

SEESAW: Conclusion

Set Associativity

• L1 caches are optimized for latency.
• VIPT indirectly restricts the number of sets in an L1 cache, which increases its associativity.
• There is a non-linear relation between associativity and the access latency and energy of the L1 cache.
• Superpages are often used in modern OSes.
• SEESAW provides low-associativity access to superpages, yielding both latency and energy benefits.
• Up to 10% performance improvement and 20% energy reduction on modern workloads.
• SEESAW has extremely low overhead and is readily implementable.