FSU Department of Computer Science, Humboldt-Universität zu Berlin

Fast Instruction Cache Analysis via Static Cache Simulation. Frank Mueller (Humboldt-Universität zu Berlin, Fachbereich Informatik) and David Whalley (Florida State University, Department of Computer Science).


SLIDE 1

FSU

DEPARTMENT OF COMPUTER SCIENCE

Fast Instruction Cache Analysis via Static Cache Simulation

Frank Mueller
Humboldt-Universität zu Berlin, Fachbereich Informatik
Unter den Linden 6, 10099 Berlin (Germany)

David Whalley
Florida State University, Department of Computer Science
Tallahassee, FL 32304-4019, U.S.A.
e-mail: whalley@cs.fsu.edu

Fast Instruction Cache Analysis via Static Cache Simulation, SS'95
SLIDE 2

Overview
  • caches bridge bottleneck between CPU and MM speed
  • traditional (trace-driven) methods slow (about 100x overhead)
  • new, efficient method for instruction cache simulation:
      • provides faster instruction cache performance evaluation
      • determine number of hits and misses of a program execution
      • used to evaluate new cache designs
      • used to analyze new optimization techniques
SLIDE 3

Methods in Contrast
  • Goal: faster instruction cache performance evaluation
  • traditional approach: inline tracing
      • instrument program on complement of min. spanning tree
      • generate trace addresses
      • simulate caches based on trace
  • our approach: on-the-fly analysis
      • analyze program statically (static cache simulation)
      • instrument program on "unique paths"
      • do NOT generate trace addresses
      • simulate remaining cache behavior within program execution
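For contrast with the on-the-fly approach, the trace-driven baseline can be sketched in a few lines. This is a minimal illustration, not the authors' simulator: the function name `simulate_direct_mapped`, the default cache geometry, and the sample trace are all assumptions made for the example.

```python
def simulate_direct_mapped(trace, num_lines=4, line_size=16):
    """Count (hits, misses) for a sequence of instruction addresses."""
    tags = [None] * num_lines          # None marks an invalid cache line
    hits = misses = 0
    for addr in trace:
        block = addr // line_size      # memory block ("program line") holding addr
        index = block % num_lines      # direct-mapped: exactly one candidate slot
        if tags[index] == block:
            hits += 1
        else:
            misses += 1
            tags[index] = block        # fetch the block, evicting the old one
    return hits, misses


# Hypothetical trace: eight sequential 4-byte instructions, executed twice.
trace = [4 * i for i in range(8)] * 2
print(simulate_direct_mapped(trace))
```

Every executed instruction passes through the loop body, which is where the overhead of trace-driven simulation comes from; the static approach tries to settle most of these references before the program runs.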
SLIDE 4

Static Cache Simulation
  • address of instructions known statically
  • predicts large portion of instruction cache references
  • uses iterative analysis of call graph and control flow
  • categorizes each instruction
  • assumes:
      • direct-mapped caches
      • currently no recursion allowed
SLIDE 5

Overview of Static Cache Simulation

[Figure: tool-chain diagram. Recoverable labels: source files, compiler, control flow info, static cache simulator, cache configuration, instrumentation macros, assembly files, assembler, object files, cache analysis library routines, cache state table, linker, executable program.]
SLIDE 6

Instruction Categorization
  • transforms call graph into function-instance graph (FIG)
  • performs analysis on FIG and control-flow graph
  • uses data-flow analysis algorithms for prediction
  • abstract cache state: potentially cached program lines
  • reaching state: reachable program lines
  • categories based on these states:
      • always hit
      • always miss
      • first miss: miss on first reference, hit on consecutive ones
      • conflict: either hit or miss (dynamic)
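One way to read the four categories as a decision procedure is sketched below. The slide only names the categories and the two states they are derived from, so the exact rules here are a simplified assumption, and the helper `categorize` with all of its parameters is hypothetical.

```python
INVALID = "I"  # marker: the cache line may still be empty


def categorize(line, first_in_line, cache_state, reaching_state, rivals):
    """Classify one instruction (hypothetical helper, simplified rules).

    line            program line holding the instruction
    first_in_line   False if an earlier instruction of the same path
                    already referenced this program line
    cache_state     abstract cache state restricted to the cache line that
                    `line` maps to: program lines and/or INVALID
    reaching_state  program lines marked reachable by the reaching state
    rivals          other program lines mapping to the same cache line
    """
    if not first_in_line:
        return "always hit"            # spatial locality within the line
    if line not in cache_state:
        return "always miss"           # the line can never be cached here
    cached_rivals = cache_state & rivals
    if not cached_rivals and INVALID not in cache_state:
        return "always hit"            # temporal locality, no competitor
    if not (cached_rivals & reaching_state):
        return "first miss"            # competitors cannot come back
    return "conflict"                  # decided only at run time


print(categorize(1, True, {1, 5}, {5}, {5}))
```

The ordering of the checks matters: the cheap, fully static answers are tried first, and "conflict" is the fallback for references whose outcome depends on the executed path.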
SLIDE 7

Algorithm to Calculate Cache States

    input_state(main) := all invalid lines;
    WHILE any change DO
        FOR each instance of a UP in the program DO
            input_state(UP) := ∅;
            FOR each immediate predecessor P of UP DO
                input_state(UP) := input_state(UP) ∪ output_state(P);
            output_state(UP) := [input_state(UP) ∪ prog_lines(UP)] \ conf_lines(UP);
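The fixed-point iteration above can be sketched as runnable code. Several things in this sketch are assumptions rather than statements from the slides: UP is read as "unique path", `conf_lines(UP)` is taken to be every program line or invalid marker that maps to a cache line used by the path, the entry path keeps the all-invalid seed on each pass, and the three-node graph is invented for illustration.

```python
def cache_states(preds, prog_lines, num_cache_lines, entry):
    """Iterate input/output states to a fixed point.

    preds       node -> list of immediate predecessor nodes
    prog_lines  node -> set of program lines (ints) the node references
    """
    invalid = {("I", c) for c in range(num_cache_lines)}
    universe = set().union(*prog_lines.values()) | invalid

    def slot(item):
        # Cache line an entry occupies (modulo placement is assumed).
        return item[1] if isinstance(item, tuple) else item % num_cache_lines

    def conf_lines(node):
        used = {line % num_cache_lines for line in prog_lines[node]}
        return {x for x in universe
                if slot(x) in used and x not in prog_lines[node]}

    in_state = {n: set() for n in preds}
    out_state = {n: set() for n in preds}
    changed = True
    while changed:                      # WHILE any change DO
        changed = False
        for n in preds:
            new_in = set(invalid) if n == entry else set()
            for p in preds[n]:
                new_in |= out_state[p]  # union over immediate predecessors
            new_out = (new_in | prog_lines[n]) - conf_lines(n)
            if new_in != in_state[n] or new_out != out_state[n]:
                in_state[n], out_state[n] = new_in, new_out
                changed = True
    return in_state, out_state


# Hypothetical graph: A -> B -> C -> B; lines 0 and 4 share a cache line.
preds = {"A": [], "B": ["A", "C"], "C": ["B"]}
prog_lines = {"A": {0}, "B": {1}, "C": {4}}
ins, outs = cache_states(preds, prog_lines, 4, "A")
for node in preds:
    print(node, sorted(ins[node], key=str), sorted(outs[node], key=str))
```

States only grow until they stabilise, so the loop terminates. The sketch does not handle a single path whose own program lines conflict with each other.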
SLIDE 8

[Figure: example control-flow graphs of main() (blocks 1 to 7, with calls to foo() in blocks 2 and 6) and foo() (block 8, instances (a) and (b)), laid out over pgm lines 0 to 5. Each instruction is labeled a-hit, a-miss, f-miss, or conflict.]
SLIDE 9

Notes for slide 8 (Frank Mueller, David Whalley, SS'95)
  • 4 cache lines
  • 16 bytes per line (4 instructions)
  • instances of foo: (a) block 8a and (b) block 8b
  • 7(1): always hit, spatial locality
  • 8b(1): always hit, temporal locality
  • 3(3): first miss
  • 5(1) and 6(1): group first miss
  • 3(1): conflict with 8b(2), conditionally executed
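The geometry in these notes fixes how addresses map onto the cache. A minimal sketch, assuming the usual modulo placement for a direct-mapped cache; the helper names and the sample addresses are hypothetical.

```python
LINE_SIZE = 16   # bytes per cache line (from the notes)
NUM_LINES = 4    # cache lines (from the notes)
INSTR_SIZE = 4   # bytes per instruction, so 4 instructions per line


def program_line(addr):
    """Index of the 16-byte program line that contains addr."""
    return addr // LINE_SIZE


def cache_line(addr):
    """Cache line the address maps to (modulo placement assumed)."""
    return program_line(addr) % NUM_LINES


def conflicts(addr_a, addr_b):
    """True if two addresses compete for the same cache line."""
    return (cache_line(addr_a) == cache_line(addr_b)
            and program_line(addr_a) != program_line(addr_b))


# With 4 cache lines, program lines 4 and 5 wrap around onto cache lines 0 and 1.
for line in range(6):
    print(line, "->", cache_line(line * LINE_SIZE))
```

Under this mapping, two instructions in the same program line never conflict, which is the spatial-locality case in the notes, while lines that are a multiple of four apart always do.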
SLIDE 10

Abstract Cache States for Example

"I" = invalid

prog. ln.:  I  I  I  I  0  1  2  3  4  5
cache ln.:  0  1  2  3  0  1  2  3  0  1

PASS 1
  in(1)  = [I I I I]        out(1)  = [I I I 0]
  in(8a) = [I I I 0]        out(8a) = [I I 4 5]
  in(2)  = [I I 4 5]        out(2)  = [I I 1 4]
  in(3)  = [I I 1 4]        out(3)  = [I 1 2 4]
  in(4)  = [I 1 2 4]        out(4)  = [I 1 2 4]
  in(5)  = [I 1 2 4]        out(5)  = [1 2 3 4]
  in(8b) = [1 2 3 4]        out(8b) = [2 3 4 5]
  in(6)  = [I 1 2 3 4 5]    out(6)  = [1 2 3 4 5]
  in(7)  = [1 2 3 4 5]      out(7)  = [1 2 3 4 5]

PASS 2
  in(1)  = [I I I I]        out(1)  = [I I I 0]
  in(8a) = [I I I 0]        out(8a) = [I I 4 5]
  in(2)  = [I I 4 5]        out(2)  = [I I 1 4]
  in(3)  = [I I 1 2 3 4 5]  out(3)  = [I 1 2 3 4]
  in(4)  = [I 1 2 3 4]      out(4)  = [I 1 2 3 4]
  in(5)  = [I 1 2 3 4]      out(5)  = [1 2 3 4]
  in(8b) = [1 2 3 4]        out(8b) = [2 3 4 5]
  in(6)  = [I 1 2 3 4 5]    out(6)  = [1 2 3 4 5]
  in(7)  = [1 2 3 4 5]      out(7)  = [1 2 3 4 5]
SLIDE 11

Code Instrumentation
  • merging states: local path state, shared path state (SPS)
  • states provide DFA to simulate conflicts locally
  • frequency counters
  • macros for calls
  • macros for paths
  • first miss table
  • calculate hits and misses from frequencies and states
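The last bullet can be illustrated for the statically predicted categories. This is a minimal sketch under stated assumptions: the per-path category counts, the path names, and the function `tally` are invented for the example, each first-miss instruction is charged one miss the first time its path runs, and conflict instructions are left out because they are resolved through the path states at run time.

```python
def tally(paths, freq):
    """Derive (hits, misses) from path frequency counters.

    paths  path name -> {category: number of instructions in that path}
    freq   path name -> how often the path was executed
    """
    hits = misses = 0
    for name, cats in paths.items():
        n = freq.get(name, 0)
        hits += n * cats.get("always_hit", 0)
        misses += n * cats.get("always_miss", 0)
        if n > 0:
            first = cats.get("first_miss", 0)
            misses += first            # one miss on the first execution
            hits += (n - 1) * first    # hits on every later execution
    return hits, misses


# Hypothetical program with a loop path and an exit path.
paths = {
    "loop": {"always_hit": 3, "first_miss": 1},
    "exit": {"always_hit": 2, "always_miss": 1},
}
freq = {"loop": 10, "exit": 1}
print(tally(paths, freq))
```

Only one counter increment per executed path is needed at run time; the multiplication by instruction counts happens once, after the program finishes. A first-miss line shared by several paths would need a shared table entry, which this sketch does not model.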
SLIDE 12

SPS (path 1 and 2)

  0 0 : hit a, hit b
  0 1 : hit a, miss b
  1 0 : miss a, hit b
  1 1 : miss a, miss b

[Figure: control-flow graph with blocks 1 to 7 and paths 1 to 4, next to an I-Cache with cache lines c and d and pgm lines a, b, x, and y. Instrumentation shown on the paths: sps&=~0x3, sps|=0x3, freq[sps]++ (twice), sps|=0x2, sps&=~0x1.]
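The state table and the instrumentation snippets suggest a small bit-mask automaton. The sketch below is one interpretation, not the authors' macros: it assumes that a set bit means "next reference misses", that bit 0x2 tracks pgm line a and bit 0x1 tracks pgm line b (reading each table row as a two-bit number), that the cache starts cold, and that one kind of path references a and b while another evicts both. The class and method names are hypothetical.

```python
A_MISS = 0x2   # assumed: bit set means the next reference to line a misses
B_MISS = 0x1   # assumed: bit set means the next reference to line b misses


class SharedPathState:
    def __init__(self):
        self.sps = A_MISS | B_MISS     # assumed cold start: both lines miss
        self.freq = [0, 0, 0, 0]       # one counter per SPS value

    def run_ab_path(self):
        """A path that references pgm lines a and b."""
        self.freq[self.sps] += 1       # freq[sps]++
        self.sps &= ~0x3               # a and b are cached afterwards

    def run_conflicting_path(self):
        """A path whose lines evict both a and b."""
        self.sps |= 0x3

    def totals(self):
        """Return {line: (hits, misses)} derived from the counters."""
        f = self.freq
        runs = sum(f)
        miss_a = f[0b10] + f[0b11]
        miss_b = f[0b01] + f[0b11]
        return {"a": (runs - miss_a, miss_a), "b": (runs - miss_b, miss_b)}


state = SharedPathState()
state.run_ab_path()
state.run_ab_path()
state.run_conflicting_path()
state.run_ab_path()
print(state.freq, state.totals())
```

The run-time cost per path is one counter increment plus one or two bit operations, and no addresses are generated. The single-bit updates on the slide (sps|=0x2, sps&=~0x1) would follow the same pattern for paths that affect only one of the two lines.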
SLIDE 13

Measurements
  • modified back-end of opt. compiler VPO
  • performed static cache simulation
  • instrumented programs for instruction cache simulation
  • direct-mapped cache simulated
  • uniform instruction size of 4 bytes simulated
  • cache line size was 4 words (16 bytes)
  • results verified by comparison against trace-driven simulation
SLIDE 14

Performance Evaluation
  • UPPAs and function instances vs. basic block partitioning
      • static savings: 24% fewer measurement points
      • dynamic savings: 31% fewer measurement points
  • predictability of instructions
      • static: 16% conflicts, other 84% predictable
      • dynamic: 26% conflicts, other 74% predictable
  • efficient in-line code instrumentation accounts for remaining savings
  • trace-driven overhead 18x, our method only 2x
SLIDE 15

Static Measurements for 1kB Direct-Mapped Cache

Name        Hit      Miss     Firstmiss  Conflict  Measure Pts.
cachesim    70.83%    6.99%    0.70%     21.48%    73.38%
cb          79.03%    2.35%    0.00%     18.63%    89.62%
compact     70.12%    4.96%    0.12%     24.80%    68.89%
copt        70.89%    7.41%    7.03%     14.67%    84.19%
dhrystone   70.03%   10.71%    7.30%     11.96%    81.61%
fft         74.07%    4.85%   16.42%      4.66%    78.43%
genreport   70.61%    9.95%    5.61%     13.84%    71.58%
mincost     72.79%    9.96%    1.14%     16.11%    83.19%
sched       67.65%    5.06%    0.09%     27.19%    73.16%
sdiff       68.94%   12.06%    0.89%     18.11%    72.13%
tsp         72.61%   13.50%    3.88%     10.01%    64.08%
whetstone   75.70%   12.84%    0.24%     11.22%    70.49%
average     71.94%    8.39%    3.62%     16.06%    75.90%
SLIDE 16

Dynamic Measurements for 1kB Direct-Mapped Cache

Name        Measure Pts.  Hit Ratio  Trace   SSim   Conflict
cachesim    60.56%         77.19%     8.41   1.53   34.12%
cb          65.61%         93.84%    33.56   3.51   30.67%
compact     56.56%         92.90%    22.29   2.31   21.34%
copt        74.88%         93.64%    16.43   1.58   30.00%
dhrystone   72.73%         83.73%    19.89   1.31   16.01%
fft         74.08%         99.95%     5.79   0.95    8.80%
genreport   81.31%         97.45%    13.57   1.91   28.92%
mincost     76.27%         89.08%    23.47   2.23   30.67%
sched       58.29%         96.41%    25.90   3.62   42.01%
sdiff       77.82%         97.61%    32.10   3.99   28.40%
tsp         58.67%         86.98%     5.70   1.19   17.63%
whetstone   68.25%        100.00%    13.44   1.36   23.56%
average     68.75%         92.40%    18.38   2.12   26.01%
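The averages in the overhead columns can be rechecked from the per-program rows. The short script below copies the Trace and SSim values from the table and recomputes their means and the ratio between them; the variable names are arbitrary.

```python
# Overhead factors per benchmark, in table order (cachesim ... whetstone).
trace = [8.41, 33.56, 22.29, 16.43, 19.89, 5.79,
         13.57, 23.47, 25.90, 32.10, 5.70, 13.44]
ssim = [1.53, 3.51, 2.31, 1.58, 1.31, 0.95,
        1.91, 2.23, 3.62, 3.99, 1.19, 1.36]

mean_trace = sum(trace) / len(trace)
mean_ssim = sum(ssim) / len(ssim)
ratio = mean_trace / mean_ssim     # how much cheaper SSim is on average

print(round(mean_trace, 2), round(mean_ssim, 2), round(ratio, 1))
```

The recomputed means agree with the table's average row (18.38 and 2.12), which is the source of the "18x versus 2x" summary figure.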
SLIDE 17

Average Simulation Overhead

[Figure: plot of execution overhead (y-axis, ticks 5 to 25) against cache size in bytes (x-axis, 64 to 8k) for two series, SSim and Trace.]
SLIDE 18

Future Work
  • recursion
  • set-associative caches
  • data caching
  • integrate with timing tool to tightly predict WET/BET
  • other applications
SLIDE 19

Summary
  • uses efficient on-the-fly analysis
  • performs static instruction cache simulation
  • instruments program
  • provides accurate cache performance measurements
  • instrumented program has only about 2x execution overhead
  • faster than any other cache analysis method published so far