
SLIDE 1

FSU

DEPARTMENT OF COMPUTER SCIENCE

Predicting Instruction Cache Behavior

Frank Mueller (FSU), David Whalley (FSU), Marion Harmon (FAMU)

Department of Computer Science
Florida State University
Tallahassee, FL 32304-4019

e-mail: mueller@cs.fsu.edu, whalley@cs.fsu.edu, harmon@vm.cc.famu.edu

Predicting Instruction Cache Behavior, LCTS'94
SLIDE 2

Overview
  • caches often disabled for real-time due to "unpredictability"
  • analysis of instruction cache behavior possible
  • static cache simulation predicts many references
  • allows tighter WET/BET predictions for regular caches
  • new architectural feature: fetch-from-memory bit
  • speedup factor 3 to 8 over uncached system
  • no loss in predictability
SLIDE 3

Introduction
  • timing predictions required for schedulability analysis
  • caches bridge bottleneck between CPU and main-memory (MM) speed
  • caches regarded as "unpredictable"
  • caches often disabled for hard real-time systems
  • CPU speed not fully utilized
  • problem will increase in future
SLIDE 4

Static Cache Simulation
  • address of instructions known statically
  • predicts large portion of instruction cache references
  • uses data-flow analysis of call graph and control flow
  • categorizes each instruction
  • assumes:
      • direct-mapped caches
      • task: code executed between 2 scheduling points
      • non-preemptive static scheduling
      • currently no recursion allowed
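
Because instruction addresses are fixed before the program runs, the cache line each instruction occupies in a direct-mapped cache can be computed ahead of time. The following is a minimal sketch of that mapping, not the authors' simulator. It assumes 16-byte lines, as in the measurements later in the talk; the 4-line cache and all function names are illustrative.

```python
LINE_SIZE = 16   # bytes per cache line (assumed, matches the later slides)
NUM_LINES = 4    # number of cache lines (illustrative value)

def program_line(addr: int) -> int:
    """Index of the memory ("program") line that contains this address."""
    return addr // LINE_SIZE

def cache_line(addr: int) -> int:
    """Cache line a direct-mapped cache uses for this address."""
    return program_line(addr) % NUM_LINES

def conflicts(addr_a: int, addr_b: int) -> bool:
    """True if the two addresses lie in different program lines
    that compete for the same cache line."""
    return (program_line(addr_a) != program_line(addr_b)
            and cache_line(addr_a) == cache_line(addr_b))
```

With these values, addresses 0x00 and 0x40 fall into program lines 0 and 4, which both map to cache line 0, so they can evict each other; 0x00 and 0x04 share one program line and never conflict.
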
SLIDE 5

Overview of Static Cache Simulation

[Figure: tool-flow diagram. Labels: source files, compiler, control flow information, object files, linker, executable program, cache configuration, static cache simulator, instruction annotation.]
SLIDE 6

Instruction Categorization
  • transforms call graph into function-instance graph (FIG)
  • performs analysis on FIG and control-flow graph
  • uses data-flow analysis algorithms for prediction
  • abstract cache state: potentially cached program lines
  • reaching state: reachable program lines
  • categories based on these states:
      • always hit
      • always miss
      • first miss: miss on first reference, hit on consecutive ones
      • conflict: either hit or miss (dynamic)
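
To make the four categories concrete, here is one simplified decision rule over the two states. It is an illustrative assumption, not the authors' published definition: the slide does not spell out the exact conditions, and this sketch ignores hits due to spatial locality within a line. The function name and the `conflicting` parameter (the other program lines that map to the same cache line) are hypothetical.

```python
def categorize(line, abstract_state, reaching_state, conflicting):
    """Classify a reference to program line `line` (simplified sketch).

    abstract_state: program lines that are potentially cached at this point
    reaching_state: program lines that are reachable at this point
    conflicting:    other program lines mapping to the same cache line
    """
    if line not in abstract_state:
        return "always miss"          # the line cannot be in the cache
    if not (conflicting & abstract_state):
        return "always hit"           # nothing can have replaced the line
    if not (conflicting & reaching_state):
        return "first miss"           # a rival may be cached once, then never again
    return "conflict"                 # hit or miss depends on the dynamic path
```

For example, `categorize(3, {3, 7}, {7}, {7})` yields `"conflict"` under this rule, because line 7 may be cached and remains reachable, while `categorize(3, {3, 7}, set(), {7})` yields `"first miss"`.
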
SLIDE 7

[Figure: control-flow graphs of main() and foo(), with foo() shown as two instances (a) and (b). Basic blocks are numbered and laid out over program lines 0 to 5; each instruction is annotated a-hit, a-miss, f-miss, or conflict.]
SLIDE 8

Notes 7-1
  • 4 cache lines
  • 16 bytes per line (4 instructions)
  • instances of foo: (a) block 8a and (b) block 8b
  • 7(1): always hit, spatial locality
  • 8b(1): always hit, temporal locality
  • 3(3): first miss
  • 5(1) and 6(1): group first miss
  • 3(1): conflict with 8b(2), conditionally executed
SLIDE 9

Fetch-From-Memory Bit
  • motivation:
      • better performance than uncached systems
      • no loss of predictability
  • fetch-from-memory (FFM) bit encoded in instruction
  • semantics:
      • FFM set: fetch instruction from MM
      • FFM clear: fetch instruction from cache
SLIDE 10

Fetch-From-Memory Bit (cont.)
  • hardware logic:
      • cache miss: fetch from memory (n cycle delay)
      • cache hit and FFM set: fetch from memory (n cycle delay)
      • cache hit and FFM clear: fetch from cache without delay
  • relation to instruction categorization:
      • FFM set iff conflict or always miss
      • FFM clear iff first miss or always hit
  • first miss:
      • 1st reference results in cache miss (n cycle delay)
      • consecutive references result in cache hit and FFM clear (no delay)
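
The hardware logic and the category-to-bit mapping above can be written as two small functions. This is a sketch for illustration only: the function names are hypothetical, and `N = 10` is an assumed delay borrowed from the 10-cycle fetch delay quoted in the measurement notes.

```python
def ffm_bit(category: str) -> bool:
    """FFM is set iff the instruction is categorized as conflict or always miss."""
    return category in ("conflict", "always miss")

def fetch_delay(cache_hit: bool, ffm_set: bool, n: int) -> int:
    """Delay in cycles for one instruction fetch under the FFM hardware logic."""
    if not cache_hit or ffm_set:
        return n   # fetched from main memory
    return 0       # fetched from cache without delay

N = 10  # assumed memory fetch delay in cycles

# A first-miss instruction: FFM clear, miss on the 1st reference, hits afterwards.
first_miss_ffm = ffm_bit("first miss")
delays = [fetch_delay(hit, first_miss_ffm, N) for hit in (False, True, True)]
# delays == [10, 0, 0]
```

For a conflict instruction the bit is set, so `fetch_delay` returns `n` whether or not the cache happens to hit. That is what makes the delay statically known.
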
SLIDE 11

Measurements
  • modified back-end of optimizing compiler VPO
  • performed static cache simulation
  • instrumented programs for instruction cache simulation
  • direct-mapped cache simulated
  • uniform instruction size of 4 bytes simulated
  • cache line size was 4 words (16 bytes)
SLIDE 12

Static Measurements

Cache    FFM set    cache prediction
Size                always hit    always miss    first-miss    conflict
1kB      25.19%     71.23%        8.66%          3.69%         16.42%
2kB      21.18%     72.09%        5.88%          7.28%         14.75%
4kB      11.35%     72.40%        4.36%          16.64%        6.60%
8kB      4.73%      72.61%        4.03%          22.77%        0.59%
SLIDE 13

Notes 11-1
  • cache sizes 1-4kB
  • 12 programs with sizes 5-18kB
  • FFM set: in DAG call graph and CFG
  • others: in FIG
  • caches statically predictable for 84-99% of references
  • remaining 1-16% due to conflicts
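
The 84-99% range can be checked against the Static Measurements table. The sketch below assumes the four prediction columns partition all references, so that everything except "conflict" counts as statically predictable. The variable and function names are illustrative.

```python
# (always hit, always miss, first-miss, conflict) in percent, per cache size
static = {
    "1kB": (71.23, 8.66, 3.69, 16.42),
    "2kB": (72.09, 5.88, 7.28, 14.75),
    "4kB": (72.40, 4.36, 16.64, 6.60),
    "8kB": (72.61, 4.03, 22.77, 0.59),
}

def predictable(row):
    """Share of references whose hit/miss behavior is known statically."""
    always_hit, always_miss, first_miss, _conflict = row
    return always_hit + always_miss + first_miss

shares = {size: predictable(row) for size, row in static.items()}
```

Each row sums to 100%, and the predictable share grows from about 83.6% at 1kB to about 99.4% at 8kB.
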
SLIDE 14

Dynamic Measurements

Cache    hit ratio               conflicts    % of exec time
Size     bit-enc.    cached      cached       bit-enc.    cached
1kB      71.81%      92.40%      25.38%       39.30%      18.71%
2kB      77.81%      97.49%      21.14%       33.30%      13.62%
4kB      90.73%      99.74%      9.12%        20.38%      11.37%
8kB      98.15%      99.99%      1.76%        12.97%      11.13%
SLIDE 15

Notes 12-1
  • uncached: simulated disabled instruction cache with 10-cycle overhead for each instruction fetch
  • bit-encoded: simulated translation of bit-encoding as discussed
  • conventional cached
  • 1-19 million instructions executed
  • results improve with increasing cache size
  • bit-enc.: lower hit ratio than cached (72-98% vs. 92-99%) but much better than uncached!
  • bit-enc.: 3-8 times faster than uncached (39-13% of uncached exec time)!
  • cached: 5-9 times faster than uncached (18-11% of uncached exec time)
  • cached less predictable, bit-enc. as predictable as uncached!
  • conflicts source of unpredictability, 25-5%
  • results can be still improved if combined with timing tool (4-9 speedup)
  • very tight estimating of regular cached system possible with timing tool
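
The quoted speedups are the reciprocals of the execution-time percentages in the Dynamic Measurements table. A short sketch of that arithmetic, with illustrative names:

```python
# (bit-encoded, cached) execution time as a percentage of uncached execution time
exec_time_pct = {
    "1kB": (39.30, 18.71),
    "2kB": (33.30, 13.62),
    "4kB": (20.38, 11.37),
    "8kB": (12.97, 11.13),
}

def speedup(pct_of_uncached: float) -> float:
    """Speedup over the uncached system, given exec time as % of uncached."""
    return 100.0 / pct_of_uncached

speedups = {size: (speedup(bit_enc), speedup(cached))
            for size, (bit_enc, cached) in exec_time_pct.items()}
```

This gives roughly 2.5x to 7.7x for the bit-encoded system and 5.3x to 9.0x for the conventional cache, in line with the rounded "3-8" and "5-9" figures in the notes.
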
SLIDE 16

Preliminary Timing Results

Dynamic Worst-Case Measurements

Name          Observed     Our Estimated    Naive
              Cycles       Ratio            Ratio
Matmult       2,917,887    1.00             9.21
Matsum        677,204      1.00             4.63
Matsumcnt     959,064      1.09             4.31
Bubblesort    7,620,684    1.99             8.18
SLIDE 17

Notes 13-1
  • 8 lines of 16 bytes each, i.e. cache size is 128 bytes
  • programs 4-6 times larger than cache
  • observed: simulated cached system
  • our estimate: timing tool
  • naive: uncached system (10 cycles fetch delay per instruction)
  • matmult: loops, no if-then-else
  • matsum: loops, if-then
  • matsumcnt: loops, if-then-else
  • bubblesort: inner loop counters depending on outer loop counters
  • general problem for timing tools (known number of loop iterations)
  • surprisingly tight estimates possible, just as good as without caches
SLIDE 18

Future Work
  • data caching
  • recursion
  • set-associative caches
  • integrate with timing tool to tightly predict WET/BET
  • other applications
SLIDE 19

Related Work
  • very little work on predicting cache behavior
  • believed to be "unpredictable", too complex to analyze
  • timing tools at different code levels:
      • source: Park et al. (U.W. / Seoul), no caching
      • intermediate: Niehaus et al. (Amherst), no caching
      • machine: Harmon et al. (FSU / FAMU), limited caching
  • Niehaus: estimated cache hits at abstract level, no method
  • architectural modifications by Kirk through cache segmenting
  • bit-encoding:
      • McFarling: excl. instr. from cache
      • Chi and Dietz: selected data caching (cache xor register)
SLIDE 20

Summary
  • predicted instruction cache behavior successfully
  • designed and implemented static cache simulator to do the job
  • regular caches: many references statically known (84-99%)
  • bit-encoding: all references predictable, 3-8 times faster than uncached
  • tight estimates of WET/BET possible (with timing tool)
  • results sufficient for schedulability analysis

INSTRUCTION CACHES CAN FINALLY BE ENABLED FOR HARD REAL-TIME SYSTEMS