GLIMPSES: Memory and program behavior estimation for SPEs
Jaswanth Sreeram, Ling Liu, Santosh Pande
Motivation
- “Prototyping large codebases for porting to SPEs is challenging”
- “Need a way to quickly evaluate program behavior and its suitability for SPEs”
– Important for legacy code/reuse
Motivation (contd)
- Porting large codebases to SPEs is challenging
– Limited local store
– High branch penalty
– Geared towards vectorizable code
– Code/data partitioning is not trivial
– SPE-SPE, SPE-PPE interactions
- Provide programmer with tools to
– Understand dynamic program behavior
– Quickly construct candidate partitions for SPEs
– Evaluate/Quantify partitions’ suitability for SPEs
GLIMPSES: Tool Overview
- Dynamic Call Graphs
- Memory Requirements
– Dynamic
– Analytical
- Memory Access Patterns
– Locality (spatial, temporal, neighbor affinity)
- Partitioning
– Criteria based estimates
- Visual, interactive
Dynamic Call Chains
[Screenshot: GLIMPSES interface with a Graph Visualization Area and a Results Display Panel]
Call chains…contd
Mpeg-2 Decode
- Zoom view
- Shows dynamic call chains for a program run (in this case the program is mpeg2-decode)
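A minimal C sketch of how dynamic call chains could be recorded during a profiled run: a shadow call stack is kept, and every function entry increments a caller-to-callee edge count. The `CallGraph` type, the `on_enter`/`on_exit` hooks and the fixed limits are hypothetical names chosen for illustration; the slides do not describe the actual GLIMPSES instrumentation interface.

```c
#include <string.h>

#define MAX_FUNCS 64   /* assumed upper bound on distinct function ids */
#define MAX_DEPTH 256  /* assumed upper bound on tracked call depth */

typedef struct {
    unsigned edges[MAX_FUNCS][MAX_FUNCS]; /* edges[caller][callee] = call count */
    int stack[MAX_DEPTH];                 /* shadow call stack of function ids */
    int depth;                            /* current depth (may exceed MAX_DEPTH) */
} CallGraph;

/* Reset all edge counts and empty the shadow stack. */
void call_graph_init(CallGraph *g) {
    memset(g, 0, sizeof *g);
}

/* Hook assumed to run at every function entry. */
void on_enter(CallGraph *g, int callee) {
    int valid = callee >= 0 && callee < MAX_FUNCS;
    if (valid && g->depth > 0 && g->depth <= MAX_DEPTH) {
        int caller = g->stack[g->depth - 1];
        if (caller >= 0)
            g->edges[caller][callee]++;   /* one more dynamic call on this edge */
    }
    if (g->depth < MAX_DEPTH)
        g->stack[g->depth] = valid ? callee : -1; /* -1 marks an unknown frame */
    g->depth++;
}

/* Hook assumed to run at every function exit. */
void on_exit(CallGraph *g) {
    if (g->depth > 0)
        g->depth--;
}
```

With function ids 0 = main, 1 = decode, 2 = idct, a run that calls decode twice from main and idct once from decode leaves `edges[0][1] == 2` and `edges[1][2] == 1`. Edge counts like these could then be written out as a graph for visualization.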
GLIMPSES
[Figure: GLIMPSES tool flow. Labeled stages: C/C++ program, LLVM compiler flow, Bytecode, Analysis & Instrumentation Passes, Instru. Bytecode, Link Runtime, Test Inputs, Execute, Profile Trace, GraphML Trace. Labeled components: Dyn. Memory Estimator, Analytical Memory Estimator, Partition Estimator, Visualization Engine.]
Memory Behavior
- Estimate static and dyn. memory usage
– Code, stack and heap (per function)
– Usage < SPE LS limit?
- Estimate function attributes
– Branch density
– Number of auto-vectorizable loops
- Analytical estimation
– Detect program objects affecting dynamic memory behavior
– Show correlation between these program objects and memory usage
- Construct an arithmetic expression for the amount of memory allocated, in terms of inputs or other program objects
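The "Usage < SPE LS limit?" check can be sketched in C as follows. The 256 KB figure is the Cell SPE local store size, which code, stack and heap all share. The `FunctionEstimate` record and its field names are assumptions made for this example, not a GLIMPSES data structure.

```c
#include <stddef.h>

/* The Cell SPE local store is 256 KB, shared by code, stack and heap. */
#define SPE_LOCAL_STORE_BYTES (256u * 1024u)

/* Hypothetical per-function estimate; field names are illustrative only. */
typedef struct {
    const char *name;
    size_t code_bytes;   /* estimated code size */
    size_t stack_bytes;  /* estimated peak stack usage */
    size_t heap_bytes;   /* estimated dynamic allocation */
} FunctionEstimate;

/* Total estimated footprint of one function. */
size_t total_bytes(const FunctionEstimate *f) {
    return f->code_bytes + f->stack_bytes + f->heap_bytes;
}

/* Nonzero if the estimated footprint is below the local store limit. */
int fits_in_local_store(const FunctionEstimate *f) {
    return total_bytes(f) < SPE_LOCAL_STORE_BYTES;
}
```

For instance, an estimate of 8192 bytes of code, 2048 of stack and 65536 of heap totals 75776 bytes and passes the check, while any function whose estimate reaches 262144 bytes does not.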
Analytical Estimator : Mpeg2 example
Code segment:
  for (……) {
    if (cc==0)
      size = Picture_Width*Picture_Height;
    else
      size = Chroma_Width*Chroma_Height;
    if (!(backward_reference_frame[cc] = (unsigned char *) malloc(size) ))
      Error(…);
    if (!(forward_reference_frame[cc] = (unsigned char *) malloc(size) ))
      Error(…);
  }

Result:
  __Malloc_size__1 = 1024
  __Malloc_size__2 = 0 + Coded_Picture_Width*Coded_Picture_Height
  __Malloc_size__3 = 0 + Coded_Picture_Width*Coded_Picture_Height
  __Malloc_size__4 = 0 + Coded_Picture_Width*Coded_Picture_Height
  __Malloc_size__5 = 0 + Chroma_Width*Chroma_Height
  __Malloc_size__6 = 0 + Chroma_Width*Chroma_Height
  __Malloc_size__7 = 0 + Chroma_Width*Chroma_Height
  __Malloc_size__8 = 0 + Chroma_Width*Chroma_Height
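Once allocation sizes are expressed in terms of program inputs, they can be evaluated for a concrete input without running the decoder. The C sketch below sums the eight expressions listed in the Result above. The `Mpeg2Inputs` struct and the function name are hypothetical, and the 720x480 stream with 360x240 chroma planes in the usage note is an assumed example input, not a measurement from the slides.

```c
#include <stddef.h>

/* Hypothetical container for the inputs the expressions depend on. */
typedef struct {
    size_t coded_picture_width;
    size_t coded_picture_height;
    size_t chroma_width;
    size_t chroma_height;
} Mpeg2Inputs;

/* Sum of the eight allocation-size expressions from the Result list:
 * one constant (1024), three luma-sized and four chroma-sized buffers. */
size_t estimated_heap_bytes(const Mpeg2Inputs *in) {
    size_t luma = in->coded_picture_width * in->coded_picture_height;
    size_t chroma = in->chroma_width * in->chroma_height;
    return 1024 + 3 * luma + 4 * chroma;
}
```

For the assumed 720x480 input this gives 1024 + 3*345600 + 4*86400 = 1383424 bytes, far more than a 256 KB local store, which is the kind of conclusion the analytical estimate is meant to expose early.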
Memory Access Patterns
- Locality metrics for loads/stores
– Spatial Locality
- “Loads to different addresses in a spatial window”
– Temporal Locality
- “Loads to same address in a time window”
– Neighbor Affinity
- “Loads to addresses within a space and time window”
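The three definitions above can be turned into counts over a load-address trace. The C sketch below is one possible reading, under stated assumptions: the time window is measured in number of preceding loads, the space window in address distance, and each load is counted at most once per metric. The `LocalityCounts` type and `measure_locality` function are hypothetical names, and the quadratic scan is for clarity, not speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of loads that exhibit each kind of locality. */
typedef struct {
    size_t spatial;   /* a different earlier address lies within the space window */
    size_t temporal;  /* the same address was loaded within the time window */
    size_t neighbor;  /* an earlier address lies within both windows */
} LocalityCounts;

/* trace[i] is the address of the i-th load.
 * time_window: how many preceding loads count as "recent" (assumption).
 * space_window: maximum address distance that counts as "near" (assumption). */
LocalityCounts measure_locality(const uint64_t *trace, size_t n,
                                size_t time_window, uint64_t space_window) {
    LocalityCounts c = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        int spatial = 0, temporal = 0, neighbor = 0;
        for (size_t j = 0; j < i; j++) {
            uint64_t d = trace[i] > trace[j] ? trace[i] - trace[j]
                                             : trace[j] - trace[i];
            int in_time = (i - j) <= time_window;
            int in_space = d <= space_window;
            if (d == 0 && in_time) temporal = 1;
            if (d != 0 && in_space) spatial = 1;
            if (in_space && in_time) neighbor = 1;
        }
        c.spatial += (size_t)spatial;
        c.temporal += (size_t)temporal;
        c.neighbor += (size_t)neighbor;
    }
    return c;
}
```

On the trace {100, 104, 100, 500, 108} with a time window of 2 loads and a space window of 8 bytes, this yields 3 spatial, 1 temporal and 3 neighbor-affinity loads. The load of 500 contributes to none of the metrics.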
Locality Measures (mpeg2decode)
[Chart: “Memory Access Locality: Recurrence of Loads”. x-axis: Number of times recurred; y-axis: Number of loads]
Locality measures: Affinity
[Chart: “Locality: Neighbor Affinity”. x-axis: "NA" values; y-axis: Number of loads]
Program Partitions
- Provide programmer with possible partition candidates
– Can be based on criteria:
- Memory consumption
- Memory reference behavior
- Branch density
- Auto-vectorizable loops
- Aliasing
- Combination (a “rank” metric)
– Does not assume code/data overlays
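The slides do not define how the criteria are combined into a “rank”. Purely as an illustration, the C sketch below uses a weighted sum in which vectorizable loops raise the score while memory pressure and branch density lower it. The struct fields, the weights and the formula itself are all assumptions, not the GLIMPSES metric.

```c
/* Hypothetical normalized criteria for one candidate partition. */
typedef struct {
    double memory_fraction;       /* estimated usage / local store size */
    double branch_density;        /* assumed: branches per instruction */
    double vectorizable_fraction; /* auto-vectorizable loops / all loops */
} PartitionMetrics;

/* Hypothetical weights chosen by the programmer. */
typedef struct {
    double memory;
    double branch;
    double vector;
} RankWeights;

/* Illustrative rank: higher means more suitable for an SPE. */
double partition_rank(const PartitionMetrics *m, const RankWeights *w) {
    return w->vector * m->vectorizable_fraction
         - w->memory * m->memory_fraction
         - w->branch * m->branch_density;
}
```

With equal weights, a partition that uses half the local store, has low branch density and mostly vectorizable loops ranks above one that nearly fills the local store and is branch-heavy. Changing the weights shifts the ordering toward whichever criterion matters most for the port.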
Partitioning
- Start with earliest leaf node in dyn. call graph in a partition
- Try to add its parent to the partition
- Try to add all of parent’s children to the partition
- If they can be added, try to add parent to partition
- Try to add parent’s parent to partition
- Estimates only: No code generation
- Programmer to take care of “cloning”.
- Can produce interprocedural, context sensitive alias information.
- Given two partitions, can they alias each other’s data?
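The bottom-up growth steps on this slide can be sketched in C under simplifying assumptions: the dynamic call graph is treated as a tree given by a `parent[]` array (-1 for the root), the only criterion is a memory budget, and adding a parent means adding its whole subtree (all of its children and their callees). The function names and the choice of starting leaf are hypothetical; the slides give no implementation details.

```c
#include <stddef.h>

/* Returns 1 if node k lies in the subtree rooted at root (k == root counts). */
static int in_subtree(const int *parent, int k, int root) {
    for (int a = k; a >= 0; a = parent[a])
        if (a == root)
            return 1;
    return 0;
}

/* Sum of cost[] over every node in the subtree rooted at root. */
static size_t subtree_cost(const int *parent, const size_t *cost, int n, int root) {
    size_t sum = 0;
    for (int k = 0; k < n; k++)
        if (in_subtree(parent, k, root))
            sum += cost[k];
    return sum;
}

/* Grow a partition upward from a leaf while it stays within budget.
 * in_partition[k] is set to 1 for every node placed in the partition.
 * Returns the estimated memory cost of the resulting partition. */
size_t grow_partition(const int *parent, const size_t *cost, int n,
                      int leaf, size_t budget, int *in_partition) {
    for (int k = 0; k < n; k++)
        in_partition[k] = 0;
    if (cost[leaf] > budget)
        return 0;                       /* even the leaf alone does not fit */
    int root = leaf;
    size_t total = cost[leaf];
    in_partition[leaf] = 1;
    while (parent[root] >= 0) {
        int p = parent[root];
        /* Parent plus all of its children (and their callees). */
        size_t candidate = subtree_cost(parent, cost, n, p);
        if (candidate > budget)
            break;                      /* stop growing; keep current partition */
        for (int k = 0; k < n; k++)
            if (in_subtree(parent, k, p))
                in_partition[k] = 1;
        total = candidate;
        root = p;                       /* next round tries the parent's parent */
    }
    return total;
}
```

For a tree main -> {decode -> {idct, motion}, output} with costs 100, 40, 30, 20 and 200 and a budget of 100, starting at idct produces the partition {decode, idct, motion} with cost 90. Adding main would pull in output and exceed the budget, so growth stops there.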
Status
- Several features/improvements planned
– Alias Analysis information for refining partition-set
– Alias Analysis information for data pinning/prefetching opportunities
– Leverage DataStructureAnalyses for smart memory allocation on SPUs
- Tested on
– Workloads from SPECINT
– Workloads from mediabench
– ODE (Open Dynamics Engine)
- Beta version to be released shortly.
The End
Email contact: jaswanth@cc.gatech.edu