GLIMPSES: Memory and program behavior estimation for SPEs



SLIDE 1

GLIMPSES: Memory and program behavior estimation for SPEs

Jaswanth Sreeram, Ling Liu, Santosh Pande

SLIDE 2

Motivation

“Prototyping large codebases for porting to SPEs is challenging”

“Need a way to quickly evaluate program behavior and its suitability for SPEs”

– Important for legacy code/reuse

SLIDE 3

Motivation (contd)

  • Porting large codebases to SPEs is challenging

– Limited local store
– High branch penalty
– Geared towards vectorizable code
– Code/data partitioning is not trivial
– SPE-SPE, SPE-PPE interactions

  • Provide programmer with tools to

– Understand dynamic program behavior
– Quickly construct candidate partitions for SPEs
– Evaluate/quantify partitions’ suitability for SPEs

SLIDE 4

GLIMPSES: Tool Overview

  • Dynamic Call Graphs
  • Memory Requirements

– Dynamic
– Analytical

  • Memory Access Patterns

– Locality (spatial, temporal, neighbor affinity)

  • Partitioning

– Criteria based estimates

  • Visual, interactive

SLIDE 5

Dynamic Call Chains

[Screenshot: GLIMPSES window with a graph visualization area and a results display panel]

SLIDE 6

Call chains (contd)

SLIDE 7

Mpeg-2 Decode

  • Zoom view
  • Shows dynamic call chains for a program run (in this case the program is mpeg2decode)

SLIDE 8

GLIMPSES

[Block diagram of the tool flow. Labels recovered from the figure: C/C++ program; LLVM compiler flow; Bytecode; Analysis & Instrumentation Passes; Instru. Bytecode; Link Runtime; Test Inputs; Execute; Profile Trace; GraphML Trace; Dyn. Memory Estimator; Analytical Memory Estimator; Partition Estimator; Visualization Engine]

SLIDE 9

Memory Behavior

  • Estimate static and dynamic memory usage

– Code, stack and heap (per function)
– Usage < SPE local store (LS) limit?

  • Estimate function attributes

– Branch density
– Number of auto-vectorizable loops

  • Analytical estimation

– Detect program objects affecting dynamic memory behavior
– Show correlation between these program objects and memory usage

  • Construct an arithmetic expression for amount of memory allocated, in terms of inputs or other program objects
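The "usage below the SPE LS limit" question on this slide can be sketched as a simple comparison. This is a minimal illustration, not GLIMPSES code: the struct `fn_mem_estimate` and the functions `total_bytes` and `fits_local_store` are hypothetical names, and the only outside fact used is that a Cell SPE has a 256 KB local store shared by code, stack and heap.

```c
#include <assert.h>
#include <stddef.h>

/* Local store of one Cell SPE: 256 KB shared by code, stack and heap. */
#define SPE_LS_BYTES (256u * 1024u)

/* Hypothetical per-function estimate in bytes (not a GLIMPSES type). */
struct fn_mem_estimate {
    size_t code_bytes;
    size_t stack_bytes;
    size_t heap_bytes;
};

/* Total estimated footprint of one function. */
size_t total_bytes(const struct fn_mem_estimate *e)
{
    assert(e != NULL);
    return e->code_bytes + e->stack_bytes + e->heap_bytes;
}

/* Returns 1 if the estimate is strictly below the local store size, else 0. */
int fits_local_store(const struct fn_mem_estimate *e)
{
    return total_bytes(e) < SPE_LS_BYTES;
}
```

A function estimated at 4 KB of code, 2 KB of stack and 1 KB of heap passes this check, while one whose heap estimate alone is 256 KB does not.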

SLIDE 10

Analytical Estimator : Mpeg2 example

Code segment

for (……) {
  if (cc==0)
    size = Coded_Picture_Width*Coded_Picture_Height;
  else
    size = Chroma_Width*Chroma_Height;
  if (!(backward_reference_frame[cc] = (unsigned char *) malloc(size)))
    Error(…);
  if (!(forward_reference_frame[cc] = (unsigned char *) malloc(size)))
    Error(…);
}

Result

__Malloc_size__1 = 1024
__Malloc_size__2 = 0 + Coded_Picture_Width*Coded_Picture_Height
__Malloc_size__3 = 0 + Coded_Picture_Width*Coded_Picture_Height
__Malloc_size__4 = 0 + Coded_Picture_Width*Coded_Picture_Height
__Malloc_size__5 = 0 + Chroma_Width*Chroma_Height
__Malloc_size__6 = 0 + Chroma_Width*Chroma_Height
__Malloc_size__7 = 0 + Chroma_Width*Chroma_Height
__Malloc_size__8 = 0 + Chroma_Width*Chroma_Height
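Once the estimator has produced expressions like the eight listed in the result, they can be evaluated for a concrete input. The sketch below hard-codes those eight expressions for illustration only; `picture_dims`, `malloc_size` and `total_malloc_size` are hypothetical names, and the way GLIMPSES itself stores or evaluates its expressions is not shown on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Values of the program objects that appear in the expressions. */
struct picture_dims {
    size_t coded_w, coded_h;    /* Coded_Picture_Width, Coded_Picture_Height */
    size_t chroma_w, chroma_h;  /* Chroma_Width, Chroma_Height */
};

/* Evaluates __Malloc_size__k for k = 1..8 as listed in the result. */
size_t malloc_size(int k, const struct picture_dims *d)
{
    assert(d != NULL && k >= 1 && k <= 8);
    if (k == 1)
        return 1024;                          /* constant allocation */
    if (k <= 4)
        return 0 + d->coded_w * d->coded_h;   /* expressions 2 to 4 */
    return 0 + d->chroma_w * d->chroma_h;     /* expressions 5 to 8 */
}

/* Sum of all eight allocation sizes for one set of dimensions. */
size_t total_malloc_size(const struct picture_dims *d)
{
    size_t total = 0;
    for (int k = 1; k <= 8; k++)
        total += malloc_size(k, d);
    return total;
}
```

For an assumed 352x288 picture with 176x144 chroma planes, the sum is 406528 bytes, which is already larger than a 256 KB local store.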

SLIDE 11

Memory Access Patterns

  • Locality metrics for loads/stores

– Spatial Locality

  • “Loads to different addresses in a spatial window”

– Temporal Locality

  • “Loads to same address in a time window”

– Neighbor Affinity

  • “Loads to addresses within a space and time window”
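The three metrics can be illustrated on a recorded trace of load addresses. The slide gives only one-line definitions, so the sketch below fills in details that are assumptions: the time window is the previous `time_win` loads, the spatial window is a byte distance `space_win`, and the neighbor affinity of a load is the number of earlier loads inside both windows. All names are hypothetical and this is not the GLIMPSES implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct locality_counts {
    size_t temporal_hits;  /* loads whose address recurs within the time window */
    size_t spatial_hits;   /* loads with a different, nearby address in the window */
};

static uint64_t abs_diff(uint64_t a, uint64_t b)
{
    return a > b ? a - b : b - a;
}

/* Assumed definition: number of the previous `time_win` loads whose
 * address lies within `space_win` bytes of load i. */
size_t neighbor_affinity(const uint64_t *addr, size_t n, size_t i,
                         size_t time_win, uint64_t space_win)
{
    assert(addr != NULL && i < n);
    size_t first = i > time_win ? i - time_win : 0;
    size_t count = 0;
    for (size_t j = first; j < i; j++)
        if (abs_diff(addr[i], addr[j]) <= space_win)
            count++;
    return count;
}

/* Counts temporal and spatial hits over the whole trace. */
struct locality_counts measure_locality(const uint64_t *addr, size_t n,
                                        size_t time_win, uint64_t space_win)
{
    struct locality_counts c = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        size_t first = i > time_win ? i - time_win : 0;
        int same = 0, near = 0;
        for (size_t j = first; j < i; j++) {
            if (addr[j] == addr[i])
                same = 1;   /* same address seen recently */
            else if (abs_diff(addr[i], addr[j]) <= space_win)
                near = 1;   /* different address close by */
        }
        c.temporal_hits += same;
        c.spatial_hits += near;
    }
    return c;
}
```

Collecting `neighbor_affinity` for every load and counting how many loads share each value would give a histogram of per-load "NA" values.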

SLIDE 12

Locality Measures (mpeg2decode)

[Chart: “Memory Access Locality: Recurrence of Loads”. x-axis: number of times recurred (ticks 1 to 61); y-axis: number of loads (ticks 200 to 1200)]

SLIDE 13

Locality measures: Affinity

[Chart: “Locality: Neighbor Affinity”. x-axis: "NA" values (ticks 1 to 61); y-axis: number of loads (ticks 1000 to 8000)]

SLIDE 14

Program Partitions

  • Provide programmer with possible partition candidates

– Can be based on criteria:

  • Memory consumption
  • Memory reference behavior
  • Branch density
  • Auto-vectorizable loops
  • Aliasing
  • Combination (a “rank” metric)

– Does not assume code/data overlays
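The slide does not say how the combined "rank" metric is computed. One plausible shape, shown purely as an assumption, is a weighted sum of normalized criteria with a hard cutoff for candidates that do not fit in local store. The struct fields, the weights and the scoring formula below are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attributes of one candidate partition. */
struct candidate {
    size_t mem_bytes;       /* estimated code + stack + heap */
    double branch_density;  /* branches per instruction, 0.0 to 1.0 */
    unsigned vector_loops;  /* number of auto-vectorizable loops */
};

/* Hypothetical weights chosen by the programmer. */
struct rank_weights {
    double mem, branch, vector;
};

/* Higher is better. Candidates that do not fit in `ls_bytes` get rank 0. */
double rank_candidate(const struct candidate *c,
                      const struct rank_weights *w, size_t ls_bytes)
{
    assert(c != NULL && w != NULL && ls_bytes > 0);
    if (c->mem_bytes >= ls_bytes)
        return 0.0;
    /* Fraction of local store left free, in (0, 1]. */
    double mem_headroom = 1.0 - (double)c->mem_bytes / (double)ls_bytes;
    /* Fewer branches score higher because of the SPE branch penalty. */
    double few_branches = 1.0 - c->branch_density;
    /* Reward candidates with at least one vectorizable loop. */
    double vector_score = c->vector_loops > 0 ? 1.0 : 0.0;
    return w->mem * mem_headroom + w->branch * few_branches
         + w->vector * vector_score;
}
```

With equal weights, a small, loop-heavy candidate outranks a larger, branch-heavy one, and anything over the local store size drops to zero. Aliasing and memory reference behavior are left out of this sketch.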

SLIDE 15

Partitioning

  • Start with earliest leaf node in dyn. call graph in a partition
  • Try to add its parent to the partition
  • Try to add all of parent’s children to the partition
  • If they can be added, try to add parent to partition
  • Try to add parent’s parent to partition

  • Estimates only: No code generation
  • Programmer to take care of “cloning”
  • Can produce interprocedural, context sensitive alias information

– Given two partitions, can they alias each other’s data?
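The bottom-up growth described on this slide can be sketched on a call tree. Several things here are assumptions rather than facts from the deck: the dynamic call graph is modeled as a tree given by a `parent` array, the only admission criterion is a single memory budget, and adding a parent's children also adds everything they call. The names `grow_partition` and `is_descendant` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if node v lies strictly below node anc in the call tree. */
static int is_descendant(const int *parent, int v, int anc)
{
    for (v = parent[v]; v >= 0; v = parent[v])
        if (v == anc)
            return 1;
    return 0;
}

/* Grows a partition upward from `leaf` while it fits in `budget` bytes.
 * parent[v] is the caller of v (-1 for the root), mem[v] its estimated size.
 * Marks members in in_part[] and returns the bytes used (0 if leaf alone
 * does not fit). */
size_t grow_partition(const int *parent, const size_t *mem, int n,
                      int leaf, size_t budget, int *in_part)
{
    assert(parent != NULL && mem != NULL && in_part != NULL);
    assert(leaf >= 0 && leaf < n);

    for (int v = 0; v < n; v++)
        in_part[v] = 0;
    if (mem[leaf] > budget)
        return 0;

    size_t used = mem[leaf];
    int top = leaf;  /* highest node whose subtree is fully in the partition */
    in_part[leaf] = 1;

    while (parent[top] >= 0) {
        int p = parent[top];

        /* Step 1: try to add all of the parent's children (and their callees). */
        size_t extra = 0;
        for (int v = 0; v < n; v++)
            if (!in_part[v] && is_descendant(parent, v, p))
                extra += mem[v];
        if (used + extra > budget)
            break;
        for (int v = 0; v < n; v++)
            if (is_descendant(parent, v, p))
                in_part[v] = 1;
        used += extra;

        /* Step 2: if they fit, try to add the parent itself, then move up. */
        if (used + mem[p] > budget)
            break;
        in_part[p] = 1;
        used += mem[p];
        top = p;
    }
    return used;
}
```

Like the tool, this only produces an estimate of which functions could share a partition; it generates no code and does not handle cloning of functions reached from several contexts.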
SLIDE 16

Status

  • Several features/improvements planned

– Alias Analysis information for refining partition-set
– Alias Analysis information for data pinning/prefetching opportunities
– Leverage DataStructureAnalyses for smart memory allocation on SPUs

  • Tested on

– Workloads from SPECINT
– Workloads from MediaBench
– ODE (Open Dynamics Engine)

  • Beta version to be released shortly

SLIDE 17

The End

Email contact: jaswanth@cc.gatech.edu