SLIDE 1

Dynamo '00 1

TEMPO, A Program Specializer for C

Renaud MARLET, Compose group, IRISA / INRIA Rennes (France)

SLIDE 2

What it is / What it does

- Automatic compile-time and run-time specialization
- Program and data specialization
- Modular specialization
- Incremental specialization
- Real-size applications (~6,000 specialized lines)
- Back-end partial evaluator for Java (Jspec)
- Publicly available (~40 licenses)

SLIDE 3

Some Applications of Tempo

- Operating systems [PEPM’97, ICDCS’97]
  - Sun RPC (3.7x), Chorus IPC (1.5x), BPF (4x)
- Numerical computations [LNCS, ICCL’98, PEPM’99]
  - FFT (4–12x), standard library routines
- Computer graphics [ECOOP’99]
  - Convolution filters (4x)
- Software architectures [ASE’97]
  - Selective broadcast, software layers, generic libraries, …
- Compilers/JITs for interpreters [DSL’97, SRDS’98, ICDCS’99]
  - PLAN-P (80x, 96% of C throughput), O’Caml (1.2–2.5x), …

SLIDE 4

Overview

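[Diagram: Tempo architecture]

- Inputs: C source, abstract specialization context, behavior of external functions
- Analysis produces the C source annotated with specialization actions
- Compile-time path: compile-time specializer generator → compile-time specializer, which takes the concrete specialization context and produces the specialized source
- Run-time path: run-time specializer generator → run-time specializer, which takes the concrete specialization context and produces the specialized binary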

SLIDE 5

Specialization Templates

Stages: S (static), D (dynamic)

    dotprod(size,u[],v[]) {
      res = 0;
      for(i = 0; i < size; i++) {
        res += u[i] * v[i];
      }
      return res;
    }

With size=3 and u[]={7,4,6}, dotprod(size,u[],v[]) specializes to dotprod_size_u(v[]):

    dotprod_size_u(v[]) {
      res = 0;            /* T1 */
      res += 7 * v[0];    /* T2[ 7 , 0 ] */
      res += 4 * v[1];    /* T2[ 4 , 1 ] */
      res += 6 * v[2];    /* T2[ 6 , 2 ] */
      return res;         /* T3 */
    }

Templates: T1, T2 (with holes H1 and H2), T3
Template sequence: T1 T2[ 7 , 0 ] T2[ 4 , 1 ] T2[ 6 , 2 ] T3
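The dotprod example above can be written as compilable C. The following is a minimal sketch: the `int` types and the helper `check_equivalence` are assumptions added for illustration (the slide omits types), and the residual function is written by hand here rather than produced by Tempo.

```c
#include <assert.h>

/* Generic dot product, following the slide (the int types are an assumption). */
int dotprod(int size, const int u[], const int v[])
{
    int res = 0;
    for (int i = 0; i < size; i++) {
        res += u[i] * v[i];
    }
    return res;
}

/* Residual function for the static inputs size = 3 and u = {7, 4, 6}:
   the loop is unrolled and the reads of u[] are folded into constants. */
int dotprod_size_u(const int v[])
{
    int res = 0;
    res += 7 * v[0];
    res += 4 * v[1];
    res += 6 * v[2];
    return res;
}

/* Hypothetical helper: specialization must preserve behaviour, so for any
   dynamic input v[] both versions have to return the same value. */
void check_equivalence(const int v[])
{
    static const int u[3] = {7, 4, 6};
    assert(dotprod_size_u(v) == dotprod(3, u, v));
}
```

For v = {1, 2, 3}, both versions return 7*1 + 4*2 + 6*3 = 33.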

SLIDE 6

Dedicated Run-Time Specializer

Stages: S (static), D (dynamic)

Code generation instructions:

    dotprod_spec(size,u[]) {
      buf = alloc();
      copy_temp(buf,T1);
      for(i = 0; i < size; i++) {
        copy_temp(buf,T2);
        fill_hole(buf,H1,u[i]);
        fill_hole(buf,H2,i);
      }
      copy_temp(buf,T3);
      return buf;
    }

Resulting buf: T1 T2[ u[0] , 0 ] T2[ u[1] , 1 ] T2[ u[2] , 2 ] T3
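The run-time specializer above copies machine-code templates, which is hard to show portably. As a rough simulation under that caveat, the sketch below represents each copied template as a small record and replaces the jump into generated code with a tiny interpreter. `struct instr`, `struct buf`, `alloc_buf`, `run` and `free_buf` are hypothetical names; `copy_temp`, `fill_hole`, T1 to T3, H1 and H2 follow the slide.

```c
#include <assert.h>
#include <stdlib.h>

enum kind { T1, T2, T3 };   /* the three templates of dotprod */
enum hole { H1, H2 };       /* the two holes of template T2 */

/* One copied template instance; h1 and h2 are only meaningful for T2. */
struct instr { enum kind kind; int h1; int h2; };

/* Stand-in for the code buffer: a bounded sequence of template instances. */
struct buf { struct instr *code; int len; int cap; };

static struct buf *alloc_buf(int cap)
{
    struct buf *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->code = malloc((size_t)cap * sizeof *b->code);
    if (b->code == NULL) { free(b); return NULL; }
    b->len = 0;
    b->cap = cap;
    return b;
}

void free_buf(struct buf *b)
{
    if (b != NULL) { free(b->code); free(b); }
}

/* Append a fresh copy of template k to the buffer. */
static void copy_temp(struct buf *b, enum kind k)
{
    assert(b->len < b->cap);
    b->code[b->len++] = (struct instr){ k, 0, 0 };
}

/* Patch a hole of the most recently copied template. */
static void fill_hole(struct buf *b, enum hole h, int value)
{
    assert(b->len > 0);
    struct instr *last = &b->code[b->len - 1];
    if (h == H1) last->h1 = value; else last->h2 = value;
}

/* Same shape as dotprod_spec on the slide: only static data is read. */
struct buf *dotprod_spec(int size, const int u[])
{
    struct buf *b = alloc_buf(size + 2);   /* T1 + size copies of T2 + T3 */
    if (b == NULL) return NULL;
    copy_temp(b, T1);
    for (int i = 0; i < size; i++) {
        copy_temp(b, T2);
        fill_hole(b, H1, u[i]);
        fill_hole(b, H2, i);
    }
    copy_temp(b, T3);
    return b;
}

/* Interpreter standing in for a jump into the generated code. */
int run(const struct buf *b, const int v[])
{
    int res = 0;
    for (int pc = 0; pc < b->len; pc++) {
        const struct instr *in = &b->code[pc];
        switch (in->kind) {
        case T1: res = 0; break;
        case T2: res += in->h1 * v[in->h2]; break;
        case T3: return res;
        }
    }
    return res;
}
```

Calling `dotprod_spec(3, u)` with u = {7, 4, 6} yields a buffer of five template instances, and running it on v = {1, 2, 3} gives 33, the same result as the generic dotprod. In the real system the buffer holds executable instructions, so no interpreter is involved.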

SLIDE 7

Tentative Balance-Sheet for Tempo (1994–1999)

Pros
- Automation, safety
- Non-intrusiveness
- Accurate analyses
- Predictability
- Low break-even point
- Easy engineering
  - AST, compiler re-use
- Realistic applications
- Framework for CT/RT

Cons
- Complex declarations
- Slicing & re-plugging
- Fixed precision
- A posteriori control
- Code less optimized
- Limitations
  - BT precision, optimisation
  - Prototype

SLIDE 8

Precision of the Analyses

[PEPM’97, SAS’97, TCS’00]

Analyses             Alias                          Binding time
Interprocedural      ✓                              ✓
Flow-sensitive       ✓                              ✓
Context-sensitive    On-going work                  ✓
Return-sensitive     N.A.                           ✓
Use-sensitive        N.A.                           ✓
Field-sensitive      per struct type (or instance)  per struct type (or instance)

SLIDE 9

Challenges?

- Detecting specialization opportunities:
  - Existing code already hand-optimized
  - Little hope

SLIDE 10

Challenges

- Architecting software for specialization
  - Development methodology
  - More quantitative prediction
- Declaring specialization
  - More automation: no slicing and plugging (guards)
  - Less inference, more checking: downgrade Tempo
- Make the technology usable by humans

SLIDE 11

Extra slides

SLIDE 12

Making Templates

Stages: S (static), D (dynamic)

Dynamic part of dotprod, with holes H1 and H2 in place of the static values:

    dotprod(v[]) {
      res = 0;
      res += H1 * v[H2];
      return res;
    }

Template source, with template marks and holes:

    dotprod(v[]) {
      /* T1_start: */
      res = 0;
      T1_end:
      while( dummy ){
        T2_start:
        res += &h1 * v[&h2];
        T2_end:
      }
      T3_start:
      return res;
      /* T3_end: */
    }

- Re-use existing compiler
- Symbol table
- Original control flow
- Prevent inter-template code motion

SLIDE 13

Generating The Run-Time Specializer

[Diagram: building the dedicated run-time specializer]

- Artifacts: specialization actions, templates (.c), templates (.o), templates description, template offsets, code generator (.c), code generator (.o), dedicated run-time specializer (.o)
- Tools: tcc, gcc (twice), objdump, bfd, ld
- Start & end template marks: labels
- Holes: pointers to global variables
- objdump / bfd: symbol table, inter-template jumps, templates, holes
- Plus peep-hole optimisations and inlining (register usage)

SLIDE 14

Run-Time Specialization: Implementation

- Compilers: gcc, lcc
- Machines: Sparc, Pentium
- Main run-time cost: copying instructions
- Few inter-template optimizations
- Run-time inlining

SLIDE 15

Run-Time Specialization: Experimental Results

[Bar chart: time (normalized, axis from 0.1 to 1) per application]

- Applications: Romberg integration, cubic spline, Chebyshev approximation, dithering, FFT
- Series: original, RT-specialized, CT-specialized
- CT-specialized compiled with optimizations ⇒ “optimal”