Dynamo '00 1
TEMPO, A Program Specializer for C
Renaud MARLET
Compose group
IRISA / INRIA Rennes (France)
Dynamo '00 2
What it is / What it does
- Automatic compile-time and run-time specialization
- Program and data specialization
- Modular specialization
- Incremental specialization
- Real-size applications (~ 6,000 specialized lines)
- Back-end partial evaluator for Java (Jspec)
- Publicly available (~ 40 licenses)
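The list above names both program and data specialization. As a rough illustration of the second one, the sketch below splits a function into a loader, which evaluates the computations that depend only on the static input and stores them in a cache, and a reader, which finishes the job on the dynamic input. The function `scaled_norm`, the `Cache` type and the `_load` / `_read` names are hypothetical and are not taken from the slides or from Tempo's output.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 3 };

/* Original function: s[] is the static input, d the dynamic one. */
int scaled_norm(const int s[N], int d)
{
    int sum = 0;
    for (size_t i = 0; i < N; i++) {
        sum += s[i] * s[i];      /* depends on s only */
    }
    return sum * d;              /* depends on d */
}

/* Cache holding the results of the static computations. */
typedef struct { int sum; } Cache;

/* Loader: run once per static input, fills the cache. */
void scaled_norm_load(const int s[N], Cache *c)
{
    assert(c != NULL);
    c->sum = 0;
    for (size_t i = 0; i < N; i++) {
        c->sum += s[i] * s[i];
    }
}

/* Reader: run for each dynamic input, reuses the cached value. */
int scaled_norm_read(const Cache *c, int d)
{
    return c->sum * d;
}
```

Unlike program specialization, no new code is produced here: the reader is fixed, and only the cache contents change with the static input.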
Dynamo '00 3
Some Applications of Tempo
Operating systems [PEPM’97, ICDCS’97]
Sun RPC (3.7x), Chorus IPC (1.5x), BPF (4x)
Numerical computations [LNCS, ICCL’98, PEPM’99]
FFT (4–12x), standard library routines
Computer graphics [ECOOP’99]
Convolution filters (4x)
Software architectures [ASE’97]
Selective broadcast, software layers, generic libraries, …
Compilers/JITs for interpreters [DSL’97, SRDS’98, ICDCS’99]
PLAN-P (80x, 96% of C throughput), O’Caml (1.2–2.5x) …
Dynamo '00 4
Overview
[Diagram: Tempo architecture]
- Analysis inputs: C source, abstract specialization context, behavior of external functions
- Analysis output: C source annotated with specialization actions
- Compile-time path: compile-time specializer generator, compile-time specializer, specialized source
- Run-time path: run-time specializer generator, run-time specializer, specialized binary
- Concrete specialization context: supplied to the specializers
Dynamo '00 5
Specialization Templates

Stages: S (static), D (dynamic)

Original code, dotprod(size,u[],v[]):

    dotprod(size,u[],v[])
    {
      res = 0;                      /* T1 */
      for(i = 0; i < size; i++) {
        res += u[i] * v[i];         /* T2, holes H1 = u[i] and H2 = i */
      }
      return res;                   /* T3 */
    }

Specialization context: size=3, u[]={7,4,6}

Specialized code, dotprod_size_u(v[]):

    dotprod_size_u(v[])
    {
      res = 0;
      res += 7 * v[0];
      res += 4 * v[1];
      res += 6 * v[2];
      return res;
    }

Template sequence: T1 T2[ 7 , 0 ] T2[ 4 , 1 ] T2[ 6 , 2 ] T3
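The dotprod example can be written out as compilable C to check that the specialized version agrees with the generic one. This is only a sketch: the `int` element type, the `size_t` index and the helper `check_specialization` are assumptions added here, since the slide code is untyped, and the specialized function is hand-written rather than produced by Tempo.

```c
#include <assert.h>
#include <stddef.h>

/* Generic version: size and u[] are static, v[] is dynamic. */
int dotprod(size_t size, const int u[], const int v[])
{
    int res = 0;
    for (size_t i = 0; i < size; i++) {
        res += u[i] * v[i];
    }
    return res;
}

/* Residual code for size = 3 and u[] = {7, 4, 6}:
   the loop is unrolled and the u[i] reads are replaced by constants. */
int dotprod_size_u(const int v[])
{
    int res = 0;
    res += 7 * v[0];
    res += 4 * v[1];
    res += 6 * v[2];
    return res;
}

/* Hypothetical helper: aborts if the two versions disagree on v
   (v must hold at least 3 elements). */
void check_specialization(const int v[])
{
    static const int u[3] = {7, 4, 6};
    assert(dotprod(3, u, v) == dotprod_size_u(v));
}
```

Only the computations that depend on v[] remain in `dotprod_size_u`; the loop test, the increment and the loads from u[] have been evaluated at specialization time.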
Dynamo '00 6
Dedicated Run-Time Specializer

Stages: S D

Original code, with templates T1 T2 T3 and holes H1 H2:

    dotprod(size,u[],v[])
    {
      res = 0;
      for(i = 0; i < size; i++) {
        res += u[i] * v[i];
      }
      return res;
    }

Code generation instructions:

    dotprod_spec(size,u[])
    {
      buf = alloc();
      copy_temp(buf,T1);
      for(i = 0; i < size; i++) {
        copy_temp(buf,T2);
        fill_hole(buf,H1,u[i]);
        fill_hole(buf,H2,i);
      }
      copy_temp(buf,T3);
      return buf;
    }

Contents of buf: T1 T2[ u[0] , 0 ] T2[ u[1] , 1 ] T2[ u[2] , 2 ] T3
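The copy-and-fill scheme of dotprod_spec can be simulated in portable C. The sketch below is not how Tempo works internally: instead of binary templates it uses a small, made-up instruction record (`Insn`), and instead of jumping into the generated buffer it interprets it with a hypothetical `run` function. The names `Buf`, `Insn`, `Op`, `MAX_INSNS` and `run` are assumptions; `copy_temp` is adapted so that it returns the copied template, and the two field assignments stand in for `fill_hole`.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_INSNS = 64 };

typedef enum { OP_CLEAR, OP_MULADD, OP_RETURN } Op;

/* One simulated instruction; h1 and h2 are the holes of template T2. */
typedef struct { Op op; int h1; size_t h2; } Insn;

typedef struct { Insn code[MAX_INSNS]; size_t len; } Buf;

static const Insn T1 = { OP_CLEAR,  0, 0 };  /* res = 0;           */
static const Insn T2 = { OP_MULADD, 0, 0 };  /* res += H1 * v[H2]; */
static const Insn T3 = { OP_RETURN, 0, 0 };  /* return res;        */

/* Append a copy of template t to buf; return the copy so its holes can be filled. */
static Insn *copy_temp(Buf *buf, const Insn *t)
{
    assert(buf->len < (size_t)MAX_INSNS);
    buf->code[buf->len] = *t;
    return &buf->code[buf->len++];
}

/* Dedicated specializer: the static loop runs now, the dynamic code goes to buf. */
void dotprod_spec(Buf *buf, size_t size, const int u[])
{
    buf->len = 0;
    copy_temp(buf, &T1);
    for (size_t i = 0; i < size; i++) {
        Insn *t2 = copy_temp(buf, &T2);
        t2->h1 = u[i];   /* fill_hole(buf,H1,u[i]) */
        t2->h2 = i;      /* fill_hole(buf,H2,i)    */
    }
    copy_temp(buf, &T3);
}

/* Stand-in for executing the generated code on the dynamic input v[]. */
int run(const Buf *buf, const int v[])
{
    int res = 0;
    for (size_t pc = 0; pc < buf->len; pc++) {
        const Insn *in = &buf->code[pc];
        switch (in->op) {
        case OP_CLEAR:  res = 0; break;
        case OP_MULADD: res += in->h1 * v[in->h2]; break;
        case OP_RETURN: return res;
        }
    }
    return res;
}
```

The specializer does no analysis at run time: it only copies pre-built pieces and patches constants into them, which is the property the slide's code generation instructions rely on.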
Dynamo '00 7
Tentative Balance-Sheet for Tempo (1994 – 1999)
Pros
- Automation, safety
- Non-intrusiveness
- Accurate analyses
- Predictability
- Low break-even point
- Easy engineering (AST, compiler re-use)
- Realistic applications
- Framework for CT/RT
Cons
- Complex declarations
- Slicing & re-plugging
- Fixed precision
- A posteriori control
- Code less optimized
- Limitations (BT precision, optimisation)
- Prototype
Dynamo '00 8
Precision of the Analyses
[PEPM’97, SAS’97, TCS’00]
Analyses: alias, binding time
- Interprocedural
- Flow-sensitive
- Context-sensitive (on-going work)
- Return-sensitive (N.A.)
- Use-sensitive (N.A.)
- Field-sensitive: per struct type (or instance), for both analyses
Dynamo '00 9
Challenges?
Detecting specialization opportunities:
- Existing code already hand-optimized
- Little hope
Dynamo '00 10
Challenges
Architecting software for specialization
- Development methodology
- More quantitative prediction
Declaring specialization
- More automation: no slicing and plugging (guards)
- Less inference, more checking: downgrade Tempo
Make the technology usable by humans
Dynamo '00 11
Extra slides
Dynamo '00 12
Making Templates
Stages: S D
Templates T1 T2 T3, holes H1 H2

Original code:

    dotprod(size,u[],v[])
    {
      res = 0;
      for(i = 0; i < size; i++) {
        res += u[i] * v[i];
      }
      return res;
    }

Template code (holes left open):

    dotprod(v[])
    {
      res = 0;
      res += <H1> * v[<H2>];
      return res;
    }
- Re-use existing compiler
- Symbol table
- Original control flow
- Prevent inter-template code motion
Template marks, straight-line layout:

    /* T1_start: */
    T1_end:
    T2_start:
      &h1 &h2
    T2_end:
    T3_start:
    /* T3_end: */

Template marks, original control flow kept:

    /* T1_start: */
    T1_end:
    while( dummy ){
      T2_start:
        &h1 &h2
      T2_end:
    }
    T3_start:
    /* T3_end: */
Dynamo '00 13
Generating The Run-Time Specializer
[Diagram: building the dedicated run-time specializer]
- Starting point: specialization actions
- Files: Templates (.c), Templates (.o), Templates description, Template offsets, Code generator (.c), Code generator (.o), Dedicated run-time specializer (.o)
- Tools: tcc, gcc, ld, objdump, bfd
- Start & end template marks: labels
- Holes: pointers to global variables
- Symbol table, inter-template jumps, templates, holes
- Plus peep-hole optimisations and inlining (register usage)
Dynamo '00 14
Run-Time Specialization: Implementation
- Compilers: gcc, lcc
- Machines: Sparc, Pentium
- Main run-time cost: copying instructions
- Little inter-template optimization
- Run-time inlining
Dynamo '00 15
Run-Time Specialization: Experimental Results
[Figure: bar chart of execution time per application]
- Applications: Romberg integration, Cubic spline, Chebyshev approximation, Dithering, FFT
- Time (normalized), axis from 0.1 to 1
- Series: Original, RT-specialized, CT-specialized
- CT-specialized compiled with optimizations ⇒ “optimal”