Design Decisions for a Source-2-Source Compiler

slide-1
SLIDE 1

Roger Ferrer, Sara Royuela, Diego Caballero, Alejandro Duran, Xavier Martorell and Eduard Ayguadé Barcelona Supercomputing Center and Universitat Politècnica de Catalunya

Cetus Users and Compiler Infrastructures Workshop in conjunction with PACT'11

Galveston, TX, USA, October 10th, 2011

Design Decisions for a Source-2-Source Compiler

slide-2
SLIDE 2

Outline

  • Structure of the compiler
  • A little history
  • Design decisions for Mercurium
  • Developments
  • Conclusions and future work
slide-3
SLIDE 3

Structure of Mercurium

  • All phases based on the same IR
  • Code outlined

– On primary files
– On separate files
– Can be sent back for further parsing

[Diagram labels: input source file (C/C++/Fortran 95), frontend, common high-level representation, three compiler phases, output source file (C/C++/Fortran), backend compiler, object file; secondary input source files (C/C++/Fortran 95), secondary output source file (C/C++/Fortran), backend compiler, secondary object file, embed process]

slide-4
SLIDE 4

The story

[Timeline, 1996 to 2012]

  • Compilers: Parafrase-2 (Fortran77, Univ. of Illinois), Open-64 (C/C++, released by SGI), mcc (C), mf95 (Fortran), mcxx (C/C++), Fortran2003
  • Projects: NANOS, INTONE, POP, ACOTES, SARC, ENCORE, TERAFLUX, TEXT, MONT BLANC
  • OpenMP standards: 1.0 (Fortran), 1.0 (C/C++), 2.0 (Fortran), 2.0 (C/C++), 2.5 (merging Fortran & C/C++), 3.0 (tasking), 3.1
slide-5
SLIDE 5

Outline

  • Structure of the compiler
  • A little history
  • Design decisions for Mercurium
  • Developments
  • Conclusions and future work
slide-6
SLIDE 6

Design decisions

  • Extended parsing && later disambiguation
  • Write source && subparsing
  • Generic driver && plug-ins
  • Drop-in replacement && driver compatibility
  • Have multi-file support && secondary output files
  • Common representation && FORTRAN/C/C++
slide-7
SLIDE 7

Extended parsing

  • Add new language features

– Types, built-in functions

  • Extend directives/pragmas

– Directive registration

  • Directives are broken into tokens and lists of symbols

– Meaning is built at a later pass

  • Later disambiguation
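
The idea of breaking a directive into tokens and symbol lists first, and assigning meaning in a later pass, can be sketched as follows. This is a minimal illustration, not Mercurium code: the `Clause` and `Directive` structs and the `tokenize_directive` function are hypothetical names, and the input is assumed to be the text that follows `#pragma`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical containers: a clause keeps its arguments as raw text.
struct Clause {
    std::string name;
    std::vector<std::string> args;  // uninterpreted symbols
};

struct Directive {
    std::string prefix;  // e.g. "omp"
    std::string name;    // e.g. "task"
    std::vector<Clause> clauses;
};

static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Split on commas that are not nested inside () or [].
static std::vector<std::string> split_top_level(const std::string& text) {
    std::vector<std::string> out;
    std::string current;
    int depth = 0;
    for (char c : text) {
        if (c == '(' || c == '[') ++depth;
        if (c == ')' || c == ']') --depth;
        if (c == ',' && depth == 0) {
            out.push_back(trim(current));
            current.clear();
        } else {
            current += c;
        }
    }
    if (!trim(current).empty()) out.push_back(trim(current));
    return out;
}

// Read one identifier-like word, skipping leading blanks.
static std::string read_word(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
    std::size_t start = pos;
    while (pos < s.size() &&
           (std::isalnum(static_cast<unsigned char>(s[pos])) || s[pos] == '_')) ++pos;
    return s.substr(start, pos - start);
}

// Tokenize only; no clause is checked or interpreted here.
Directive tokenize_directive(const std::string& text) {
    Directive d;
    std::size_t pos = 0;
    d.prefix = read_word(text, pos);
    d.name = read_word(text, pos);
    while (true) {
        Clause c;
        c.name = read_word(text, pos);
        if (c.name.empty()) break;
        while (pos < text.size() && std::isspace(static_cast<unsigned char>(text[pos]))) ++pos;
        if (pos < text.size() && text[pos] == '(') {
            std::size_t start = pos + 1;
            int depth = 0;
            do {
                if (text[pos] == '(') ++depth;
                else if (text[pos] == ')') --depth;
                ++pos;
            } while (pos < text.size() && depth > 0);
            std::size_t end = (depth == 0) ? pos - 1 : pos;
            c.args = split_top_level(text.substr(start, end - start));
        }
        d.clauses.push_back(c);
    }
    return d;
}
```

Calling `tokenize_directive("omp task input(a, b) output(c)")` yields two clauses whose arguments are still plain strings; a later pass would resolve them against the symbol table.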
slide-8
SLIDE 8

Write source

  • Source datatype

– Embed the code to be generated in your compiler phases

Source TL::Nanox::common_parallel_code(const std::string& outline_name,
        Source num_threads, ...)
{
    device_provider->do_replacements(data_environ_info, parallel_code, ...);
    device_provider->create_outline(outline_name, struct_arg_type_name, ...);

    result
        << "{"
        <<   "unsigned int _nanos_num_threads = " << num_threads << ";"
        <<   "nanos_team_t _nanos_team = (nanos_team_t)0;"
        <<   "nanos_thread_t _nanos_threads[_nanos_num_threads];"
        <<   "nanos_err_t err;"
        <<   "err = nanos_create_team(&_nanos_team, (nanos_sched_t)0, &_nanos_num_threads,"
        <<        "(nanos_constraint_t*)0, /* reuse_current */ 1, _nanos_threads);"
        ...
}
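
To show how a stream-like source datatype of this kind might behave, here is a minimal sketch. It is not the Mercurium `Source` class: `SourceSketch` and `make_team_setup` are hypothetical names, and the sketch only accumulates text, whereas the real datatype is later parsed into the IR.

```cpp
#include <string>

// Hypothetical stand-in for a "source" datatype: it just accumulates text.
class SourceSketch {
public:
    SourceSketch& operator<<(const std::string& text) { buffer_ += text; return *this; }
    SourceSketch& operator<<(const char* text) { buffer_ += text; return *this; }
    SourceSketch& operator<<(int value) { buffer_ += std::to_string(value); return *this; }
    // Nesting: one fragment can be streamed into another.
    SourceSketch& operator<<(const SourceSketch& other) { buffer_ += other.buffer_; return *this; }

    const std::string& text() const { return buffer_; }

private:
    std::string buffer_;
};

// Builds a fragment of generated code around a caller-provided expression.
SourceSketch make_team_setup(const SourceSketch& num_threads) {
    SourceSketch result;
    result << "{"
           << "unsigned int _nanos_num_threads = " << num_threads << ";"
           << "}";
    return result;
}
```

The design point is that a phase composes generated code from smaller fragments with the same operator, without building tree nodes by hand.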

slide-9
SLIDE 9

Write source

  • “Source” is later parsed

– Incorporating it into the IR

  • Early compiler phase can generate directives for later phases
  • Subparsing: multiple parsing starting points in grammar
  • Statement, declaration, function, directive...

Source new_pragma_construct_src;
new_pragma_construct_src
    << "#line " << construct.get_ast().get_line()
    << " \"" << construct.get_ast().get_file() << "\"\n"
    << device_line
    << "#pragma omp task " << clauses << "\n"
    << ";"
    ;
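
One common way to give a single grammar several starting points is to prepend a synthetic marker that selects the entry rule. The sketch below illustrates only that dispatch step; the marker strings and the names `EntryPoint`, `prepare_subparse` and `select_entry` are assumptions made for illustration and are not taken from the Mercurium grammar.

```cpp
#include <string>

// Hypothetical set of entry points, following the list on the slide.
enum class EntryPoint { Statement, Declaration, Function, Directive };

// Hypothetical marker text; a real grammar would reserve dedicated tokens.
const char* entry_marker(EntryPoint entry) {
    switch (entry) {
        case EntryPoint::Statement:   return "@STATEMENT@";
        case EntryPoint::Declaration: return "@DECLARATION@";
        case EntryPoint::Function:    return "@FUNCTION@";
        case EntryPoint::Directive:   return "@DIRECTIVE@";
    }
    return "";
}

// Producer side: tag a code fragment with the rule it should be parsed by.
std::string prepare_subparse(EntryPoint entry, const std::string& fragment) {
    return std::string(entry_marker(entry)) + " " + fragment;
}

// Parser side: read the marker back and hand the rest to the chosen rule.
bool select_entry(const std::string& input, EntryPoint& entry, std::string& rest) {
    const EntryPoint all[] = {EntryPoint::Statement, EntryPoint::Declaration,
                              EntryPoint::Function, EntryPoint::Directive};
    for (EntryPoint candidate : all) {
        const std::string marker = entry_marker(candidate);
        if (input.compare(0, marker.size(), marker) == 0) {
            entry = candidate;
            rest = input.substr(marker.size());
            if (!rest.empty() && rest[0] == ' ') rest.erase(0, 1);  // drop separator
            return true;
        }
    }
    return false;  // no marker: not a subparse request
}
```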

slide-10
SLIDE 10

Generic driver

  • Driver is controlled by a configuration file

– Allows conditional execution of compiler phases

  • shared libraries

...
# if --instrument is given, activate the internal
# compiler variable indicating so
{instrument} options = --variable=instr:1
# load the proper compiler plug-in
{instrument} compiler_phase = libtlinstr.so
# and link against the proper (instrumented) libraries
{instrument} linker_options = \
    -L@NANOX_LIBS@/instrumentation -lnanox
...

Compiler plugin
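
A rough sketch of how a driver could evaluate such conditional configuration lines is given below. It assumes a simplified reading of the format shown above (a `{flag}` prefix guards a `key = value` line, `#` starts a comment, a trailing backslash continues the line); `Setting` and `active_settings` are hypothetical names and the real driver may support more syntax.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

struct Setting {
    std::string key;
    std::string value;
};

static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Returns the settings whose guard flag (if any) is in `flags`.
std::vector<Setting> active_settings(const std::string& config,
                                     const std::set<std::string>& flags) {
    std::vector<Setting> result;
    std::istringstream in(config);
    std::string line, pending;
    while (std::getline(in, line)) {
        line = trim(line);
        if (pending.empty() && (line.empty() || line[0] == '#')) continue;
        if (!line.empty() && line.back() == '\\') {  // continuation line
            line.pop_back();
            pending += trim(line) + " ";
            continue;
        }
        line = trim(pending + line);
        pending.clear();
        if (line.empty()) continue;

        std::string flag;
        if (line[0] == '{') {  // optional {flag} guard
            std::size_t close = line.find('}');
            if (close == std::string::npos) continue;  // malformed, skip
            flag = line.substr(1, close - 1);
            line = trim(line.substr(close + 1));
        }
        if (!flag.empty() && flags.count(flag) == 0) continue;

        std::size_t eq = line.find('=');  // first '=' separates key and value
        if (eq == std::string::npos) continue;
        result.push_back({trim(line.substr(0, eq)), trim(line.substr(eq + 1))});
    }
    return result;
}
```

With the `instrument` flag present, the guarded `compiler_phase` line becomes active and the driver would load the named plug-in; without it, the line is ignored.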

slide-11
SLIDE 11

Drop-in replacement

  • Autoconf, CMake, Makefiles should work out of the box

  • Offer the flavor of just any other compiler

– Be compatible with the way of invoking GCC

  • Functionalities

– Generate object files, and executable files
– Link objects into the executable
– Embed GPU or SPU binaries into the main host executable

Driver compatibility

slide-12
SLIDE 12

Have multi-file support

  • Portions of code identified as “accelerator code”

– Are outlined to separate files
– Compiled with the native accelerator compiler

[Diagram labels: compiler phase, output source file (C/C++/Fortran), backend compiler, object file; secondary output source file (C/C++/Fortran), backend compiler, secondary object file, embed process; secondary input source files (C/C++/Fortran 95), back to frontend]

slide-13
SLIDE 13

Common representation

  • Use a common IR among FORTRAN/C/C++

cxx03.y:

FOR '(' for_init_statement condition_opt ';' expression_opt ')' statement
{
    AST loop_control = ASTMake3(AST_LOOP_CONTROL, $3, $4, $6,
            $1.token_file, $1.token_line, NULL);

    $$ = ASTMake3(AST_FOR_STATEMENT, loop_control, $8, NULL,
            $1.token_file, $1.token_line, "c++");
}

fortran03.y:

loop_control : comma_opt do_variable '=' int_expr ',' int_expr comma_int_expr_opt
{
    AST assig = ASTMake2(AST_ASSIGNMENT, $2, $4,
            ASTFileName($2), ASTLine($2), NULL);

    $$ = ASTMake3(AST_LOOP_CONTROL, assig, $6, $7,
            ASTFileName($2), ASTLine($2), "fortran");
}

label_do_stmt : labeldef name_colon_opt TOK_DO label loop_control eos
{
    $$ = ASTMake5Label(AST_FOR_STATEMENT, $1, $5, NULL, NULL, $4,
            $3.token_file, $3.token_line, NULL);
}
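
The benefit of both grammars producing the same node kinds can be illustrated with a small sketch. It is a simplification under stated assumptions: `Node`, `NodeKind`, `make`, `c_for`, `fortran_do` and `count_loops` are hypothetical names, and the Fortran builder attaches the loop body directly, which the grammar action above does not do.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class NodeKind { ForStatement, LoopControl, Assignment, Expression, Statement };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
    NodeKind kind;
    std::string text;  // free-form tag, e.g. the originating language
    std::vector<NodePtr> children;
};

NodePtr make(NodeKind kind, std::vector<NodePtr> children, std::string text = "") {
    auto node = std::make_shared<Node>();
    node->kind = kind;
    node->text = std::move(text);
    node->children = std::move(children);
    return node;
}

// C/C++ front end: for (init; cond; step) body
NodePtr c_for(NodePtr init, NodePtr cond, NodePtr step, NodePtr body) {
    NodePtr control = make(NodeKind::LoopControl, {init, cond, step});
    return make(NodeKind::ForStatement, {control, body}, "c++");
}

// Fortran front end: DO var = lower, upper [, stride]
NodePtr fortran_do(NodePtr var, NodePtr lower, NodePtr upper, NodePtr stride,
                   NodePtr body) {
    NodePtr assig = make(NodeKind::Assignment, {var, lower});
    NodePtr control = make(NodeKind::LoopControl, {assig, upper, stride}, "fortran");
    return make(NodeKind::ForStatement, {control, body});
}

// A language-independent phase: it never asks which front end built the tree.
int count_loops(const NodePtr& node) {
    if (!node) return 0;
    int total = (node->kind == NodeKind::ForStatement) ? 1 : 0;
    for (const NodePtr& child : node->children) total += count_loops(child);
    return total;
}
```

Because both builders return a `ForStatement` wrapping a `LoopControl`, a phase such as `count_loops` works unchanged on trees coming from either language.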

slide-14
SLIDE 14

Outline

  • Structure of the compiler
  • A little history
  • Design decisions for Mercurium
  • Developments
  • Conclusions and future work
slide-15
SLIDE 15

Developments

  • OpenMP 3.0 tasking

– tasks, standard since 2008
– prototypes for taskgroups

  • Not included in the standard
  • Prototyping User Defined Reductions (UDRs)
  • StarSs and OmpSs: input/output/inout extensions

– Data dependences among tasks
– Copy in/out to/from accelerator memories

slide-16
SLIDE 16

  • Objective

– New set of annotations for data dependence analysis and movement

float a[N]; // want them copied to accelerator
float b[N]; // as automatically as possible
float c[N];

for (i=0; i<N; i++) {
    #pragma omp target device(cuda) copy_deps
    #pragma omp task input (a, b) output (c)
    {
        c[i] = a[i] + b[i]; // want it run in the accelerator
    }
}

OmpSs: OpenMP+StarSs

slide-17
SLIDE 17

  • BlackScholes annotated

for (i=0; i<array_size; i+=local_work_group_size*vector_width) {
    int limit = ((i+local_work_group_size)>array_size) ?
                    array_size - i : local_work_group_size;

    uint * cpflag_f = &cpflag_fptr[i];
    float * S0_f = &S0_fptr[i];
    float * K_f = &K_fptr[i];
    float * r_f = &r_fptr[i];
    float * sigma_f = &sigma_fptr[i];
    float * T_f = &T_fptr[i];
    float * answer_f = &answer_fptr[i];

    #pragma omp target device(cuda) copy_deps
    #pragma omp task shared(cpflag_f,S0_f,K_f,r_f,sigma_f,T_f,answer_f) \
        input ( \
            [global_work_group_size] cpflag_f, \
            [global_work_group_size] S0_f, \
            [global_work_group_size] K_f, \
            [global_work_group_size] r_f, \
            [global_work_group_size] sigma_f, \
            [global_work_group_size] T_f) \
        output ([global_work_group_size] answer_f)
    {
        // kernel code
    }
}

6 inputs 1 output

256 256

OmpSs: OpenMP+StarSs

slide-18
SLIDE 18

OmpSs: OpenMP+StarSs

  • BlackScholes evaluation

– 1.4x in SMP
– 2x performance increase in Cell/B.E. vs. OpenCL
– Equivalent in GTX 285, no scaling due to inputs

[Charts: BlackScholes speedup on three platforms]

  • Intel Xeon server: speedup vs. number of CPUs (1 to 24); series: OpenCL, OmpSs
  • Cell processor: speedup vs. number of SPUs (1 to 16); series: OpenCL (awgc), OpenCL (db), OmpSs
  • Nvidia GPU GTX 285: speedup vs. number of GPUs (1 to 2); series: OpenCL (AWGC), OpenCL (LS), OmpSs

slide-19
SLIDE 19
  • Generic error management and user error handlers
  • Evaluated with the NAS benchmarks

#pragma omp parallel onerror(OMP_SEVERE_ERROR : OMP_ABORT, \
                             OMP_MEDIUM_ERROR : my_error_handler, arg)
{
    // parallel region
}

Error handling for OpenMP

slide-20
SLIDE 20

User-driven vectorization

  • High-level transformation directive indicating...

– A loop is vectorizable

  • and the variables that should be vectorized

– A function is vectorizable
– Still, user needs to take care of proper alignment!

#pragma hlt simd
float myfunc(float X)
{
    ...
}

int main(int argc, char * argv[])
{
    float a[N], b[N], c[N];
    ...
    #pragma hlt simd(a, b, c)
    for (i=0; i < N; i++)
    {
        c[i] = a[i] + b[i] + myfunc(a[i]);
    }
    ...
}

Inspired by the Intel SIMD directive available in ICC

slide-21
SLIDE 21

User-driven vectorization

  • Results comparable to Intel ICC auto and SIMD
slide-22
SLIDE 22

User-driven vectorization

  • Results comparable to Intel ICC SIMD with MKL
slide-23
SLIDE 23

User-driven vectorization

  • Results comparable to Intel OpenCL
slide-24
SLIDE 24

Outline

  • Structure of the compiler
  • A little history
  • Design decisions for Mercurium
  • Developments
  • Conclusions and future work
slide-25
SLIDE 25

Conclusions

  • Shown the Mercurium compiler infrastructure

– C/C++ and Fortran
– Flexible to be adapted to lots of projects
– Useful and productive for directive-based program transformations
– Mostly compatible with existing compilers
– Support for heterogeneity and local memories

slide-26
SLIDE 26
  • Fortran support

– Recognize directives and generate code

  • Analysis phases

– Currently, liveness analysis for symbols

  • Code inlining, interprocedural analysis
  • Prototyping of new OpenMP features

Future work

slide-27
SLIDE 27

Acknowledgments

  • Parallel programming group @BSC
  • Encore, TEXT, TERAFLUX, MONTBLANC

– European Commission

available at Barcelona Supercomputing Center http://pm.bsc.es