IMPROVING THE ROOT INTERPRETER a.k.a. Cling
Javier López-Gómez <jalopezg@inf.uc3m.es>
Computer Architecture and Technology Area, University Carlos III of Madrid
September 29, 2019
Who am I?
A CS PhD student @ARCOS UC3M (Computer Architecture and Technology Area, University Carlos III of Madrid).
Focus on low-level stuff, e.g. Linux kernel, embedded software, compilers, electronics…
2017-2018: (partial) implementation of P0542R5 (C++ contracts) in Clang.
jalopezg-uc3m 0xd3c4fb4d <jalopezg@inf.uc3m.es>
Agenda
1 Introduction
2 Adding support for redefinitions
3 Other stuff I worked on
4 Conclusion
Introduction
At the core of ROOT: Cling (a C++ Clang-based interpreter)
ALICE: What is Clang?
BOB: A C++ compiler.
ALICE: I know what a compiler is!
BOB: Sure?
I know what a compiler is! Sure? (1/6)
A compiler accepts valid strings in a language.
Translates valid input (source) into output (object code), e.g. C++ into Intel x86 assembly.
Several stages: lexical analysis (scanner), syntactic analysis (parser), semantic analysis, etc.
Source code -> Scanner (Lex) -> Parser (Parse) -> Semantic analysis (Sema) -> AST -> IR generation (CodeGen) -> … -> Object code
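To make the first stage concrete, here is a minimal sketch of a scanner for a toy expression language. The names `TokenKind`, `Token` and `scan` are made up for this illustration and have nothing to do with Clang's lexer. As a simplification, every character that is not a digit, a letter or whitespace is treated as a one-character punctuation token.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token model for a toy language (not Clang's).
enum class TokenKind { Number, Identifier, Punct };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits the input into NUMBER, IDENTIFIER and single-character
// punctuation tokens, skipping whitespace.
std::vector<Token> scan(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else if (std::isalpha(c)) {
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else {
            ++i;  // assumption: anything else is a one-character operator
            out.push_back({TokenKind::Punct, std::string(1, src[start])});
        }
    }
    return out;
}
```

Calling `scan("a^2 = (b^2 + c^2)")` yields a flat token sequence; the parser is the stage that gives it structure.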
I know what a compiler is! Sure? (2/6)
Languages are described in terms of grammars. Formal grammars are described using the Backus-Naur Form (BNF), e.g.
    equation ::= expr '=' expr ;
    expr ::= '(' expr ')' | expr '+' expr | expr '^' expr | NUMBER | IDENTIFIER ;
Then, given 'a^2 = (b^2 + c^2)' we may decompose it as…
I know what a compiler is! Sure? (3/6)
    equation
    |- expr
    |  |- expr 'a'
    |  |- '^'
    |  `- expr 2
    |- '='
    `- expr
       |- '('
       |- expr
       |  |- expr
       |  |  |- expr 'b'
       |  |  |- '^'
       |  |  `- expr 2
       |  |- '+'
       |  `- expr
       |     |- expr 'c'
       |     |- '^'
       |     `- expr 2
       `- ')'
We call this decomposition Abstract Syntax Tree (AST).
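In code, such a tree is typically a recursive data structure. The following is only a sketch: `Node`, `leaf`, `binary`, `toString` and `pythagoras` are hypothetical names, and the tree is built by hand rather than by a parser. Parentheses are not stored as nodes because the nesting of the tree already encodes grouping.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST node: a leaf (NUMBER or IDENTIFIER) has no children,
// an operator node has exactly two.
struct Node {
    std::string label;
    std::unique_ptr<Node> lhs;
    std::unique_ptr<Node> rhs;

    explicit Node(std::string l,
                  std::unique_ptr<Node> a = nullptr,
                  std::unique_ptr<Node> b = nullptr)
        : label(std::move(l)), lhs(std::move(a)), rhs(std::move(b)) {}
};

using NodePtr = std::unique_ptr<Node>;

NodePtr leaf(const std::string& text) {
    return std::make_unique<Node>(text);
}

NodePtr binary(const std::string& op, NodePtr a, NodePtr b) {
    return std::make_unique<Node>(op, std::move(a), std::move(b));
}

// Prints the tree back as a fully parenthesized expression.
std::string toString(const Node& n) {
    if (!n.lhs) return n.label;
    return "(" + toString(*n.lhs) + " " + n.label + " " + toString(*n.rhs) + ")";
}

// Hand-built tree for: a^2 = (b^2 + c^2)
NodePtr pythagoras() {
    return binary("=",
                  binary("^", leaf("a"), leaf("2")),
                  binary("+",
                         binary("^", leaf("b"), leaf("2")),
                         binary("^", leaf("c"), leaf("2"))));
}
```

`toString(*pythagoras())` gives `((a ^ 2) = ((b ^ 2) + (c ^ 2)))`, which makes the grouping chosen by the tree explicit.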
I know what a compiler is! Sure? (4/6)
Grammars of programming languages are much more complex than this. Take another naïve example:
    declaration-list ::= declaration-list declaration | declaration ;
    declaration ::= type IDENTIFIER ';' | type IDENTIFIER '=' expr ';' ;
    type ::= 'int' | 'double' ;
    expr ::= '(' expr ')' | ...
Then, the AST for the input 'int j = 3; int i = j;' is…
I know what a compiler is! Sure? (5/6)
    declaration-list
    |- declaration `j' type=int
    |  `- initial value 3
    `- declaration `i' type=int
       `- initial value j
At some point, we might also want to do semantic analysis, e.g. "Is j already defined before the first declaration?" or "May i (int) be initialized to the value of j (int)?"
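A check of the first kind can be sketched with a symbol table that is filled while walking the declaration list. This is a toy model, not how Clang's Sema works: `Decl` and `checkDeclarations` are hypothetical names, initializers are restricted to a single literal or identifier, and type compatibility (the second question) is not checked. The same table also detects the redefinitions that ISO C++ forbids.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened declaration: `type name = init;`
// `init` is a literal, an identifier, or empty.
struct Decl {
    std::string type;
    std::string name;
    std::string init;
};

// Walks the declarations in order and returns the diagnostics found.
std::vector<std::string> checkDeclarations(const std::vector<Decl>& decls) {
    std::map<std::string, std::string> symbols;  // name -> type
    std::vector<std::string> diagnostics;
    for (const Decl& d : decls) {
        if (symbols.count(d.name) != 0) {
            diagnostics.push_back("redefinition of '" + d.name + "'");
        }
        // Assumption: an initializer starting with a letter is an identifier.
        const bool initIsIdentifier =
            !d.init.empty() &&
            std::isalpha(static_cast<unsigned char>(d.init[0])) != 0;
        if (initIsIdentifier && symbols.count(d.init) == 0) {
            diagnostics.push_back("use of undeclared identifier '" + d.init + "'");
        }
        symbols[d.name] = d.type;
    }
    return diagnostics;
}
```

For `int j = 3; int i = j;` the result is empty; dropping the first declaration produces one diagnostic about `j`.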
I know what a compiler is! Sure? (6/6)
A real one (Clang) for the C++ input 'int f(int x = 0) { return x + 1; }':
    `-FunctionDecl 0xf317a0 </tmp/in.cpp:1:1, col:34> col:5 f 'int (int)'
      |-ParmVarDecl 0xf316c0 <col:7, col:15> col:11 used x 'int' cinit
      | `-IntegerLiteral 0xf31720 <col:15> 'int' 0
      `-CompoundStmt 0xf31908 <col:18, col:34>
        `-ReturnStmt 0xf318f8 <col:20, col:31>
          `-BinaryOperator 0xf318d8 <col:27, col:31> 'int' '+'
            |-ImplicitCastExpr 0xf318c0 <col:27> 'int' <LValueToRValue>
            | `-DeclRefExpr 0xf31880 <col:27> 'int' lvalue ParmVar 0xf316c0 'x' 'int'
            `-IntegerLiteral 0xf318a0 <col:31> 'int' 1
Most diagnostics are emitted as part of the semantic analysis. If OK, proceed to code generation.
The main problem we had (1/2)
Turning C++ into an interpreted language is asking for trouble :-)
ODR (One-Definition Rule): no more than one definition per translation unit, but there may be more than one (compatible) redeclaration, e.g.
    int f(int x);
    int f(int x) { return 0; } // OK, no previous definition
The main problem we had (2/2)
Therefore, the following is ill-formed according to ISO C++…
    int foo = 0;
    double foo; // Can't redefine 'foo' in this scope
    int f(int x) { return 0; }
    int f(int x) { return x + 1; } // Ditto
…but our Jupyter-notebook users would certainly appreciate this!
Adding support for redefinitions
Adding support for redefinitions (1/5)
SHORT STORY: Chandler Carruth/Axel Naumann proposed to transform user input (nesting declarations into namespaces), so that C++ lookup rules resolve a reference to the most recent definition.
Some discussion around this at https://github.com/root-project/cling/issue/259 .
Cling is based on (1) parsing input, (2) applying transformations (ASTTransformers), and (3) emitting code, so we can certainly do that!
Adding support for redefinitions (2/5)
Issue #259 proposed two different approaches:
Listing 1: Nested namespaces
    namespace {
    unsigned int i = 0; // line 1
    namespace {
    double i = 1.0; // line 2
    namespace { … }
    }
    }
Listing 2: Chained namespaces
    namespace __cling_1 {
    unsigned int i = 0; // line 1
    }
    namespace __cling_2 {
    using namespace __cling_1;
    double i = 1.0; // line 2
    }
    …
Chaining seems better, but it doesn't work (at least in that form): unqualified reference to i is ambiguous!
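The ambiguity of the chained form can be reproduced in plain C++. The sketch below uses `cling_1`, `cling_2` and `cling_3` as stand-in names (identifiers containing a double underscore are reserved, so the generated names are not reused here), and `latest` is a hypothetical helper. Using-directives are transitive, so from inside the third namespace both declarations of `i` are found and an unqualified `i` does not compile; a qualified name still does.

```cpp
#include <cassert>

namespace cling_1 {
unsigned int i = 0;  // line 1
}

namespace cling_2 {
using namespace cling_1;
double i = 1.0;  // line 2
}

namespace cling_3 {
using namespace cling_2;

// double broken() { return i; }  // would not compile: reference to 'i' is ambiguous

// Qualified lookup prefers the declaration made directly in cling_2.
double latest() { return cling_2::i; }
}
```

Uncommenting `broken` shows the diagnostic that makes this form of chaining unusable for redefinitions.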
Adding support for redefinitions (3/5)
The current implementation transforms top-level declarations and nests them into inline namespaces (so that their names are available in the TU), i.e.
    inline namespace __cling_N50 { int foo = 0; /* input line 1 */ }
    inline namespace __cling_N51 { double foo; /* input line 2 */ }
    …
This allows "redefinition" from the user's perspective, although they are different declarations, i.e. __cling_N50::foo and __cling_N51::foo.
Referring to unqualified foo IS still ambiguous. We patch the enclosing scope (TU) lookup table to hide any former definition.
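The effect of this transformation can be approximated in standard C++, together with a toy model of the lookup-table patch. Everything below is an illustrative assumption: `cling_N50`/`cling_N51` stand in for the generated names, the initial value `2.5` is invented so that there is something to read back, and `ToyLookupTable` is a hypothetical class that only mimics the idea of "the most recent declaration wins"; it is not Clang's or Cling's lookup table.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

inline namespace cling_N50 { int foo = 0; }       // input line 1
inline namespace cling_N51 { double foo = 2.5; }  // input line 2

// Plain C++ cannot pick one: `foo` alone would be ambiguous here,
// while cling_N50::foo and cling_N51::foo are both fine.

// Toy stand-in for the patched lookup table of the enclosing scope:
// declaring a name again hides the former entry.
class ToyLookupTable {
    std::unordered_map<std::string, std::string> latest_;

public:
    void declare(const std::string& name, const std::string& ns) {
        latest_[name] = ns + "::" + name;
    }

    // Returns the qualified name an unqualified reference resolves to,
    // or an empty string if the name is unknown.
    std::string resolve(const std::string& name) const {
        const auto it = latest_.find(name);
        return it == latest_.end() ? std::string() : it->second;
    }
};
```

After declaring `foo` for both namespaces in input order, `resolve("foo")` returns `cling_N51::foo`, which is the behaviour a notebook user expects from a redefinition.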
Adding support for redefinitions (4/5)
LOTS OF PITFALLS: function overloads, class/function templates, etc. But you get the idea!
Support for all of this is included in PR #4214. Merged into master!
Beware of TCling! We also had to patch it to invalidate any cached information about former definitions of the same symbol (PR #4446).
Adding support for redefinitions (5/5)
DEMO
Other stuff I worked on
Improving unloading, i.e. '.L' and '.x'
Cling was able to unload (revert) transactions. Therefore, it can parse a file more than once with `.x' without complaining that a symbol is already defined…
…but it didn't work for some macro files.
BUG: it tried to unload template instantiations whose point of instantiation was in the PCH. Fixed in PR #4447.
Also, we avoid the unload+load cycle if the file hasn't changed (PR #??).
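The slides do not say how an unchanged file is detected. One possible way, shown here purely as an assumption, is to remember a per-file stamp (for instance a modification time or a content hash) from the last load and compare it before unloading. `ReloadTracker` and `needsReload` are hypothetical names, not part of Cling.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical helper: remembers the stamp each file had when last loaded.
class ReloadTracker {
    std::unordered_map<std::string, std::uint64_t> lastStamp_;

public:
    // Returns true if the file must be (re)loaded and records the new stamp;
    // returns false if the stamp matches the one seen at the last load.
    bool needsReload(const std::string& path, std::uint64_t stamp) {
        const auto it = lastStamp_.find(path);
        if (it != lastStamp_.end() && it->second == stamp) {
            return false;  // unchanged: skip the unload+load cycle
        }
        lastStamp_[path] = stamp;
        return true;
    }
};
```

With this model, running the same unchanged macro twice triggers only one load, while any change in the stamp forces a fresh unload+load.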
Bug fixing
During these 3 months, I also found time to fix bugs in ROOT:
ROOT-9934: typing `.stats decl' at the prompt aborted execution (Segmentation violation).
ROOT-10285: DEL key causes the ROOT prompt to exit if line is empty.
Conclusion
Cling now supports redefinitions, possibly of a different type (PRs #4214 and #4446).
    root [0] int f(int x) { return 0; }
    root [1] int f(int x) { return (x << 1); }
    root [2] f(4)
    (int) 8
#4214 already merged into master.
Improved `.x' (PRs #4447 and #??).
Some ROOT bugs fixed (PRs #4213 and #4420), also merged into master.
Thank you!