
Compilation 2016
Aslan Askarov, aslan@cs.au.dk


  1. Compilation 2016
     Aslan Askarov, aslan@cs.au.dk
     Acknowledgments: E. Ernst, M. I. Schwartzbach, J. Midtgaard, G. Morrisett, S. Zdancewic

  2. What is a compiler?

  3. What is a compiler? A program that translates …
     A) human-readable program into machine code
     B) programs in a source language into programs in a target language
     C) programs in a source language into programs in a target language, while preserving semantics
     D) a source language into a target language while preserving the semantics

  4. What is a compiler?
     • Translator from one programming language (source) into another (target)
       • preserves the semantics
       • the compiler also implicitly defines the semantics, though it’s harder to reason about programs with compiler-defined semantics
     • Typically:
       • the source language is high-level
       • the target language is low-level
     • Not always:
       • Java compiler: Java to interpretable bytecode
       • Java JIT: bytecode to executable

  5. Why use compilers?
     • Economy
       • takes care of hundreds of low-level micro decisions that would otherwise need to be handled by programmers
     • Performance
       • best compilers generate better code than most programmers
       • e.g.: automatic parallelization on multi-core
     • Safety & Security

  6. First compilers
     1952: Grace Hopper introduces the term “compiler” for the A-0 programming language

  7. 1957: Fortran – first real compiler
     “We went on to raise the question “…can a machine translate a sufficiently rich mathematical language into a sufficiently economical program at a sufficiently low cost to make the whole affair feasible?”
     J. Backus, The History of Fortran I, II, and III (1978)

  8. 1957: Fortran – first real compiler
     • Led by John Backus at IBM
     • Motivated by the economics of programming
     • Had to overcome deep skepticism
     • Focused on efficiency of the generated code
     • Pioneered many concepts and techniques
     • Revolutionized computer programming

  9. How good are today’s compilers?
     Source C program:
         #include <stdio.h>
         #include <stdlib.h>

         long factorial(long X) {
           if (X == 0) return 1;
           return X*factorial(X-1);
         }

         int main(int argc, char **argv) {
           printf("%ld\n", factorial(10));
           return 0;
         }
     Compiled assembly ($ clang factorial.c -S -O3 -o-):
         …
         Ltmp9:
             .cfi_def_cfa_register %rbp
             leaq L_.str(%rip), %rdi
             movl $3628800, %esi ## imm = 0x375F00
             xorl %eax, %eax
             callq _printf
             xorl %eax, %eax
             popq %rbp
             ret
             .cfi_endproc

             .section __TEXT,__cstring,cstring_literals
         L_.str: ## @.str
             .asciz "%ld\n"
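The immediate operand `$3628800` in the assembly suggests that clang evaluated `factorial(10)` at compile time, so no call to `factorial` remains in `main`. A quick check of that constant, written in Python purely for illustration (the function below mirrors the C recursion and is not part of the slides):

```python
import math


def factorial(x: int) -> int:
    """Same recursion as the C function on the slide."""
    return 1 if x == 0 else x * factorial(x - 1)


value = factorial(10)
# Cross-check against the standard library implementation.
assert value == math.factorial(10)
print(value, hex(value))  # prints: 3628800 0x375f00
```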

  10. Basic phases of a compiler
      Compiler phases suggest modular design: 1 phase = 1 module
      High-level source code → Lexing/Parsing → Elaboration → Lowering → Optimization → Code generation → Low-level target code

  11. Front end
      • Lexing & Parsing
        • From strings to data structures
        • First two steps in processing from raw data to structured information
      • Elegant application of CS theory
        • Regular expressions (finite state automata)
        • Context-free grammars (push-down automata)
      • Established & streamlined tool support
      String/Files → Lexing → Tokens → Parsing → Abstract Syntax Tree

  12. Example: function in Tiger language
          function printint(i: int) =
            let function f(i: int) =
                  if i>0 then (
                    f(i/10);
                    print(chr(i-i/10*10+ord("0")))
                  )
            in if i<0 then (print("-"); f(-i))
               else if i>0 then f(i)
               else print("0")
            end

  13. Stream of tokens
          keyword: "function"  identifier: "printint"  symbol: "("  identifier: "i"  symbol: ":"  identifier: "int"  symbol: ")"  symbol: "="
          keyword: "let"  keyword: "function"  identifier: "f"  symbol: "("  identifier: "i"  symbol: ":"  identifier: "int"  symbol: ")"  symbol: "="
          keyword: "if"  identifier: "i"  symbol: ">"  intliteral: "0"  keyword: "then"  symbol: "("
          identifier: "f"  symbol: "("  identifier: "i"  symbol: "/"  intliteral: "10"  symbol: ")"  symbol: ";"
          identifier: "print"  symbol: "("  identifier: "chr"  symbol: "("  identifier: "i"  symbol: "-"  identifier: "i"  symbol: "/"  intliteral: "10"  symbol: "*"  intliteral: "10"  symbol: "+"
          identifier: "ord"  symbol: "("  symbol: "\""  stringliteral: "0"  symbol: "\""  symbol: ")"  symbol: ")"  symbol: ")"  symbol: ")"
          keyword: "in"
          ...
          keyword: "end"
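A minimal sketch of how such a token stream could be produced with regular expressions, written in Python for illustration. The token kinds follow the slide; the keyword set, the regular expressions, and the `tokenize` function are assumptions made for this example rather than the lexer used in the course. Unlike the slide, this sketch folds the quote characters into the string literal instead of emitting them as separate symbols.

```python
import re

# Assumed subset of Tiger keywords, enough for the printint example.
KEYWORDS = {"function", "let", "in", "end", "if", "then", "else"}

# One named alternative per token class; whitespace is matched and skipped.
TOKEN_RE = re.compile(
    r"""
    (?P<ws>\s+)
  | (?P<intliteral>\d+)
  | (?P<stringliteral>"[^"]*")
  | (?P<word>[A-Za-z][A-Za-z0-9_]*)
  | (?P<symbol>[-+*/<>=:;(),])
    """,
    re.VERBOSE,
)


def tokenize(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "ws":
            continue
        if kind == "word":
            # Keywords and identifiers share a pattern; a table lookup separates them.
            kind = "keyword" if text in KEYWORDS else "identifier"
        elif kind == "stringliteral":
            text = text[1:-1]  # drop the surrounding quotes
        tokens.append((kind, text))
    return tokens


for token in tokenize('if i>0 then f(i/10)'):
    print(token)
```

Each alternative in the pattern is a regular expression, which is the finite-automaton side of the theory mentioned on the front-end slide.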

  14. Abstract syntax
          SeqExp[
           FunctionDec[
            (printint,[
             (i,true,int)],
             NONE,
             LetExp([
              FunctionDec[
               (f,[
                (i,true,int)],
                NONE,
                IfExp(
                 OpExp(GtOp,
                  VarExp(
                   SimpleVar(i)),
                  IntExp(0)),
                 SeqExp[
                  CallExp(f,[
                   OpExp(DivideOp,
                    VarExp(
                     SimpleVar(i)),
                    IntExp(10))]),
                  CallExp(print,[
                   CallExp(chr,[
                    OpExp(PlusOp,
                     OpExp(MinusOp,
                      VarExp(
                       SimpleVar(i)),
                      OpExp(TimesOp,
                       OpExp(DivideOp,
                        VarExp(
                         SimpleVar(i)),
                        IntExp(10)),
                       IntExp(10))),
                     CallExp(ord,[
                      StringExp("0")]))])])]))]],
              IfExp(
               OpExp(LtOp,
                VarExp(
                 SimpleVar(i)),
                IntExp(0)),
               SeqExp[
                CallExp(print,[
                 StringExp("-")]),
                CallExp(f,[
                 OpExp(MinusOp,
                  IntExp(0),
                  VarExp(
                   SimpleVar(i)))])],
               IfExp(
                OpExp(GtOp,
                 VarExp(
                  SimpleVar(i)),
                 IntExp(0)),
                CallExp(f,[
                 VarExp(
                  SimpleVar(i))]),
                CallExp(print,[
                 StringExp("0")])))]))]

  15. Abstract Syntax Tree
      [Figure: the abstract syntax of printint from the previous slide drawn as a tree, with nodes such as function, let, if, seq, call, var, simplevar, int and string]
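To make the step from tokens to tree concrete, here is a small recursive-descent parser, sketched in Python, for arithmetic expressions only. The node names (`OpExp`, `VarExp`, `SimpleVar`, `IntExp`) are borrowed from the abstract syntax shown on the slides; the tuple encoding, the grammar, and the `parse_expr` function are assumptions made for this illustration and cover only a fragment of Tiger.

```python
import re


def parse_expr(text):
    """Parse +, -, *, / over integers and identifiers into nested tuples."""
    # Simplification: characters outside this tiny token set are silently skipped.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def atom():
        token = advance()
        if token == "(":
            node = additive()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return ("IntExp", int(token))
        if token[0].isalpha() or token[0] == "_":
            return ("VarExp", ("SimpleVar", token))
        raise SyntaxError(f"unexpected token {token!r}")

    def multiplicative():
        # * and / bind tighter than + and -, and associate to the left.
        node = atom()
        while peek() in ("*", "/"):
            op = "TimesOp" if advance() == "*" else "DivideOp"
            node = ("OpExp", op, node, atom())
        return node

    def additive():
        node = multiplicative()
        while peek() in ("+", "-"):
            op = "PlusOp" if advance() == "+" else "MinusOp"
            node = ("OpExp", op, node, multiplicative())
        return node

    tree = additive()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_expr("i-i/10*10"))
```

For `i-i/10*10` the result has the same shape as the `OpExp(MinusOp, …, OpExp(TimesOp, OpExp(DivideOp, …), …))` subtree in the abstract syntax above.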

  16. Elaboration
      • Resolving scope
      • Type checking
      • Resolving variable types, modules, etc
      • Check that operators and function calls are given the values of the right types
      • Infer types for sub-expressions
      • Most errors are reported to the user by the end of this phase
      Untyped Abstract Syntax Tree → Typed Abstract Syntax Tree
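A minimal sketch of the type-checking part of elaboration, in Python. It infers a type for each sub-expression bottom-up and rejects ill-typed operator and call arguments. The tuple encoding of the tree, the flattened `("VarExp", name)` form, the environment layout, and the rule that every binary operator takes two ints are simplifying assumptions for this example, not Tiger's full typing rules.

```python
def type_of(expr, env):
    """Return the type of `expr`, raising TypeError on ill-typed input.

    `env` maps variable names to a type name and function names to a
    (parameter_types, result_type) pair. Both conventions are assumptions
    made for this sketch.
    """
    tag = expr[0]
    if tag == "IntExp":
        return "int"
    if tag == "StringExp":
        return "string"
    if tag == "VarExp":
        name = expr[1]
        if name not in env:
            raise TypeError(f"undefined variable {name!r}")
        return env[name]
    if tag == "OpExp":
        _, op, left, right = expr
        left_type, right_type = type_of(left, env), type_of(right, env)
        if left_type != "int" or right_type != "int":
            raise TypeError(f"{op} expects int operands, got {left_type} and {right_type}")
        return "int"
    if tag == "CallExp":
        _, name, args = expr
        if name not in env:
            raise TypeError(f"undefined function {name!r}")
        param_types, result_type = env[name]
        arg_types = [type_of(arg, env) for arg in args]
        if arg_types != list(param_types):
            raise TypeError(f"{name} expects {list(param_types)}, got {arg_types}")
        return result_type
    raise TypeError(f"unknown node {tag!r}")


env = {"i": "int", "ord": (["string"], "int"), "chr": (["int"], "string")}
digit = ("OpExp", "PlusOp", ("VarExp", "i"), ("CallExp", "ord", [("StringExp", "0")]))
print(type_of(("CallExp", "chr", [digit]), env))  # prints: string
```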

  17. Lowering
      • Translate high-level features into a small number of target-like constructs
        • while and for loops are all compiled to code using jumps
        • embed array-bound checks, etc.
      Typed Abstract Syntax Tree → Intermediate Representation
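The loop-to-jumps translation can be sketched as follows in Python. The statement encoding (`While`, `Seq`) and the instruction names (`LABEL`, `JUMP`, `CJUMP_IF_FALSE`) are hypothetical, chosen for this example only; conditions and primitive statements are left as opaque values.

```python
import itertools


def lower(stmt, counter=None):
    """Flatten structured statements into a list of jump-based instructions."""
    if counter is None:
        counter = itertools.count()  # source of fresh label numbers
    kind = stmt[0]
    if kind == "While":
        _, cond, body = stmt
        test = f"L{next(counter)}"
        done = f"L{next(counter)}"
        return (
            [("LABEL", test), ("CJUMP_IF_FALSE", cond, done)]
            + lower(body, counter)
            + [("JUMP", test), ("LABEL", done)]
        )
    if kind == "Seq":
        instructions = []
        for inner in stmt[1]:
            instructions += lower(inner, counter)
        return instructions
    return [stmt]  # anything else is treated as already primitive


loop = ("While", "i < 10", ("Assign", "i", "i + 1"))
for instruction in lower(loop):
    print(instruction)
```

After lowering, the loop is a test label, a conditional jump past the body, the body itself, and an unconditional jump back to the test.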

  18. Intermediate Representation

  19. Optimization
      • Detect expensive sequences of operations that can be rewritten into less expensive ones
      • Ex:
        • constant folding: 2 + 2 → 4
        • lifting invariant computation out of a loop
        • parallelize a loop
      Intermediate Representation → Optimization → Intermediate Representation
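Constant folding, the first example above, can be sketched in a few lines of Python. The tuple encoding of expressions is an assumption made for this illustration. Division is deliberately left unfolded to avoid deciding division-by-zero and rounding behaviour here, and machine-integer overflow is ignored.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
FOLDABLE = {
    "PlusOp": operator.add,
    "MinusOp": operator.sub,
    "TimesOp": operator.mul,
}


def fold(expr):
    """Replace operator nodes whose operands are both constants by their value."""
    if expr[0] != "OpExp":
        return expr
    _, op, left, right = expr
    left, right = fold(left), fold(right)  # fold sub-expressions first
    if op in FOLDABLE and left[0] == "IntExp" and right[0] == "IntExp":
        return ("IntExp", FOLDABLE[op](left[1], right[1]))
    return ("OpExp", op, left, right)


print(fold(("OpExp", "PlusOp", ("IntExp", 2), ("IntExp", 2))))  # prints: ('IntExp', 4)
```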

  20. Code generation
      • Translate intermediate representation into target code
      • Register assignment
      • Instruction selection
      • Instruction scheduling
      • Machine-specific optimizations
      Intermediate Representation → Machine Code
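As a rough illustration of translating a tree into a linear instruction sequence, the Python sketch below emits code for a hypothetical stack machine and then runs it to check the result. The instruction set (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`) and both functions are assumptions for this example. A real back end targets registers and has to perform the register assignment, instruction selection and scheduling listed above, all of which this sketch sidesteps.

```python
OPCODES = {"PlusOp": "ADD", "MinusOp": "SUB", "TimesOp": "MUL"}


def gen(expr):
    """Emit stack-machine instructions that leave the value of `expr` on the stack."""
    tag = expr[0]
    if tag == "IntExp":
        return [("PUSH", expr[1])]
    if tag == "VarExp":
        return [("LOAD", expr[1])]
    if tag == "OpExp":
        _, op, left, right = expr
        # Post-order traversal: operands first, then the operator.
        return gen(left) + gen(right) + [(OPCODES[op],)]
    raise ValueError(f"cannot generate code for {tag!r}")


def run(code, variables):
    """Tiny interpreter for the hypothetical instruction set, used for checking."""
    stack = []
    for instruction in code:
        name = instruction[0]
        if name == "PUSH":
            stack.append(instruction[1])
        elif name == "LOAD":
            stack.append(variables[instruction[1]])
        else:
            right = stack.pop()
            left = stack.pop()
            if name == "ADD":
                stack.append(left + right)
            elif name == "SUB":
                stack.append(left - right)
            else:  # MUL
                stack.append(left * right)
    return stack.pop()


expr = ("OpExp", "PlusOp", ("VarExp", "i"), ("OpExp", "TimesOp", ("IntExp", 2), ("IntExp", 3)))
code = gen(expr)
print(code)
print(run(code, {"i": 4}))  # prints: 10
```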

  21. x86 Instructions
              .text
          # PROCEDURE tigermain
              .globl tigermain
              .func tigermain
              .type tigermain, @function
          tigermain:
              # FRAME tigermain(1 formals, 4 locals)
              pushl %ebp
              movl %esp, %ebp
              subl $20, %esp
              # SP, FP, calleesaves, argregs have values
          L16_blocks:
              movl -4(%ebp), %ebx
              movl $123, %ebx
              movl %ebx, -4(%ebp)
              movl -4(%ebp), %ebx
              pushl %ebx
              pushl %ebp
              call L2_printint
              jmp L15_block_done
          L15_block_done:
              # FP, SP, RV, calleesaves still live
              leave
              ret
              .size tigermain, .-tigermain
              ...

  22. Binary code
      Hex contents of .o file:
          0000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
          0000020 01 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00
          0000040 84 04 00 00 00 00 00 00 34 00 00 00 00 00 28 00
          0000060 0f 00 0c 00 55 89 e5 83 ec 14 8b 5d fc bb 7b 00
          0000100 00 00 89 5d fc 8b 5d fc 53 55 e8 fc ff ff ff eb
          0000120 00 c9 c3 55 89 e5 83 ec 40 8b 5d 0c 89 5d fc 8b
          0000140 5d fc 83 fb 00 7c 3b eb 00 8b 5d 0c 89 5d f8 8b
          0000160 5d f8 83 fb 00 7f 72 eb 00 8b 5d f4 bb 00 00 00
          0000200 00 89 5d f4 8b 5d f4 53 8b 5d 08 89 5d f0 8b 5d
          0000220 f0 8b 4b 08 89 4d ec 8b 5d ec 53 e8 fc ff ff ff
          ...
          0005140 06 00 00 00 01 16 00 00 10 00 00 00 01 01 00 00
          0005160

  23. Bootstrapping compilers (1/5)
      • We have
        • a source programming language L
        • a target machine language M
      • We want a compiler from L to M implemented in M
        • so we can compile natively on the M architecture
      • Implementing this directly in M is hard
      • Idea: introduce auxiliary intermediate languages for which the task of compilation is more practical
      T-diagram: source lang L, target lang M, implementation lang M

  24. Bootstrapping compilers (2/5)
      • We define:
        • L↓ is a simple subset of L
        • M↓ is a naive and inefficient M code
      • Step 1: Implement L↓ to M↓ compiler in M↓
      • Step 2: Implement L to M compiler in L↓ (can be done in parallel to Step 1)
      T-diagrams: Step 1 has source L↓, target M↓, implementation M↓; Step 2 has source L, target M, implementation L↓
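T-diagrams can be modelled as triples of source, target and implementation language. The Python sketch below encodes the two compilers from Steps 1 and 2 and shows what happens when the Step 1 compiler is run on the source code of the Step 2 compiler. The excerpt ends at slide 2/5, so this combination is an assumption about how the two steps fit together, not a reproduction of the remaining slides; `Compiler` and `compile_with` are hypothetical names.

```python
from collections import namedtuple

# A T-diagram: translates `source` to `target`, and is itself written in `impl`.
Compiler = namedtuple("Compiler", ["source", "target", "impl"])


def compile_with(tool, program):
    """Run `tool` on the source code of `program`, which is itself a compiler.

    The translation behaviour of `program` is unchanged; only the language
    it is expressed in moves from tool.source to tool.target.
    """
    if program.impl != tool.source:
        raise ValueError(
            f"{tool.source} compiler cannot process code written in {program.impl}"
        )
    return program._replace(impl=tool.target)


step1 = Compiler(source="L↓", target="M↓", impl="M↓")  # Step 1 on the slide
step2 = Compiler(source="L", target="M", impl="L↓")    # Step 2 on the slide

print(compile_with(step1, step2))
```

Under this model the result is an L to M compiler expressed in M↓, that is, one that already runs on the M architecture, although as naive and inefficient code.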
