
Compiler Construction Lecture 2: Compiler Structure and Lexical Analysis



  1. Compiler Construction Lecture 2: Compiler Structure and Lexical Analysis
      2020-01-10
      Michael Engel
      Includes material by Jan Christian Meyer

  2. .org Theoretical and practical exercises
      • TA: Lahiru Rasnayake
      • Six problem sets, one every two weeks
      • Theoretical questions on scanning, parsing, optimization…
      • Practical: build parts of your own small compiler (in C)
      • Get your own software project running
      • Solutions need to be handed in on time
      • Better to hand in an empty solution than a plagiarized one
      • Only the final two will be graded
      • 20% of the final grade (80% exam)
      • More details next week
      Compiler Construction 02: Compiler Structure, Scanning

  3. Overview
      • Overview: definition and tasks of a compiler
      • Structure and stages of a typical compiler
      • Deterministic finite automata (DFA)
      • Lexical analysis (scanning)

  4. Compilers are everywhere
      • Original idea: enable programming of computers in higher-level abstractions than machine language
        – Zuse's Plankalkül (1940s), FORTRAN, LISP, A-0 (1950s)
      • Today:
        – Many different source languages and target platforms
      • Additional uses of compilers:
        – Static analysis and verification
        – Hardware synthesis
        – Source-to-source transformations
        – Just-in-time (JIT) compilation

  5. What does a compiler do?
      • Compiler: “Tool that translates software written in one language into another language”
        – must understand both the form, or syntax, and content, or meaning (semantics), of the input language
        – and understand the rules that govern syntax and meaning in the output language
        – needs a scheme for mapping content from the source language to the target language
      • Requirements:
        – must preserve the meaning of the program being compiled
        – must improve the input program in some discernible way

  6. The compilation process
      Source code → black box (?) → machine code
      Input:
        int factorial(int n) { int fact = 1; while (n > 1) { fact = fact * n; n--; } return fact; }
      Output:
        . . . 0xE59F1010 0xE59F0008 0xE0815000 0xE59F5008 . . .
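A runnable C version of the example input function can be handy for trying out the toolchain stages on real code. This sketch assumes the intent is to compute n! for small non-negative n: the loop multiplies by n before decrementing it, and the function returns the accumulated product fact rather than n.

```c
/* Computes n! iteratively for n >= 0.
   With a 32-bit int, 13! and larger overflow, so only small n are meaningful. */
int factorial(int n)
{
    int fact = 1;
    while (n > 1) {
        fact = fact * n;   /* multiply by the current n first ... */
        n--;               /* ... then move on to n - 1 */
    }
    return fact;           /* the product, not the loop counter */
}
```

For example, factorial(5) returns 120; with a 32-bit int the results are exact up to 12! = 479001600.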

  7. Compilation process in detail
      source code in high-level language (.c) → preprocessor → preprocessed code → compiler → assembler code (.s) → assembler → machine (“object”) code (.o) → linker (+ libraries) → executable code → loader
      (the figure also shows a debugger)
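The stages above can be observed individually on a real toolchain. The sketch below is a tiny translation unit; the commands in its header comment assume a GCC- or Clang-compatible driver invoked as cc, and main.o stands for a hypothetical second object file that contains a main function calling square().

```c
/* square.c: a tiny translation unit for walking through the stages.
   Assuming a GCC/Clang-compatible driver named cc:
     cc -E square.c -o square.i   preprocessor only (SQUARE is expanded)
     cc -S square.i -o square.s   compiler proper: emits assembler code
     cc -c square.s -o square.o   assembler: emits a machine ("object") file
     cc main.o square.o -o prog   linker: combines objects and libraries
   main.o is a hypothetical object file that calls square(). */

#define SQUARE(x) ((x) * (x))   /* disappears after the preprocessor stage */

int square(int x)
{
    return SQUARE(x);
}
```

Comparing square.i with square.c shows the macro replaced by ((x) * (x)); square.s then contains the target's assembler code for square.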

  8. Structure of a compiler (1)
      Source code → compiler [ Frontend → Backend ] → Target program
      • Frontend: “understand both the form, or syntax, and content, or meaning (semantics), of the input language”
      • Backend: “understand the rules that govern syntax and meaning in the output language”
      • Between them: “scheme for mapping content from the source language to the target language”

  9. Structure of a compiler (2)
      Source code → compiler [ Frontend → IR → Optimizer → IR → Backend ] → Target program
      • Frontend: “understand both the form, or syntax, and content, or meaning (semantics), of the input language”
      • Optimizer: “must improve the input program in some discernible way”
      • Backend: “understand the rules that govern syntax and meaning in the output language”
      • IR: “scheme for mapping content from the source language to the target language”

  10. Intermediate representation (IR)
      [Figure: source languages Java, ML, Pascal, C, C++ and targets Sparc, MIPS, Pentium, Itanium; on the left each language is connected directly to each target, on the right all of them are connected through one IR]
      • Early compilers directly generated machine code
      • n source languages, m targets: n × m compilers required!
      • Idea: use a common description format: “Intermediate Representation” (IR)
        – Transform source to IR (front end) and IR to target code (back end): only n front ends plus m back ends (n + m components) required now
      • Additional advantages of using intermediate representations:
        – Easy to change source or target language
        – Easier optimizations: developed only for the intermediate representation
        – Intermediate representation can be directly interpreted
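With the five source languages and four targets in the figure, direct translation needs 5 × 4 = 20 compilers, whereas an IR needs 5 front ends + 4 back ends = 9 components. To illustrate the last advantage, here is a minimal sketch of a made-up three-address IR and a direct interpreter for it. The names ir_op, ir_instr, ir_run and example are hypothetical and not taken from any real compiler; example holds x = y + 42 lowered to two instructions.

```c
/* A hypothetical three-address IR: each instruction has one operator,
   a destination slot and up to two source slots or an immediate. */
enum ir_op { IR_CONST, IR_ADD };

struct ir_instr {
    enum ir_op op;
    int dst;    /* destination slot */
    int src1;   /* first source slot (IR_ADD only) */
    int src2;   /* second source slot (IR_ADD only) */
    int imm;    /* immediate value (IR_CONST only) */
};

/* Interprets n IR instructions directly, using slots[] as variable storage. */
void ir_run(const struct ir_instr *code, int n, int *slots)
{
    for (int i = 0; i < n; i++) {
        const struct ir_instr *in = &code[i];
        switch (in->op) {
        case IR_CONST:
            slots[in->dst] = in->imm;
            break;
        case IR_ADD:
            slots[in->dst] = slots[in->src1] + slots[in->src2];
            break;
        }
    }
}

/* x = y + 42 in this IR. Slot 0 = x, slot 1 = y, slot 2 = temporary t1. */
const struct ir_instr example[] = {
    { IR_CONST, 2, 0, 0, 42 },   /* t1 = 42     */
    { IR_ADD,   0, 1, 2, 0  },   /* x  = y + t1 */
};
```

Starting from slots {0, 8, 0}, that is y = 8, ir_run(example, 2, slots) leaves 50 in slot 0 (x). A back end would translate the same two instructions to machine code instead of executing them.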

  11. Stages of a compiler (1)
      Source code (character stream) → Lexical analysis → (token sequence) → Syntax analysis → Semantic analysis → Code optimization → Code generation → machine-level program
      Lexical analysis (scanning):
        – Split source code into lexical units
        – Recognize tokens (using regular expressions/automata)
        – Token: character sequence relevant to source language grammar
      Example: character stream x = y + 42 → token sequence id(x) op(=) id(y) op(+) number(42)
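A minimal hand-written scanner can produce exactly the token sequence in the example. In this sketch the inner while loops for identifiers and numbers play the role of small automata. The function name scan, the textual token format, and the rule that every other non-blank character is a one-character operator are simplifying assumptions for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Scans src and writes the token sequence as text into out (size outsz,
   at least 1), e.g. "id(x) op(=) ...". Stops early if out is full. */
void scan(const char *src, char *out, size_t outsz)
{
    size_t used = 0;
    const char *p = src;
    out[0] = '\0';
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) { p++; continue; }   /* skip blanks */
        const char *start = p;
        const char *kind;
        if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            kind = "id";
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;
            kind = "number";
        } else {
            p++;                        /* assumption: single-char operator */
            kind = "op";
        }
        int w = snprintf(out + used, outsz - used, "%s%s(%.*s)",
                         used > 0 ? " " : "", kind, (int)(p - start), start);
        if (w < 0 || (size_t)w >= outsz - used)
            return;                     /* output buffer full */
        used += (size_t)w;
    }
}
```

Calling scan("x = y + 42", buf, sizeof buf) fills buf with id(x) op(=) id(y) op(+) number(42). A production scanner would also need multi-character operators, comments and error reporting.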

  12. Stages of a compiler (2)
      Source code → Lexical analysis → (token sequence) → Syntax analysis → (syntax tree) → Semantic analysis → Code optimization → Code generation → machine-level program
      Syntax analysis (parsing):
        – Uses grammar of the source language
        – Decides if input token sequence can be derived from the grammar
      Syntax tree for x = y + 42:
        op(=)
          id(x)
          op(+)
            id(y)
            number(42)
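Recursive descent is one common way to decide whether a token sequence can be derived from a grammar while building the syntax tree. The grammar below (an assignment whose right-hand side is a chain of additions) and all type and function names are assumptions made for this sketch; parse_assign returns NULL when the tokens are not derivable.

```c
#include <stdio.h>
#include <string.h>

/* Assumed grammar for this sketch:
     assign -> ID '=' expr
     expr   -> term { '+' term }      (left-associative)
     term   -> ID | NUMBER                                   */
enum tok_kind { T_ID, T_NUMBER, T_OP, T_END };
struct token { enum tok_kind kind; const char *text; };

struct node {
    const char *label;              /* operator, identifier or number */
    struct node *left, *right;      /* both NULL for leaves */
};

#define POOL_SIZE 32
static struct node pool[POOL_SIZE]; /* fixed node pool, no malloc */
static int pool_used;
static const struct token *cur;     /* next unconsumed token */

static struct node *mk(const char *label, struct node *l, struct node *r)
{
    if (pool_used == POOL_SIZE) return NULL;
    struct node *n = &pool[pool_used++];
    n->label = label; n->left = l; n->right = r;
    return n;
}

static int at_op(const char *s)
{
    return cur->kind == T_OP && strcmp(cur->text, s) == 0;
}

static struct node *term(void)
{
    if (cur->kind != T_ID && cur->kind != T_NUMBER) return NULL;
    const char *text = cur->text;
    cur++;
    return mk(text, NULL, NULL);
}

static struct node *expr(void)
{
    struct node *lhs = term();
    while (lhs != NULL && at_op("+")) {
        const char *op = cur->text;
        cur++;
        struct node *rhs = term();
        lhs = (rhs != NULL) ? mk(op, lhs, rhs) : NULL;
    }
    return lhs;
}

/* Returns the syntax tree, or NULL if toks (ended by T_END) is not an assign. */
struct node *parse_assign(const struct token *toks)
{
    cur = toks;
    pool_used = 0;
    if (cur->kind != T_ID) return NULL;
    struct node *target = mk(cur->text, NULL, NULL);
    cur++;
    if (!at_op("=")) return NULL;
    const char *op = cur->text;
    cur++;
    struct node *value = expr();
    if (value == NULL || cur->kind != T_END) return NULL;
    return mk(op, target, value);
}

/* Prints a tree in prefix form, e.g. (= x (+ y 42)). */
void print_tree(const struct node *n, FILE *out)
{
    if (n->left == NULL) { fputs(n->label, out); return; }
    fprintf(out, "(%s ", n->label);
    print_tree(n->left, out);
    fputc(' ', out);
    print_tree(n->right, out);
    fputc(')', out);
}
```

For the tokens id(x) op(=) id(y) op(+) number(42), parse_assign builds the tree shown on the slide, and print_tree(tree, stdout) prints (= x (+ y 42)).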

  13. Stages of a compiler (3)
      Source code → Lexical analysis → Syntax analysis → (syntax tree) → Semantic analysis → (IR) → Code optimization → Code generation → machine-level program
      Semantic analysis:
        – Name analysis (check definition & scope of symbols)
        – Type analysis (check correct type of expressions)
        – Creation of symbol tables (map identifiers to their types and positions in the source code)
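A symbol table can be sketched as a list of entries mapping identifiers to types and source positions, which name analysis and type analysis then query. This sketch assumes a single flat scope (no nested blocks), two types, and a deliberately strict made-up rule that rejects float-to-int assignment (C itself converts implicitly); all names are hypothetical.

```c
#include <string.h>

enum type { TY_INT, TY_FLOAT };

struct symbol {
    const char *name;
    enum type type;
    int line, column;            /* position of the declaration */
};

#define MAX_SYMBOLS 64

struct symtab {
    struct symbol entries[MAX_SYMBOLS];
    int count;
};

/* Name analysis: finds a declared name; NULL means "undeclared identifier". */
const struct symbol *symtab_lookup(const struct symtab *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0)
            return &t->entries[i];
    return NULL;
}

/* Records a declaration; returns -1 on redeclaration or a full table. */
int symtab_declare(struct symtab *t, const char *name, enum type ty,
                   int line, int column)
{
    if (symtab_lookup(t, name) != NULL || t->count == MAX_SYMBOLS)
        return -1;
    t->entries[t->count++] = (struct symbol){ name, ty, line, column };
    return 0;
}

/* Type analysis for "dst = src + <int literal>" under the assumed rule:
   int may widen to float, float may not narrow to int. Returns 1 if OK. */
int check_assign_add(const struct symtab *t, const char *dst, const char *src)
{
    const struct symbol *d = symtab_lookup(t, dst);
    const struct symbol *s = symtab_lookup(t, src);
    if (d == NULL || s == NULL)
        return 0;                         /* name analysis failed */
    enum type rhs = s->type;              /* src + int literal has src's type */
    return d->type == rhs || (d->type == TY_FLOAT && rhs == TY_INT);
}
```

After declaring x and y as int, check_assign_add(&t, "x", "y") accepts x = y + 42, while a use of an undeclared z is rejected by the lookup.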

  14. Stages of a compiler (4)
      Source code → Lexical analysis → Syntax analysis → Semantic analysis → (IR) → Code optimization → (IR) → Code generation → machine-level program
      Code optimization:
        – Detects patterns of redundancy and removes them, e.g., a store of a variable directly followed by a load of the same variable
        – Often, several stages/levels of optimization with different intermediate representations are applied

  15. Stages of a compiler (5)
      Source code → Lexical analysis → Syntax analysis → Semantic analysis → Code optimization → (IR) → Code generation → (machine code) → machine-level program
      Code generation:
        – Determines and outputs equivalent machine instructions for components of the IR (instruction selection)
        – Determines an instruction order that respects pipeline constraints and exploits instruction-level parallelism (instruction scheduling)
        – Assigns variables to registers (register allocation) and memory locations
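The store-then-load redundancy mentioned under code optimization can be shown with a peephole pass over a toy low-level IR for a made-up register machine (not any real instruction set). remove_store_load drops a load that directly follows a store of the same register to the same variable, since the register still holds that value; emit is a trivial one-to-one printer standing in for instruction selection, with no scheduling or register allocation. All names and the instruction format are hypothetical.

```c
#include <stdio.h>

enum op { LOAD, STORE, ADDI };

struct instr {
    enum op op;
    int reg;      /* register number */
    char var;     /* memory variable (LOAD/STORE) */
    int imm;      /* immediate operand (ADDI) */
};

/* Removes "LOAD r, v" directly after "STORE v, r" in place.
   Returns the new number of instructions. */
int remove_store_load(struct instr *code, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (out > 0 && code[i].op == LOAD &&
            code[out - 1].op == STORE &&
            code[out - 1].reg == code[i].reg &&
            code[out - 1].var == code[i].var)
            continue;                       /* r already holds v */
        code[out++] = code[i];
    }
    return out;
}

/* Prints one assembly-like line per instruction (1:1 selection). */
void emit(const struct instr *code, int n, FILE *out)
{
    for (int i = 0; i < n; i++) {
        switch (code[i].op) {
        case LOAD:  fprintf(out, "load  r%d, %c\n", code[i].reg, code[i].var); break;
        case STORE: fprintf(out, "store %c, r%d\n", code[i].var, code[i].reg); break;
        case ADDI:  fprintf(out, "addi  r%d, r%d, %d\n",
                            code[i].reg, code[i].reg, code[i].imm); break;
        }
    }
}
```

For x = y + 42; z = x + 1 naively lowered to six instructions (load y, add, store x, load x, add, store z), the pass removes the load of x and leaves five.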

  16. Lexical analysis (scanning)
      • The compiler input is simply a stream (sequence) of bytes:
        72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, ...
      • By convention (ASCII encoding), these are mapped to letters, digits, etc.:
        'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', ...
      • Other mappings (encodings) exist, e.g. Unicode UTF-8, EBCDIC
      • On this level, the input program is just a lot of bytes without any structure
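The byte-to-character convention can be made visible in C, where a char is just a small integer. This sketch assumes the execution character set is ASCII-compatible (true on common platforms, not on EBCDIC systems); bytes_to_text and dump_bytes are hypothetical helper names.

```c
#include <stddef.h>
#include <stdio.h>

/* Turns n byte values into a C string; out needs room for n + 1 chars.
   Which character each byte "is" depends on the encoding (assumed ASCII). */
void bytes_to_text(const unsigned char *bytes, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (char)bytes[i];
    out[n] = '\0';
}

/* Prints every byte of text as a decimal value, followed by its
   character when it is a printable ASCII code (32..126). */
void dump_bytes(const char *text, FILE *out)
{
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++) {
        if (*p >= 32 && *p <= 126)
            fprintf(out, "%3u '%c'\n", (unsigned)*p, *p);
        else
            fprintf(out, "%3u\n", (unsigned)*p);
    }
}
```

Feeding the slide's eleven byte values to bytes_to_text yields the string "Hello world", and dump_bytes("Hello", stdout) lists 72 'H', 101 'e', and so on.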

  17. Lexical analysis (scanning)
      • Naive approach to scanning: read letters one by one, e.g., for the keyword “while”:
        w (119), h (104), i (105), l (108), e (101)
      • Writing a compiler that has to detect this pattern every time the programmer wants to start a loop is inconvenient:
      • A programmer might choose to call a variable 'whilf':
        w (119), h (104), i (105), l (108) (looking good so far…)
        f (102) (oh no, start from scratch, that's not a loop)
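The naive approach can be written down directly: compare the input against "while" one character at a time and report how far the match got. This is a sketch of the slide's idea only; match_while_prefix is a hypothetical name.

```c
/* Compares input with the keyword "while" character by character.
   Returns how many characters matched before the first mismatch;
   5 means all of "while" matched. */
int match_while_prefix(const char *input)
{
    const char *kw = "while";
    int i = 0;
    while (kw[i] != '\0' && input[i] == kw[i])
        i++;
    return i;
}
```

For "whilf" the function returns 4: four characters looked good before the f forced the scanner to start over. It also returns 5 for "whilex", which is an identifier rather than a loop, so the naive check would additionally have to inspect the character after the keyword.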

  18. Identifying syntactical units
      • Better approach: group letters into meaningful units and operate on those:
        'i', 'f', '(', 'w', 'h', 'i', 'l', 'f', '=', '=', '2', ')', '{', 'x', '=', '5', ';', '}'
        if ( whilf == 2 ) { x = 5; }
      • Here, we use color coding to identify the various units:
        – keywords and punctuation: if, ;
        – delimiters of groups: ( ) { }
        – variables: whilf, x
        – operators: ==, =
        – numbers: 2, 5
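Grouping the character list into units can be sketched as a loop that always takes the longest run of characters belonging together (often called maximal munch). The function name group_units, the '|' separator, and the set of operator characters are assumptions for this illustration.

```c
#include <ctype.h>
#include <string.h>

/* Groups the characters of src into units and writes them to out,
   separated by '|'. out needs room for 2 * strlen(src) + 1 chars.
   Units: letter/digit runs starting with a letter (keywords, variables),
   digit runs (numbers), an operator character plus a directly following
   '=' (so "==" is one unit), and any other single character. */
void group_units(const char *src, char *out)
{
    int first = 1;
    const char *p = src;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) { p++; continue; }
        const char *start = p;
        if (isalpha((unsigned char)*p)) {
            while (isalnum((unsigned char)*p)) p++;     /* keyword or variable */
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;     /* number */
        } else if (strchr("=<>!+-*/", *p) != NULL) {
            p++;
            if (*p == '=') p++;                          /* ==, <=, += ... */
        } else {
            p++;                                         /* delimiter or punctuation */
        }
        if (!first) *out++ = '|';
        first = 0;
        while (start < p) *out++ = *start++;             /* copy the unit */
    }
    *out = '\0';
}
```

Given the slide's characters without spaces, "if(whilf==2){x=5;}", the function produces if|(|whilf|==|2|)|{|x|=|5|;|}, the same units as the spaced version.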

  19. Deriving code structure
      • What use is the coloring of our units?
      • We've already seen this one:
        if ( whilf == 2 ) { x = 5; }
      • How would we color this line?
        while ( a < 42 ) { a += 2; }
      • Using the same coloring rules, we get: while (keyword), ( ) { } (delimiters of groups), a (variable), < and += (operators), 42 and 2 (numbers), ; (punctuation)
      • These two statements have completely different meanings but share the same (syntactic) structure (here: the same sequence of colors)
      • We'll talk about structure later
      • Today, we will look at lexical analysis
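The "same sequence of colors" observation can be checked mechanically by turning each statement into one letter per unit and comparing the strings. In this sketch the letters (K keyword or punctuation, D delimiter of groups, V variable, O operator, N number), the keyword set {if, while}, and the name structure_signature are assumptions.

```c
#include <ctype.h>
#include <string.h>

/* Writes one class letter per unit of src into sig (needs one char per
   unit plus the terminating NUL), following the slide's color roles. */
void structure_signature(const char *src, char *sig)
{
    const char *p = src;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) { p++; continue; }
        const char *start = p;
        if (isalpha((unsigned char)*p)) {
            while (isalnum((unsigned char)*p)) p++;
            size_t len = (size_t)(p - start);
            int kw = (len == 2 && strncmp(start, "if", 2) == 0) ||
                     (len == 5 && strncmp(start, "while", 5) == 0);
            *sig++ = kw ? 'K' : 'V';                 /* keyword or variable */
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;
            *sig++ = 'N';                            /* number */
        } else if (strchr("(){}", *p) != NULL) {
            p++;
            *sig++ = 'D';                            /* delimiter of groups */
        } else if (*p == ';') {
            p++;
            *sig++ = 'K';                            /* punctuation */
        } else {
            p++;
            if (*p == '=') p++;                      /* ==, +=, <= ... */
            *sig++ = 'O';                            /* operator */
        }
    }
    *sig = '\0';
}
```

Both statements map to KDVONDDVONKD, so their signatures compare equal even though one is a conditional assignment and the other a loop.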
