Introduction to Syntax Analysis Sebastian Hack - - PowerPoint PPT Presentation



SLIDE 1

Introduction to Syntax Analysis

Sebastian Hack

http://compilers.cs.uni-saarland.de
Compiler Construction Core Course 2017
Saarland University

SLIDE 2

Syntax Analysis in the Compiler Structure

Text → Lexer → Tokens → Parser → AST

SLIDE 3

Abstract Syntax vs. Concrete Syntax

Syntax is typically defined using context-free grammars

Abstract syntax describes the structure of a program:

    s → While(e, s) | If(e, s, s) | ExprStmt(e)
    e → Const[v] | Id[n] | Neg(e) | Plus(e, e) | Minus(e, e) ...

Concrete syntax describes what programs look like as text:

    s → while (e) s | if (e) s else s | e;
    e → NUM | ID | - e | e + e | e - e | (e) ...
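The relationship between the two grammars can be sketched in Python. This is a minimal illustration, not part of the original slides: the dataclass field names and the `unparse` helper are assumptions chosen for the example. Each abstract-syntax alternative becomes a node class, and `unparse` renders a tree in the concrete syntax, adding parentheses so the structure stays visible in the text.

```python
from dataclasses import dataclass

# Expression nodes: e -> Const[v] | Id[n] | Neg(e) | Plus(e, e) | Minus(e, e)
@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Id:
    name: str

@dataclass(frozen=True)
class Neg:
    operand: object

@dataclass(frozen=True)
class Plus:
    left: object
    right: object

@dataclass(frozen=True)
class Minus:
    left: object
    right: object

# Statement nodes: s -> While(e, s) | If(e, s, s) | ExprStmt(e)
@dataclass(frozen=True)
class While:
    cond: object
    body: object

@dataclass(frozen=True)
class If:
    cond: object
    then: object
    orelse: object

@dataclass(frozen=True)
class ExprStmt:
    expr: object

def unparse(node):
    """Render an abstract syntax tree as fully parenthesized concrete syntax."""
    if isinstance(node, Const):
        return str(node.value)
    if isinstance(node, Id):
        return node.name
    if isinstance(node, Neg):
        return f"(-{unparse(node.operand)})"
    if isinstance(node, Plus):
        return f"({unparse(node.left)} + {unparse(node.right)})"
    if isinstance(node, Minus):
        return f"({unparse(node.left)} - {unparse(node.right)})"
    if isinstance(node, While):
        return f"while ({unparse(node.cond)}) {unparse(node.body)}"
    if isinstance(node, If):
        return (f"if ({unparse(node.cond)}) {unparse(node.then)} "
                f"else {unparse(node.orelse)}")
    if isinstance(node, ExprStmt):
        return f"{unparse(node.expr)};"
    raise TypeError(f"unknown node: {node!r}")

tree = While(Id("x"), ExprStmt(Minus(Id("x"), Const(1))))
print(unparse(tree))  # while (x) (x - 1);
```

The abstract tree carries no parentheses, semicolons, or keywords; those exist only in the concrete syntax and are reintroduced when the tree is turned back into text.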

SLIDE 4

Lexing

  • The terminals of the concrete syntax are so-called tokens, which a lexer produces from the characters of the program text
  • A token consists of
      • An ID that characterizes its type (identifier, number, semicolon, etc.)
      • Source code coordinates (for error reporting)
      • The corresponding program text (if necessary)
  • The structure of tokens is typically described by regular expressions
  • Theory doesn’t require lexing (every regular language is also context-free), but lexing makes the specification of the concrete syntax and the parser simpler
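A regular-expression-based lexer along these lines might look as follows in Python. This is a sketch under assumptions: the token kind names, the regular expressions, and the `Token` field names are illustrative choices, not taken from the course. Each token records its kind, its program text, and 1-based line and column coordinates.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    kind: str   # the ID that characterizes the token type
    text: str   # the corresponding program text
    line: int   # source coordinates, 1-based
    col: int

# Order matters: keywords before identifiers, "<=" before "=".
TOKEN_SPEC = [
    ("WHILE", r"while\b"),
    ("INT_CONST", r"[0-9]+"),
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("LE", r"<="),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("SEMI", r";"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t\r]+"),
    ("ERROR", r"."),  # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(source):
    """Split `source` into tokens, tracking line and column for error reporting."""
    tokens = []
    line, line_start = 1, 0
    for m in MASTER.finditer(source):
        kind = m.lastgroup
        col = m.start() - line_start + 1
        if kind == "NEWLINE":
            line, line_start = line + 1, m.end()
        elif kind == "SKIP":
            continue
        elif kind == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r} at {line}:{col}")
        else:
            tokens.append(Token(kind, m.group(), line, col))
    return tokens

for token in lex("q = 0;\nwhile (y <= r) { r = r - y; }"):
    print(token)
```

Whitespace and newlines are consumed by the lexer and never reach the parser, which is one way lexing simplifies the concrete grammar.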

SLIDE 5

Lexing: Example

Program Text

q = 0;
r = x;
while (y <= r) {
  r = r - y;
  q = q + 1;
}

Tokens (coordinates omitted)

ID("q") ASSIGN INT_CONST("0") SEMI
ID("r") ASSIGN ID("x") SEMI
WHILE LPAREN ID("y") LE ID("r") RPAREN LBRACE
ID("r") ASSIGN ID("r") MINUS ID("y") SEMI
ID("q") ASSIGN ID("q") PLUS INT_CONST("1") SEMI
RBRACE

SLIDE 6

Parsing

  • The parser analyses the token stream and
      • either constructs the AST
      • or produces error messages on syntax errors
  • Parsing requires an unambiguous grammar: every syntactically correct input program has exactly one derivation tree
  • Straightforward grammars for common languages are ambiguous. Common issues:
      • Precedence and associativity of operators
      • Dangling else
  • We’ll discuss different solutions to this problem in the parsing session
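The associativity problem can be made concrete with a small Python sketch. It is an illustration, not a practical parser: the function names are hypothetical, and the brute-force enumeration is used only because it makes every derivation tree visible. Under the ambiguous rule e → NUM | e - e, the input 8 - 3 - 2 has two trees, and they evaluate to different results.

```python
def parse_trees(tokens):
    """Return every parse tree of `tokens` under the grammar e -> NUM | e - e."""
    if len(tokens) == 1 and isinstance(tokens[0], int):
        return [tokens[0]]
    trees = []
    # Try every "-" as the top-level operator and combine all sub-parses.
    for i, tok in enumerate(tokens):
        if tok == "-":
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append(("-", left, right))
    return trees

def evaluate(tree):
    """Evaluate a tree built by parse_trees."""
    if isinstance(tree, int):
        return tree
    _, left, right = tree
    return evaluate(left) - evaluate(right)

tokens = [8, "-", 3, "-", 2]
for tree in parse_trees(tokens):
    print(tree, "=", evaluate(tree))
```

One tree groups the input as 8 - (3 - 2), which evaluates to 7, and the other as (8 - 3) - 2, which evaluates to 3. An unambiguous grammar, or a disambiguation rule such as left associativity, has to single out one of them.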

SLIDE 7

Parsing Example

Tokens

ID("q") ASSIGN INT_CONST("0") SEMI
ID("r") ASSIGN ID("x") SEMI
WHILE LPAREN ID("y") LE ID("r") RPAREN LBRACE
ID("r") ASSIGN ID("r") MINUS ID("y") SEMI
ID("q") ASSIGN ID("q") PLUS INT_CONST("1") SEMI
RBRACE

Abstract Syntax Tree

Seq
  Assign
    Var[q]
    Const[0]
  Seq
    Assign
      Var[r]
      Var[x]
    While
      Cmp[Le]
        Var[y]
        Var[r]
      Seq
        Assign
          Var[r]
          Minus
            Var[r]
            Var[y]
        Assign
          Var[q]
          Plus
            Var[q]
            Const[1]