
Warm-up project. Aslan Askarov (aslan@cs.au.dk). Revised from slides by E. Ernst



  1. Compilation 2014: Warm-up project
     Aslan Askarov (aslan@cs.au.dk)
     Revised from slides by E. Ernst

  2. Straight-line Programming Language
     • Toy programming language: no branching, no loops
     • Skip lexing and parsing issues
     • Focus on the "meaning": interpretation
     • Syntax:
         Stm     → Stm ; Stm          (CompoundStm)
         Stm     → id := Exp          (AssignStm)
         Stm     → print ( ExpList )  (PrintStm)
         Exp     → id                 (IdExp)
         Exp     → num                (NumExp)
         Exp     → Exp Binop Exp      (OpExp)
         Exp     → ( Stm , Exp )      (EseqExp)
         ExpList → Exp , ExpList      (PairExpList)
         ExpList → Exp                (LastExpList)
         Binop   → +                  (Plus)
         Binop   → -                  (Minus)
         Binop   → ×                  (Times)
         Binop   → /                  (Div)

  3. Straight-line program
     • Source:
         a := 5 + 3;
         b := (print (a, a - 1), 10 * a);
         print (b)
     • Corresponding syntax tree: [figure: syntax tree of the program, built from CompoundStm, AssignStm, PrintStm, OpExp, NumExp, IdExp, EseqExp, PairExpList and LastExpList nodes]

  4. SLP syntax representation datatype
     • SML declaration, with the grammar rule each constructor represents:
         type id = string
         datatype binop = Plus | Minus | Times | Div
         datatype stm = CompoundStm of stm * stm        (* Stm → Stm ; Stm *)
                      | AssignStm of id * exp           (* Stm → id := Exp *)
                      | PrintStm of exp list            (* Stm → print ( ExpList ) *)
              and exp = IdExp of id                     (* Exp → id *)
                      | NumExp of int                   (* Exp → num *)
                      | OpExp of exp * binop * exp      (* Exp → Exp Binop Exp *)
                      | EseqExp of stm * exp            (* Exp → ( Stm , Exp ) *)
     • ExpList (PairExpList, LastExpList) is represented by an SML list: exp list
     • Binop → + | - | × | / corresponds to Plus | Minus | Times | Div

  5. SLP syntax representation
     • Source program:
         a := 5 + 3;
         b := (print (a, a - 1), 10 * a);
         print (b)
     • [figure: syntax tree of the program, as on slide 3]
     • SML value:
         val prog =
           CompoundStm (AssignStm ("a", OpExp (NumExp 5, Plus, NumExp 3)),
             CompoundStm (AssignStm ("b",
                            EseqExp (PrintStm [IdExp "a", OpExp (…)],
                                     OpExp (NumExp 10, …))),
                          PrintStm [IdExp "b"]))
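The value on slide 5 elides two subexpressions with "…". Written out in full from the source program on slide 3, and paired with `maxargs` (the MCIML exercise asks for `maxargs : stm -> int`, the maximum number of arguments of any print statement), it might look like the sketch below. It assumes the datatype from slide 4; the helper name `maxargsExp` and this implementation are one possible choice, not the course's reference solution.

```sml
(* Abstract syntax, as declared on slide 4 *)
type id = string

datatype binop = Plus | Minus | Times | Div

datatype stm = CompoundStm of stm * stm
             | AssignStm of id * exp
             | PrintStm of exp list
     and exp = IdExp of id
             | NumExp of int
             | OpExp of exp * binop * exp
             | EseqExp of stm * exp

(* a := 5 + 3; b := (print (a, a - 1), 10 * a); print (b) *)
val prog =
  CompoundStm (AssignStm ("a", OpExp (NumExp 5, Plus, NumExp 3)),
    CompoundStm (AssignStm ("b",
                   EseqExp (PrintStm [IdExp "a", OpExp (IdExp "a", Minus, NumExp 1)],
                            OpExp (NumExp 10, Times, IdExp "a"))),
                 PrintStm [IdExp "b"]))

(* maxargs s = largest number of arguments of any print statement
   anywhere inside s, including prints nested inside expressions *)
fun maxargs (CompoundStm (s1, s2)) = Int.max (maxargs s1, maxargs s2)
  | maxargs (AssignStm (_, e)) = maxargsExp e
  | maxargs (PrintStm es) = foldl Int.max (length es) (map maxargsExp es)
and maxargsExp (IdExp _) = 0
  | maxargsExp (NumExp _) = 0
  | maxargsExp (OpExp (e1, _, e2)) = Int.max (maxargsExp e1, maxargsExp e2)
  | maxargsExp (EseqExp (s, e)) = Int.max (maxargs s, maxargsExp e)
```

For this program `maxargs prog` is 2, coming from `print (a, a - 1)`; the print inside the EseqExp is found because `maxargsExp` descends into statements nested in expressions.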

  6. Project assignment
     • Follow the descriptions on pp. 10-12 of MCIML (Modern Compiler Implementation in ML)
     • "Modularity principles" (pp. 9-10): discussed on Friday; may be ignored at first
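The book also asks for an interpreter `interp : stm -> unit`. Below is a minimal sketch, assuming the slide-4 datatype, an association list as the variable environment, and that each print statement writes its values separated by spaces followed by a newline. The helper names (`update`, `lookup`, `applyOp`, `interpStm`, `interpExp`) are illustrative, not prescribed by the slides.

```sml
(* Abstract syntax, as declared on slide 4 *)
type id = string
datatype binop = Plus | Minus | Times | Div
datatype stm = CompoundStm of stm * stm
             | AssignStm of id * exp
             | PrintStm of exp list
     and exp = IdExp of id
             | NumExp of int
             | OpExp of exp * binop * exp
             | EseqExp of stm * exp

(* Environment as an association list; newest binding first, so lookup
   sees the most recent assignment to a variable. *)
type table = (id * int) list

fun update (t : table, x : id, v : int) : table = (x, v) :: t

fun lookup ([] : table, x : id) : int = raise Fail ("unbound variable " ^ x)
  | lookup ((y, v) :: rest, x) = if x = y then v else lookup (rest, x)

fun applyOp (Plus, a, b) = a + b
  | applyOp (Minus, a, b) = a - b
  | applyOp (Times, a, b) = a * b
  | applyOp (Div, a, b) = a div b   (* integer division; fails on zero *)

(* Statements thread the table through; expressions return a value and a
   possibly updated table, because an EseqExp can assign variables. *)
fun interpStm (CompoundStm (s1, s2), t) = interpStm (s2, interpStm (s1, t))
  | interpStm (AssignStm (x, e), t) =
      let val (v, t') = interpExp (e, t) in update (t', x, v) end
  | interpStm (PrintStm es, t) =
      let
        (* evaluate arguments left to right, then print them on one line *)
        fun args ([], t, acc) = (print (String.concatWith " " (rev acc) ^ "\n"); t)
          | args (e :: rest, t, acc) =
              let val (v, t') = interpExp (e, t)
              in args (rest, t', Int.toString v :: acc) end
      in args (es, t, []) end
and interpExp (IdExp x, t) = (lookup (t, x), t)
  | interpExp (NumExp n, t) = (n, t)
  | interpExp (OpExp (e1, b, e2), t) =
      let val (v1, t1) = interpExp (e1, t)
          val (v2, t2) = interpExp (e2, t1)
      in (applyOp (b, v1, v2), t2) end
  | interpExp (EseqExp (s, e), t) = interpExp (e, interpStm (s, t))

fun interp (s : stm) : unit = ignore (interpStm (s, []))
```

For the program on slide 3, `interp prog` should print `8 7` and then `80`. Threading the table through `interpExp`, rather than using a global `ref`, keeps the interpreter purely functional apart from printing. Note that `Int.toString` writes negative numbers with `~`.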

  7. Lexical analysis

  8. Lexical analysis
     • High-level source code → Lexing → Parsing → Elaboration → … → Low-level target code

  9. Lexical analysis
     • First phase in the compilation
     • Input: stream of characters
         i f ( x > 0 ) \n \t t h e n 1 \n \t e l s e 0
     • Output: stream of tokens in our language
         IF LPAREN ID ("x") GT INT (0) RPAREN THEN INT (1) ELSE INT (0)
     • Discards comments, whitespace, newline and tab characters, and preprocessor directives
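A hand-written sketch of the transformation on slide 9, covering only the tokens in that example (no ML-Lex involved). It assumes that keywords are scanned as identifiers and then looked up in a small table, a common technique that the slides do not prescribe; `scan`, `word`, `number`, `toInt` and `keyword` are hypothetical names.

```sml
(* Just the tokens needed for the example on the slide *)
datatype token = IF | THEN | ELSE | LPAREN | RPAREN | GT
               | ID of string | INT of int

(* Keywords are scanned like identifiers and then looked up *)
fun keyword "if" = IF
  | keyword "then" = THEN
  | keyword "else" = ELSE
  | keyword s = ID s

fun scan ([] : char list) : token list = []
  | scan (c :: cs) =
      if Char.isSpace c then scan cs                (* blanks, newlines, tabs *)
      else if c = #"(" then LPAREN :: scan cs
      else if c = #")" then RPAREN :: scan cs
      else if c = #">" then GT :: scan cs
      else if Char.isAlpha c then word ([c], cs)
      else if Char.isDigit c then number ([c], cs)
      else raise Fail ("Illegal character " ^ str c)
(* word and number accumulate the lexeme in reverse, then emit one token *)
and word (acc, c :: cs) =
      if Char.isAlphaNum c then word (c :: acc, cs)
      else keyword (implode (rev acc)) :: scan (c :: cs)
  | word (acc, []) = [keyword (implode (rev acc))]
and number (acc, c :: cs) =
      if Char.isDigit c then number (c :: acc, cs)
      else toInt acc :: scan (c :: cs)
  | number (acc, []) = [toInt acc]
and toInt acc = INT (valOf (Int.fromString (implode (rev acc))))

val tokens = scan (explode "if (x > 0)\n\tthen 1\n\telse 0")
```

Because `word` keeps consuming letters and digits, `ifx` becomes `ID "ifx"` rather than `IF` followed by `ID "x"`, in line with the longest-match rule of slide 16.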

  10. Tokens
      Type     Examples
      ID       foo   n14   a'   my-fun
      INT      73   0   070
      REAL     0.0   .5   10.
      IF       if
      COMMA    ,
      LPAREN   (
      ASGMT    :=

  11. Non-tokens
      Type                      Examples
      comments                  /* dead code */   // comment   (* nest (*ed*) *)
      preprocessor directives   #define N 10   #include <stdio.h>
      whitespace

  12. Token data structure
      • Many tokens need no associated data, e.g.: IF, COMMA, LPAREN, RPAREN, ASGMT
      • Some tokens carry an associated string: ID ("my-fun")
      • Some tokens carry associated data of other types: INT (73), INT (1), FLOAT (IEEE754, 1001111100…)
      • Tokens may include useful additional information: start/end position in the input file (line number + column, or character position)
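One way to represent slide 12's token data in SML is a datatype for the token kinds plus a record for the positions. This is a sketch: the names `kind`, `token`, `left`, `right`, `kindToString` and `show` are assumptions, and ML-Lex-based lexers such as the one on slide 19 instead pass positions as extra constructor arguments, as in `Tokens.IF (yypos, yypos + 2)`.

```sml
(* Token kinds: some carry no data, some a string, some an int or a real *)
datatype kind = IF | COMMA | LPAREN | RPAREN | ASGMT
              | ID of string
              | INT of int
              | REAL of real

(* Every token also records its start offset and the offset just past its end *)
type token = {kind : kind, left : int, right : int}

fun kindToString IF = "IF"
  | kindToString COMMA = "COMMA"
  | kindToString LPAREN = "LPAREN"
  | kindToString RPAREN = "RPAREN"
  | kindToString ASGMT = "ASGMT"
  | kindToString (ID s) = "ID(" ^ s ^ ")"
  | kindToString (INT n) = "INT(" ^ Int.toString n ^ ")"
  | kindToString (REAL r) = "REAL(" ^ Real.toString r ^ ")"

fun show ({kind, left, right} : token) =
  kindToString kind ^ "@" ^ Int.toString left ^ "-" ^ Int.toString right

(* "my-fun" starting at offset 4 ends at offset 4 + size "my-fun" = 10 *)
val t : token = {kind = ID "my-fun", left = 4, right = 10}
```

Since `kind` contains a `real`, it is not an equality type in SML, so tokens are compared through their printed form or by pattern matching.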

  13. Q/A
      • Consider the source program: var δ := 0.0
      • Language: case sensitive, ASCII
      • How should the use of δ be reported? As: FileName:Line.Col: Illegal character δ
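To print FileName:Line.Col from a character offset (for example ML-Lex's `yypos`), a lexer can count the newlines that precede the offset. A minimal sketch; `lineCol`, `illegalChar` and the file name `prog.txt` are hypothetical, and a real lexer would more likely record newline positions while scanning instead of rescanning the input for every error.

```sml
(* Turn a 0-based character offset into a 1-based (line, column) pair
   by scanning the input and counting newlines before the offset. *)
fun lineCol (input : string, pos : int) : int * int =
  let
    fun go (i, line, col) =
      if i >= pos then (line, col)
      else if String.sub (input, i) = #"\n" then go (i + 1, line + 1, 1)
      else go (i + 1, line, col + 1)
  in go (0, 1, 1) end

(* Error message in the FileName:Line.Col format; pos must be a valid offset.
   The input is treated as bytes, so a non-ASCII character such as delta
   takes two offsets in UTF-8 and only its first byte is shown here. *)
fun illegalChar (file : string, input : string, pos : int) : string =
  let val (line, col) = lineCol (input, pos)
  in file ^ ":" ^ Int.toString line ^ "." ^ Int.toString col
     ^ ": Illegal character " ^ String.str (String.sub (input, pos))
  end
```

For example, `illegalChar ("prog.txt", "var x := 0.0;\nvar $ := 1", 18)` yields `prog.txt:2.5: Illegal character $`.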

  14. Regular expressions
      • We can use regular expressions to specify programming-language tokens
      • Regular expressions are expected to be well known
      • Syntax:
          symbol   a
          choice   x | y
          concat   x y
          empty    ε
          repeat   x*

  15. Regular expressions used for scanning
      • Examples:
          if                                        (IF);
          [a-z][a-z0-9]*                            (ID);
          [0-9]+                                    (NUM);
          ([0-9]+"."[0-9]*) | ([0-9]*"."[0-9]+)     (REAL);
          ("--"[a-z]*"\n") | (" "|"\n"|"\t")+       (continue());
          .                                         (error(); continue());
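To make the regular-expression syntax of slide 14 concrete, here is a sketch of a matcher in SML. It uses Brzozowski derivatives, a technique swapped in for illustration (the slides implement matching via NFAs and DFAs), and adds a `Null` regex and predicate-based symbols as assumptions. The token definitions follow slide 15, with NUM taken as `[0-9]+` so that it cannot match the empty string; all names are hypothetical.

```sml
(* Regular expressions as on slide 14, plus Null (matches nothing), which the
   derivative construction needs, and predicate symbols so that a character
   class like [a-z] is a single Sym. *)
datatype regex = Null
               | Eps                      (* empty string *)
               | Sym of char -> bool      (* one symbol satisfying the predicate *)
               | Alt of regex * regex     (* choice: x | y *)
               | Seq of regex * regex     (* concat: x y *)
               | Star of regex            (* repeat *)

fun nullable Null = false
  | nullable Eps = true
  | nullable (Sym _) = false
  | nullable (Alt (r, s)) = nullable r orelse nullable s
  | nullable (Seq (r, s)) = nullable r andalso nullable s
  | nullable (Star _) = true

(* deriv c r matches exactly those w for which r matches c followed by w *)
fun deriv _ Null = Null
  | deriv _ Eps = Null
  | deriv c (Sym p) = if p c then Eps else Null
  | deriv c (Alt (r, s)) = Alt (deriv c r, deriv c s)
  | deriv c (Seq (r, s)) =
      if nullable r then Alt (Seq (deriv c r, s), deriv c s)
      else Seq (deriv c r, s)
  | deriv c (Star r) = Seq (deriv c r, Star r)

fun matches r s = nullable (foldl (fn (c, r') => deriv c r') r (explode s))

(* Helpers and the token definitions from slide 15 *)
fun range (lo, hi) = Sym (fn c => ord lo <= ord c andalso ord c <= ord hi)
fun lit ch = Sym (fn c => c = ch)
fun plus r = Seq (r, Star r)                       (* one or more *)
val lower = range (#"a", #"z")
val digit = range (#"0", #"9")
val idRe = Seq (lower, Star (Alt (lower, digit)))  (* ID *)
val numRe = plus digit                             (* NUM *)
val realRe = Alt (Seq (plus digit, Seq (lit #".", Star digit)),
                  Seq (Star digit, Seq (lit #".", plus digit)))
```

The derivative after each character is itself a regular expression, so matching needs no automaton; without simplification the terms can grow, which is fine for short lexemes but one reason real scanners compile to a DFA.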

  16. Resolving ambiguities
      • Rule: when a string can match multiple tokens, the longest matching token wins
          if               (IF);
          [a-z][a-z0-9]*   (ID);
        Input "ifx > 0": the lexer produces ID ("ifx"), not IF followed by ID ("x")
      • We also need priorities when several tokens match with the same length
      • Usual rule: the earliest declaration wins, so "if" is lexed as IF, not ID ("if")
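The longest-match and priority rules can be sketched as a driver that asks every rule how long a prefix it matches, keeps the longest, and breaks ties in declaration order. The matchers here are hand-coded stand-ins for the regular expressions on the slide; `rule`, `isPrefix`, `keyword`, `ident`, `best` and `tokenize` are hypothetical names.

```sml
(* A rule pairs a token name with a matcher that returns the length of the
   longest prefix of the input it accepts (NONE if it accepts none). *)
type rule = string * (char list -> int option)

fun isPrefix ([], _) = true
  | isPrefix (_, []) = false
  | isPrefix (k :: ks, c :: cs) = k = c andalso isPrefix (ks, cs)

(* Matcher for a fixed keyword such as "if" *)
fun keyword kw cs = if isPrefix (explode kw, cs) then SOME (size kw) else NONE

fun countWhile p (c :: cs) = if p c then 1 + countWhile p cs else 0
  | countWhile _ [] = 0

(* Matcher for identifiers: a lowercase letter, then lowercase letters or digits *)
fun ident (c :: cs) =
      if Char.isLower c
      then SOME (1 + countWhile (fn d => Char.isLower d orelse Char.isDigit d) cs)
      else NONE
  | ident [] = NONE

(* Longest match wins; on a tie the strict > keeps the earlier rule *)
fun best (rules : rule list, cs : char list) : (string * int) option =
  foldl (fn ((name, m), acc) =>
           case (m cs, acc) of
               (NONE, _) => acc
             | (SOME n, NONE) => SOME (name, n)
             | (SOME n, SOME (_, n')) => if n > n' then SOME (name, n) else acc)
        NONE rules

fun tokenize (rules : rule list) (cs : char list) : (string * string) list =
  case cs of
      [] => []
    | c :: rest =>
        if Char.isSpace c then tokenize rules rest
        else case best (rules, cs) of
                 SOME (name, n) =>
                   (name, implode (List.take (cs, n)))
                   :: tokenize rules (List.drop (cs, n))
               | NONE => raise Fail ("Illegal character " ^ str c)

(* IF is declared before ID, so it wins when both match "if" *)
val rules : rule list = [("IF", keyword "if"), ("ID", ident)]
```

On `if ifx x`, the first `if` is a length-2 tie between IF and ID that IF wins by priority, while `ifx` goes to ID because 3 > 2.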

  17. Lexical analysis
      • Specification: tokens as regular expressions + longest-matching rule + priorities
      • Formalism: NFA, DFA
      • Implementation: simulate the NFA, or simulate the DFA (linear complexity)
      • Output: a program that translates raw text into a stream of tokens
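Simulating a DFA with the longest-match rule means running the automaton as far as it can go while remembering the last accepting state it passed through. Below is a hand-built DFA for ID, NUM and REAL from slide 15; the state numbering is chosen for this sketch and is not meant to match the figure on slide 18, and `step`, `accepting` and `longest` are hypothetical names.

```sml
(* Hand-built DFA for ID, NUM and REAL:
     ID   = [a-z] followed by [a-z0-9] repeated
     NUM  = one or more digits
     REAL = digits "." optional digits, or optional digits "." digits
   States: 0 start, 1 in ID, 2 in NUM, 3 in REAL,
           4 after a leading "." (not accepting). NONE is the dead state. *)
fun step (0, c) =
      if Char.isLower c then SOME 1
      else if Char.isDigit c then SOME 2
      else if c = #"." then SOME 4
      else NONE
  | step (1, c) = if Char.isLower c orelse Char.isDigit c then SOME 1 else NONE
  | step (2, c) = if Char.isDigit c then SOME 2
                  else if c = #"." then SOME 3 else NONE
  | step (3, c) = if Char.isDigit c then SOME 3 else NONE
  | step (4, c) = if Char.isDigit c then SOME 3 else NONE
  | step _ = NONE

fun accepting 1 = SOME "ID"
  | accepting 2 = SOME "NUM"
  | accepting 3 = SOME "REAL"
  | accepting _ = NONE

(* Run the DFA from state 0, remembering the last accepting state and how
   many characters had been read at that point. When the DFA dies or the
   input ends, report that last accepting position: the longest match. *)
fun longest (cs : char list) : (string * int) option =
  let
    fun run (_, [], _, last) = last
      | run (s, c :: rest, n, last) =
          case step (s, c) of
              NONE => last
            | SOME s' =>
                let
                  val last' = case accepting s' of
                                  SOME tok => SOME (tok, n + 1)
                                | NONE => last
                in run (s', rest, n + 1, last') end
  in run (0, cs, 0, NONE) end
```

Each character is examined once per call. On `.x` the DFA passes through the non-accepting state 4 and then dies, so there is no accepting position to fall back to and `longest` returns `NONE`; a full lexer would then apply its error rule.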

  18. Total NFA for ID, IF, NUM, REAL
      [figure: combined automaton with numbered states recognizing ID, IF, NUM, REAL, whitespace and "--" comments, plus error transitions]

  19. ML-Lex
      • Lexer generator, "built-in" part of SML/NJ
      • Accepts a lexical specification, produces a scanner
      • Example specification:
          (* SML declarations *)
          type lexresult = Tokens.token
          fun eof () = Tokens.EOF (0, 0)
          %%
          (* Lex definitions *)
          digits=[0-9]+;
          %%
          (* Regular expressions and actions *)
          if => (Tokens.IF (yypos, yypos + 2));
          [a-z][a-z0-9]* => (Tokens.ID (yytext, yypos, yypos + size yytext));
          {digits} => (Tokens.NUM (valOf (Int.fromString yytext), yypos, yypos + size yytext));
          ({digits}"."[0-9]*)|([0-9]*"."{digits}) => (Tokens.REAL (valOf (Real.fromString yytext), yypos, yypos + size yytext));
          ("--"[a-z]*"\n")|(" "|"\n"|"\t")+ => (continue ());
          . => (ErrorMsg.error yypos "Illegal character"; continue ());
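The specification refers to a `Tokens` structure and an `ErrorMsg` structure that it does not define. The stubs below are hypothetical, just enough to show how the actions type-check; in a real project these modules come from the project framework. Note that `Int.fromString` and `Real.fromString` return options, so an action that builds `NUM` or `REAL` must unwrap the result (here with `valOf`, which raises an exception on `NONE`).

```sml
(* Hypothetical stand-ins for the modules the specification's actions use *)
structure Tokens =
struct
  datatype token = IF of int * int
                 | ID of string * int * int
                 | NUM of int * int * int
                 | REAL of real * int * int
                 | EOF of int * int
end

structure ErrorMsg =
struct
  val anyErrors = ref false
  fun error (pos : int) (msg : string) : unit =
    (anyErrors := true;
     print ("position " ^ Int.toString pos ^ ": " ^ msg ^ "\n"))
end

(* What the {digits} action computes for the lexeme "073" at position 10.
   Int.fromString returns an int option, hence the valOf. *)
val yytext = "073"
val yypos = 10
val tok = Tokens.NUM (valOf (Int.fromString yytext), yypos, yypos + size yytext)
```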

  20. Lexer states
      • Helpful when handling different "kinds" of tokens
      • For example, use the states:
          INITIAL in general lexing (automatic)
          STRING when scanning the contents of a string
          COMMENT when scanning a comment
      • Point: keep different concerns apart, which is simpler!
      • Syntax:
          ...
          (* Regular expressions and actions *)
          <INITIAL>if => (Tokens.IF (yypos, yypos + 2));
          <INITIAL>[a-z][a-z0-9]* => (Tokens.ID (yytext, yypos, yypos + size yytext));
          ...
          <INITIAL>"\"" => (YYBEGIN STRING; continue ());
          ...
          <STRING>. => (continue ());
          ...
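The same idea can be seen in a small hand-written SML sketch rather than ML-Lex. It assumes C-style comments that may nest and strings without escape sequences (both assumptions, not part of the slides); `state` and `stripComments` are hypothetical names. Each clause of `go` plays the role of a `<STATE>regex => action` rule, and changing the `state` argument plays the role of `YYBEGIN`.

```sml
(* The three states from the slide. COMMENT carries the nesting depth,
   assuming comments of the form /* ... */ may nest. *)
datatype state = INITIAL | STRING | COMMENT of int

(* Remove comments but leave string contents untouched; each comment
   becomes one space so the tokens on either side do not run together.
   Simplification: escaped quotes inside strings are not handled. *)
fun stripComments (input : string) : string =
  let
    fun go (INITIAL, #"/" :: #"*" :: cs, out) = go (COMMENT 1, cs, out)
      | go (INITIAL, #"\"" :: cs, out) = go (STRING, cs, #"\"" :: out)
      | go (INITIAL, c :: cs, out) = go (INITIAL, cs, c :: out)
      | go (INITIAL, [], out) = implode (rev out)
      | go (STRING, #"\"" :: cs, out) = go (INITIAL, cs, #"\"" :: out)
      | go (STRING, c :: cs, out) = go (STRING, cs, c :: out)
      | go (STRING, [], _) = raise Fail "unterminated string"
      | go (COMMENT d, #"/" :: #"*" :: cs, out) = go (COMMENT (d + 1), cs, out)
      | go (COMMENT 1, #"*" :: #"/" :: cs, out) = go (INITIAL, cs, #" " :: out)
      | go (COMMENT d, #"*" :: #"/" :: cs, out) = go (COMMENT (d - 1), cs, out)
      | go (COMMENT d, _ :: cs, out) = go (COMMENT d, cs, out)
      | go (COMMENT _, [], _) = raise Fail "unterminated comment"
  in go (INITIAL, explode input, []) end
```

Because the STRING state ignores comment delimiters, `"/* kept */"` inside a string literal survives, while a nested comment such as `/* x /* y */ z */` is removed only when its depth returns to zero.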

  21. Summary
      • Warm-up project: program in SML!
        • Straight-line programming language, no lexing/parsing involved
        • Express programs using an abstract syntax tree datatype
        • Project specified on the website, essentially as in the book
      • Lexical analysis
        • Avoid complexity in the grammar: use a lexer
        • Based on regular expressions; implementation via NFA/DFA
        • Theory assumed known
      • Tools: ML-Lex
        • Scanner generator, outputs SML code from a specification
        • Note lexer states
