Compilation 2016
Lexical Analysis
Aslan Askarov aslan@cs.au.dk
acknowledgments: E. Ernst
High-level source code → Lexing → Parsing → Elaboration → … → Low-level target code
i f x > 0 t h \n e n 1 \t e l s e ( ) \n \t
IF LPAREN ID ("x") GE INT (0) THEN INT (1) ELSE INT (0) RPAREN
Input: stream of characters
Output: stream of tokens in our language
Discards comments, whitespace, newline, tab characters, preprocessor directives
First phase in the compilation
Type      Examples
ID        foo  n14  a'  my-fun
INT       73  0  070
REAL      0.0  .5  10.
IF        if
COMMA     ,
LPAREN    (
ASGMT     :=
Type                       Examples
comments                   /* dead code */   // comment   (* nest (*ed*) *)
preprocessor directives    #define N 10   #include <stdio.h>
whitespace
IF, COMMA, LPAREN, RPAREN, ASGMT
ID (“my-fun”)
INT (73), INT (1), FLOAT (IEEE754, 1001111100…)
start/end pos in input file (line number + column, or charpos)
var δ := 0.0
FileName:Line.Col: Illegal character δ
programming language tokens
Examples
if                                      (IF);
[a-z][a-z0-9]*                          (ID);
[0-9]+                                  (NUM);
([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+)     (REAL);
("--"[a-z]*"\n")|(" "|"\n"|"\t")+       (continue());
.                                       (error(); continue());
Longest match: the longest matching token wins.
Rule priority: the rule listed first wins when there are several tokens of the same length.
i f x > 0   →   ID ("ifx")   (longest match)
i f         →   IF, not ID ("if")   (rule priority)
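The two disambiguation rules can be sketched in a few lines of Python. This is only an illustration, not the lexer generator used in the course: the rule table copies the slide's patterns (with NUM requiring at least one digit), while the name `lex` and the `SKIP` pseudo-token are assumptions made for this example.

```python
import re

# Rules in priority order, following the patterns on the slide.
# `lex` and the SKIP pseudo-token are hypothetical names for this sketch.
RULES = [
    ("IF",   re.compile(r"if")),
    ("ID",   re.compile(r"[a-z][a-z0-9]*")),
    ("NUM",  re.compile(r"[0-9]+")),
    ("REAL", re.compile(r"[0-9]+\.[0-9]*|[0-9]*\.[0-9]+")),
    ("SKIP", re.compile(r"--[a-z]*\n|[ \n\t]+")),
]

def lex(text):
    """Return a list of (token type, lexeme) pairs for `text`."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best (longest match);
            # on a tie the earlier rule is kept (rule priority).
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        if best_name != "SKIP":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

Under these assumptions, `lex("ifx")` yields a single ID token, while `lex("if")` yields IF because both rules match two characters and IF is listed first.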
Specification: tokens as regular exps (+ longest-matching rule, + priorities)
Formalism: NFA → DFA
Implementation: simulate NFA / simulate DFA
Output: program that translates raw text into a stream of tokens
linear complexity
“classical” approach – from RegEx to NFA to DFA
[Figure: combined DFA for the token rules, states 1 to 13; accepting states are labeled ID, IF, NUM, REAL, or error; edges are labeled with character classes such as i, f, a-z, 0-9, and whitespace.]
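Simulating a DFA with the longest-matching rule amounts to running the automaton until it gets stuck and remembering the last accepting state seen on the way. The sketch below does this for a deliberately small, hand-built DFA covering only IF, ID, and NUM. The state names and the functions `step` and `next_token` are hypothetical and do not correspond to the state numbering in the slide's diagram.

```python
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)

# Hypothetical hand-built DFA for a subset of the rules (IF, ID, NUM).
def step(state, ch):
    """Return the next state, or None when the DFA is stuck."""
    if state == "start":
        if ch == "i":
            return "i"
        if ch in LETTERS:
            return "id"
        if ch in DIGITS:
            return "num"
    elif state == "i":
        if ch == "f":
            return "if"
        if ch in LETTERS or ch in DIGITS:
            return "id"
    elif state in ("if", "id"):
        if ch in LETTERS or ch in DIGITS:
            return "id"
    elif state == "num":
        if ch in DIGITS:
            return "num"
    return None

# Accepting states and the token type each one reports.
ACCEPT = {"i": "ID", "if": "IF", "id": "ID", "num": "NUM"}

def next_token(text, pos):
    """Return (token type, lexeme, end position) for the longest match at pos."""
    state = "start"
    last_accept = None
    i = pos
    while i < len(text):
        state = step(state, text[i])
        if state is None:
            break
        i += 1
        if state in ACCEPT:
            # Remember the most recent accepting state and where it ended.
            last_accept = (ACCEPT[state], i)
    if last_accept is None:
        raise ValueError(f"no token at position {pos}")
    kind, end = last_accept
    return kind, text[pos:end], end
```

In this toy automaton every non-start state is accepting, so the scanner never has to back up; with rules such as REAL, where a prefix like `10.` may pass through non-accepting states in some designs, the remembered position is what the scanner falls back to.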
(* SML declarations *)
type lexresult = Tokens.token
fun eof() = Tokens.EOF(0,0)
%%
(* Lex definitions *)
digits=[0-9]+
%%
(* Regular Expressions and Actions *)
if                 => (Tokens.IF(yypos,yypos+2));
[a-z][a-z0-9]*     => (Tokens.ID(yytext,yypos,yypos + size yytext));
{digits}           => (Tokens.NUM(Int.fromString yytext, yypos, yypos + size yytext));
({digits}"."[0-9]*)|([0-9]*"."{digits})
                   => (Tokens.REAL(Real.fromString yytext, yypos, yypos + size yytext));
("--"[a-z]*"\n")|(" "|"\n"|"\t")+
                   => (continue());
...
(* Regular Expressions and Actions *)
<INITIAL>if               => (Tokens.IF(yypos,yypos+2));
<INITIAL>[a-z][a-z0-9]*   => (Tokens.ID(yytext,yypos,yypos + size yytext));
...
<INITIAL>"\""             => (YYBEGIN STRING; continue());
...
<STRING>.                 => (continue());
...
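The effect of start states can be illustrated with a small hand-written scanner in Python. It is a sketch under assumptions: the slide elides the rule that leaves the STRING state, so returning to INITIAL on a closing quote, collecting the string contents into a STRING token, and the function name `lex_with_states` are all choices made for this example rather than part of the specification above.

```python
import re

ID_RE = re.compile(r"[a-z][a-z0-9]*")

def lex_with_states(text):
    """Scan `text` with two start states, INITIAL and STRING."""
    state = "INITIAL"
    tokens, buf, pos = [], [], 0
    while pos < len(text):
        ch = text[pos]
        if state == "INITIAL":
            m = ID_RE.match(text, pos)
            if ch == '"':
                # Corresponds to YYBEGIN STRING: switch the active rule set.
                state, buf = "STRING", []
                pos += 1
            elif ch in " \n\t":
                pos += 1
            elif m:
                word = m.group()
                # "if" alone is the keyword; longer words such as "ifx" are IDs.
                tokens.append(("IF", word) if word == "if" else ("ID", word))
                pos = m.end()
            else:
                raise ValueError(f"illegal character {ch!r} at position {pos}")
        else:  # state == "STRING"
            if ch == '"':
                # Assumption: a closing quote emits the string and returns to INITIAL.
                tokens.append(("STRING", "".join(buf)))
                state = "INITIAL"
            else:
                # INITIAL rules are inactive here, so "if" is just two characters.
                buf.append(ch)
            pos += 1
    if state == "STRING":
        raise ValueError("unterminated string")
    return tokens
```

For example, `lex_with_states('if "if x" y')` treats the first `if` as a keyword and the second one as plain string contents.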
alternative, purely algebraic approach – from RegEx to DFA using regexp derivatives
[online demo]
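The idea behind regular expression derivatives can be shown with a short Python sketch. It implements only matching by repeated derivation (Brzozowski derivatives); it does not build the DFA, which would additionally require identifying equivalent derivatives as states. All class and function names here are hypothetical.

```python
from dataclasses import dataclass

class Re:
    """Base class for regular expressions."""

@dataclass(frozen=True)
class Empty(Re):      # matches no string at all
    pass

@dataclass(frozen=True)
class Eps(Re):        # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Re):        # matches the single character c
    c: str

@dataclass(frozen=True)
class Alt(Re):        # l | r
    l: Re
    r: Re

@dataclass(frozen=True)
class Seq(Re):        # l followed by r
    l: Re
    r: Re

@dataclass(frozen=True)
class Star(Re):       # zero or more repetitions of r
    r: Re

def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.l) or nullable(r.r)
    return nullable(r.l) and nullable(r.r)  # Seq

def deriv(r, a):
    """Derivative of r with respect to character a."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.l, a), deriv(r.r, a))
    if isinstance(r, Star):
        return Seq(deriv(r.r, a), r)
    # Seq: consume a from the left part, or from the right part if the left is nullable.
    first = Seq(deriv(r.l, a), r.r)
    return Alt(first, deriv(r.r, a)) if nullable(r.l) else first

def matches(r, s):
    """True if the whole string s is in the language of r."""
    for a in s:
        r = deriv(r, a)
    return nullable(r)
```

With `r = Seq(Chr("a"), Star(Alt(Chr("a"), Chr("0"))))`, which stands for `a(a|0)*`, `matches(r, "a0a")` holds and `matches(r, "0a")` does not.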
Stm → Stm ; Stm               (CompoundStm)
Stm → id := Exp               (AssignStm)
Stm → print ( ExpList )       (PrintStm)
Exp → id                      (IdExp)
Exp → num                     (NumExp)
Exp → Exp Binop Exp           (OpExp)
Exp → ( Stm , Exp )           (EseqExp)
ExpList → Exp , ExpList       (PairExpList)
ExpList → Exp                 (LastExpList)
Binop → +                     (Plus)
Binop → -                     (Minus)
Binop → ×                     (Times)
Binop → /                     (Div)
CompoundStm
  AssignStm
    a
    OpExp
      NumExp 5
      BinOp Plus
      NumExp 3
  CompoundStm
    AssignStm
      b
      EseqExp
        PrintStm
          PairExpList
            IdExp a
            LastExpList
              OpExp
                IdExp a
                BinOp Minus
                NumExp 1
        OpExp
          NumExp 10
          BinOp Times
          IdExp a
    PrintStm
      LastExpList
        IdExp b
a := 5 + 3; b := (print (a, a - 1), 10 * a); print (b)
type id = string
datatype binop = Plus | Minus | Times | Div
datatype stm = CompoundStm of stm * stm
             | AssignStm of id * exp
             | PrintStm of exp list
     and exp = IdExp of id
             | NumExp of int
             | OpExp of exp * binop * exp
             | EseqExp of stm * exp
val prog =
  CompoundStm (
    AssignStm ("a", OpExp (NumExp 5, Plus, NumExp 3)),
    CompoundStm (
      AssignStm ("b",
        EseqExp (PrintStm [IdExp "a", OpExp (…)],
                 OpExp (NumExp 10, …))),
      PrintStm [IdExp "b"]))

a := 5 + 3; b := (print (a, a - 1), 10 * a); print (b)
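To make the correspondence between the tree and the program text concrete, the sketch below mirrors the datatype with Python dataclasses and prints a tree back as source text. It is an illustration in Python rather than SML. The field names, the string encoding of `binop`, and the functions `show_stm` and `show_exp` are assumptions; the operands that the slide elides with `…` are taken from the example program text `a - 1` and `10 * a`.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class IdExp:
    name: str

@dataclass
class NumExp:
    value: int

@dataclass
class OpExp:
    left: "Exp"
    op: str          # one of "Plus", "Minus", "Times", "Div"
    right: "Exp"

@dataclass
class EseqExp:
    stm: "Stm"
    exp: "Exp"

@dataclass
class CompoundStm:
    first: "Stm"
    second: "Stm"

@dataclass
class AssignStm:
    name: str
    exp: "Exp"

@dataclass
class PrintStm:
    args: List["Exp"]

Exp = Union[IdExp, NumExp, OpExp, EseqExp]
Stm = Union[CompoundStm, AssignStm, PrintStm]

SYMBOL = {"Plus": "+", "Minus": "-", "Times": "*", "Div": "/"}

def show_exp(e):
    """Render an expression as source text (nested OpExp are not parenthesized)."""
    if isinstance(e, IdExp):
        return e.name
    if isinstance(e, NumExp):
        return str(e.value)
    if isinstance(e, OpExp):
        return f"{show_exp(e.left)} {SYMBOL[e.op]} {show_exp(e.right)}"
    if isinstance(e, EseqExp):
        return f"({show_stm(e.stm)}, {show_exp(e.exp)})"
    raise TypeError(f"not an expression: {e!r}")

def show_stm(s):
    """Render a statement as source text."""
    if isinstance(s, CompoundStm):
        return f"{show_stm(s.first)}; {show_stm(s.second)}"
    if isinstance(s, AssignStm):
        return f"{s.name} := {show_exp(s.exp)}"
    if isinstance(s, PrintStm):
        return "print (" + ", ".join(show_exp(a) for a in s.args) + ")"
    raise TypeError(f"not a statement: {s!r}")

prog = CompoundStm(
    AssignStm("a", OpExp(NumExp(5), "Plus", NumExp(3))),
    CompoundStm(
        AssignStm("b", EseqExp(
            PrintStm([IdExp("a"), OpExp(IdExp("a"), "Minus", NumExp(1))]),
            OpExp(NumExp(10), "Times", IdExp("a")))),
        PrintStm([IdExp("b")])))
```

Calling `show_stm(prog)` reproduces the example program text; a Python list stands in for the PairExpList/LastExpList chain, just as `exp list` does in the SML datatype.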