Lexical analysis
CS440/540
Process: converting the input string (source program) into substrings (tokens)
Input: source program
Output: a sequence of tokens
Also called: lexer, tokenizer, scanner
Token and Lexeme
Token        Sample lexemes
keyword      if, else, for, while, …
whitespace   ' ', '\t', '\n', …
comparison   <, >, ==, !=, …
identifier   total, score, name, …
number       1, 3.14159, 0, …
literal      "Super nice cool compiler", "ComS", …
	if (i == j)
		z = 0;
	else
		z = 1;

\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
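A minimal sketch of how such a character string could be split into (token, lexeme) pairs, using the token classes from the table. The names TokenKind, Token, is_keyword and next_token are hypothetical and not from the slides. As simplifying assumptions, numbers are integers only, string literals are not handled, and any unrecognized character becomes a one-character TOK_OTHER token.

```c
#include <ctype.h>
#include <string.h>

/* Token classes from the table, plus two helper classes (assumed). */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_NUMBER, TOK_WHITESPACE,
               TOK_COMPARISON, TOK_OTHER, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;   /* first character of the lexeme */
    size_t length;       /* number of characters in the lexeme */
} Token;

/* Keyword list taken from the sample lexemes; a real language has more. */
static int is_keyword(const char *s, size_t n) {
    static const char *const keywords[] = { "if", "else", "for", "while" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == n && strncmp(s, keywords[i], n) == 0)
            return 1;
    return 0;
}

/* Scan exactly one token starting at s and return its class and extent. */
Token next_token(const char *s) {
    Token t = { TOK_END, s, 0 };
    unsigned char c = (unsigned char)s[0];
    size_t n = 0;

    if (c == '\0')
        return t;                               /* end of input */
    if (isspace(c)) {
        while (isspace((unsigned char)s[n])) n++;
        t.kind = TOK_WHITESPACE;
    } else if (isalpha(c)) {
        while (isalnum((unsigned char)s[n])) n++;
        t.kind = is_keyword(s, n) ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (isdigit(c)) {
        while (isdigit((unsigned char)s[n])) n++;
        t.kind = TOK_NUMBER;                    /* integers only (assumption) */
    } else if ((c == '=' || c == '!') && s[1] == '=') {
        n = 2;                                  /* == or != */
        t.kind = TOK_COMPARISON;
    } else if (c == '<' || c == '>') {
        n = 1;
        t.kind = TOK_COMPARISON;
    } else {
        n = 1;                                  /* (, ), ;, = and so on */
        t.kind = TOK_OTHER;
    }
    t.length = n;
    return t;
}
```

To tokenize a whole string, call next_token repeatedly and advance the input pointer by the returned length until the kind is TOK_END. A parser would typically discard the TOK_WHITESPACE tokens.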
characters drawn from Σ.
Alphabet              Language
English characters    English sentences
ASCII                 C programs
some alphabet.
certain subset of states are marked as “final states.”
input character.
statements
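state0:                          /* start: first character must be a letter */
    c = getchar();
    if (isalpha(c)) { token += c; goto state1; }
    error();
state1:                          /* inside an identifier: letters or digits */
    c = getchar();
    if (isalpha(c) || isdigit(c)) { token += c; goto state1; }
    if (isdelimiter(c)) goto state2;
    error();
state2:                          /* final state: a delimiter ended the token */
    return(token);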
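The goto version above can also be written as a loop over an explicit state variable. The following is a sketch under stated assumptions, not the course's implementation: it reads from a string instead of getchar(), the function name scan_identifier is hypothetical, and is_delimiter is a guess at what isdelimiter() means (the slides do not define it).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a delimiter is end of input, whitespace, or one of a few
   punctuation characters. */
static int is_delimiter(unsigned char c) {
    return c == '\0' || isspace(c) || strchr("()=;<>!+-*/,", c) != NULL;
}

/* Runs the three-state machine on input. On success, copies the identifier
   into token (capacity cap, including the terminator) and returns its
   length. Returns -1 where the goto version would call error(), or if the
   buffer is too small. */
int scan_identifier(const char *input, char *token, size_t cap) {
    enum { STATE0, STATE1 } state = STATE0;
    size_t len = 0;

    for (;;) {
        unsigned char c = (unsigned char)input[len];
        switch (state) {
        case STATE0:                    /* first character must be a letter */
            if (!isalpha(c)) return -1;
            state = STATE1;
            break;
        case STATE1:                    /* letters or digits, then delimiter */
            if (is_delimiter(c)) {      /* state2: accept */
                token[len] = '\0';
                return (int)len;
            }
            if (!isalnum(c)) return -1;
            break;
        }
        if (len + 1 >= cap) return -1;  /* keep room for the terminator */
        token[len++] = (char)c;
    }
}
```

For example, scanning "total = 5" with a 16-byte buffer stores "total" and returns 5, while "9lives" is rejected in STATE0. The delimiter itself is not consumed, so the caller can continue scanning from input + length.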
languages)
a, considering ε-moves as well
S=ABCDH, T=FGHABCD, U=EGHIABCDI
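Computing the set of states reachable on a symbol while also following ε-moves can be sketched with bitmask sets. This is an illustrative sketch only: the four-state NFA below is hypothetical and is not the automaton that produced the sets S, T and U above, and the names eps, delta, epsilon_closure and move_on_a are assumptions.

```c
#include <stdint.h>

#define NUM_STATES 4   /* hypothetical NFA with states 0..3 */

/* eps[s]: bitmask of states reachable from s by one ε-move.
   delta[s]: bitmask of states reachable from s on the symbol 'a'. */
static const uint32_t eps[NUM_STATES]   = { 0x2, 0x4, 0x0, 0x0 }; /* 0 -ε-> 1, 1 -ε-> 2 */
static const uint32_t delta[NUM_STATES] = { 0x0, 0x0, 0x8, 0x0 }; /* 2 -a-> 3 */

/* Add ε-successors repeatedly until the set stops growing. */
uint32_t epsilon_closure(uint32_t set) {
    uint32_t closure = set, prev;
    do {
        prev = closure;
        for (int s = 0; s < NUM_STATES; s++)
            if (closure & (1u << s))
                closure |= eps[s];
    } while (closure != prev);
    return closure;
}

/* States reachable from set on 'a', considering ε-moves before and after. */
uint32_t move_on_a(uint32_t set) {
    uint32_t from = epsilon_closure(set), to = 0;
    for (int s = 0; s < NUM_STATES; s++)
        if (from & (1u << s))
            to |= delta[s];
    return epsilon_closure(to);
}
```

In the subset construction, each distinct bitmask returned by move_on_a (one such function per input symbol) becomes one state of the resulting DFA.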