Introduction to lex (or flex)

  1. Introduction to lex (or flex)
     Some slides borrowed from M Scherger

  2. Lex/Flex: A Scanner Generator in C
     - Regular Expression
         -> (Thompson's Construction)
     - Nondeterministic Finite Automaton
         -> ("Subset" Construction)
     - Deterministic Finite Automaton
     - Table-driven Scanner
     - So why not do this with a tool?
     Introduction to lex (or flex), Fall 2012
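
The last stage of that pipeline, the table-driven scanner, can be sketched by hand for a single pattern. This is only an illustration, assuming the pattern [0-9]+ and hypothetical names (`next_state`, `match_digits`); it is not the table layout that lex or flex actually generates.

```c
#include <stddef.h>

/* Hypothetical DFA for the pattern [0-9]+ (all names are illustrative). */
enum { S_START, S_DIGITS, S_DEAD, N_STATES };
enum { C_DIGIT, C_OTHER, N_CLASSES };

/* Transition table: next_state[current state][character class]. */
static const int next_state[N_STATES][N_CLASSES] = {
    /* S_START  */ { S_DIGITS, S_DEAD },
    /* S_DIGITS */ { S_DIGITS, S_DEAD },
    /* S_DEAD   */ { S_DEAD,   S_DEAD },
};

/* Only S_DIGITS is an accepting state. */
static const int accepting[N_STATES] = { 0, 1, 0 };

static int char_class(char c) {
    return (c >= '0' && c <= '9') ? C_DIGIT : C_OTHER;
}

/* Returns the length of the longest prefix of s matched by [0-9]+,
   or 0 if no prefix matches. */
size_t match_digits(const char *s) {
    int state = S_START;
    size_t last_accept = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = next_state[state][char_class(s[i])];
        if (state == S_DEAD) break;          /* no further match possible */
        if (accepting[state]) last_accept = i + 1;
    }
    return last_accept;
}
```

The scanning loop never changes; only the tables do. Generating those tables from regular expressions is the work a tool like lex takes over.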

  3. Lex
     - Lex is one such tool for creating lexical analyzers (M. E. Lesk and E. Schmidt, 1975)
     - Lexical analyzers tokenize input streams
     - Regular expressions define tokens
     - Tokens are the terminals of a language
     - Converts regular expressions into DFAs
     - DFAs are implemented as table-driven state machines
     - Some versions of Lex are proprietary, so not all versions of *nix come with an open source version
     - flex (Fast Lexical Analyzer) is an open source version (Vern Paxson)

  4. The Basic Process
     - Lex source program (any.l) -> Lex compiler -> lex.yy.c
     - lex.yy.c -> C compiler -> a.out
     - Input stream -> a.out -> Sequence of tokens

  5. Format of a lex File
         Definitions
         %%
         Rules
         %%
         User code
     - The 1st section holds declarations of simple name definitions and start conditions
     - The 2nd section holds pattern-action pairs
     - The 3rd section is copied directly to lex.yy.c
       - C code and comments
     - Typical file extensions: .l .lex .flex

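  6. Compiling and Running
         > flex linenos.flex
         > gcc lexyy.c -lfl
         > a.out < infile > outfile
     - yywrap() issue: linking with -lfl pulls in the flex library, which supplies a default yywrap()
     - On most systems flex names its output lex.yy.c; use that name if lexyy.c is not produced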

  7. Regular Expressions and Lex
     - A regular expression is an expression that matches a set of strings (the "language" of the regular expression).
     - In its basic form, a regular expression is built up out of basic expressions (individual symbols) and the operations
       - choice (|),
       - concatenation (no operator),
       - and repetition (*).
     - A regular expression may also contain certain other metasymbols:
       - parentheses for grouping (to change precedence, just as in arithmetic)
       - others as needed to extend the operator set in useful ways

  8. Regular Expressions in Lex
     - c (c is a single character): matches the character c
     - \c (c is a single character): use this to escape special characters
     - "str" (str is a string): matches the entire string str
     - [str] (str is a string): matches any single character from str

         RE         Matches
         A          A
         x          x
         d          d
         \.         .
         \n         newline
         \t         tab
         "Abc"      Abc
         "The"      The
         [aeiou]    lowercase vowels
         [abcde]    the letters a to e

  9. Regular Expressions - Character Classes
     - [x-y] (x and y are characters): all characters in the range x-y
     - These can be combined
     - [^str] (str is a string): matches any single character not in str

         RE            Matches
         [a-z]         all lowercase characters
         [0-9]         all digits
         [a-df-z]      lowercase characters except e
         [a-z0-9A-Z]   alphanumeric characters
         [A-Zaeiou]    uppercase letters and lowercase vowels
         [^ \n\t]      all non-whitespace
         [^aeiou]      anything but lowercase vowels

  10. Regular Expressions
      - p* (p is a pattern): zero or more occurrences of p
      - p+ (p is a pattern): one or more occurrences of p

          RE       Matches
          A*       (empty) A AA AAA ...
          r*       (empty) r rr ...
          ab*c*    a ab ac abb abc acc abbb abbc abcc accc ...
          A+       A AA AAA AAAA ...
          ab+      ab abb abbb ...
          a*b+     b ab bb aab abb bbb ...

  11. Regular Expressions
      - p? (p is a pattern): zero or one occurrence of p
      - p{m,n} (p is a pattern, m and n are integers): matches m through n occurrences of p
        - if ",n" is missing, n = m; if just n is missing, n = ∞

          RE        Matches
          A?        (empty) A
          ab?c?     a ab ac abc
          a{1,3}    a aa aaa
          a{1,1}    a
          a{1}      a
          a{3,}     aaa aaaa aaaaa ...

  12. Regular Expressions
      - p1p2 (p1 and p2 are patterns): matches p1 followed by p2
      - (p) (p is a pattern): used to override precedence (group things)
      - p1|p2 (p1 and p2 are patterns): matches either p1 or p2
      - Notice precedence: ba|ed versus b(a|e)d

          RE          Matches
          ab          ab
          a+b+        ab aab abb ...
          (abc)+      abc abcabc abcabcabc ...
          abc+        abc abcc abccc ...
          a|an|the    a an the
          ba|ed       ba ed
          b(a|e)d     bed bad
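
The operator examples on the last few slides can be checked outside of lex. The sketch below uses the POSIX `<regex.h>` functions `regcomp`, `regexec` and `regfree` with `REG_EXTENDED` as a stand-in for lex patterns. That is an assumption made for illustration: POSIX extended syntax agrees with lex on `*`, `+`, `?`, `{m,n}`, `|` and grouping, but it has no "str" quoting, no {name} references and no trailing context. The helper name `full_match` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the whole string s matches the extended regular expression
   re, 0 if it does not, and -1 if the pattern is too long or invalid. */
int full_match(const char *re, const char *s) {
    char anchored[256];
    regex_t compiled;
    int n, matched;

    /* Wrap the pattern as ^(re)$ so that only whole-string matches count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", re);
    if (n < 0 || (size_t)n >= sizeof anchored) return -1;

    if (regcomp(&compiled, anchored, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    matched = (regexec(&compiled, s, 0, NULL, 0) == 0);
    regfree(&compiled);
    return matched;
}
```

For example, `full_match("abc+", "abcabc")` is 0 while `full_match("(abc)+", "abcabc")` is 1, which is the precedence difference the slide points out.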

  13. Regular Expression - Extra Things
      - p1/p2 (p1 and p2 are patterns): matches p1 only if it is followed by p2
        - p2 is not part of yytext
        - RE: a+/bc   Input: aaabc bc aaaad   Matches the first aaa only
      - ^p (p is a pattern): matches p only if it is at the start of a line
      - p$ (p is a pattern): matches p only if it is at the end of a line

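  14. Two more complex examples
      - Numbers:
          [-+]?[0-9]+(\.[0-9]+)?([Ee][-+]?[0-9]+)?
        or:
          nat       = [0-9]+
          signedNat = [-+]? nat
          number    = signedNat(\. nat)? ([Ee] signedNat)?
      - C comments:
          /\*/*(\**[^/*]/*)*\**\*/
        As written this is a plain regular expression. In a lex rule, each / outside a character class must be written \/ or "/", because a bare / is the trailing-context operator.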
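
Both patterns can be tried against sample text with the POSIX `<regex.h>` functions, again as a stand-in for lex rather than lex itself. In this sketch the names `NUMBER_RE`, `COMMENT_RE` and `find_first` are hypothetical; note that every backslash in a pattern is doubled inside a C string literal.

```c
#include <regex.h>
#include <stddef.h>

/* The two slide patterns as C string literals (backslashes doubled). */
static const char NUMBER_RE[]  = "[-+]?[0-9]+(\\.[0-9]+)?([Ee][-+]?[0-9]+)?";
static const char COMMENT_RE[] = "/\\*/*(\\**[^/*]/*)*\\**\\*/";

/* Finds the first match of re in text. On success stores the offset and
   length of the match and returns 1. Returns 0 if nothing matches and
   -1 if the pattern does not compile. */
int find_first(const char *re, const char *text, size_t *start, size_t *len) {
    regex_t compiled;
    regmatch_t m;
    int rc;

    if (regcomp(&compiled, re, REG_EXTENDED) != 0) return -1;
    rc = regexec(&compiled, text, 1, &m, 0);
    regfree(&compiled);
    if (rc != 0) return 0;

    *start = (size_t)m.rm_so;
    *len = (size_t)(m.rm_eo - m.rm_so);
    return 1;
}
```

The comment pattern never lets a run of stars be followed directly by a slash except at the very end, so a match starting at a comment opener stops at the first closing delimiter instead of running on to a later comment.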

  15. Pattern Matching Examples

  16. Format of a lex File
          Definitions
          %%
          Rules
          %%
          User code
      - The 1st section holds declarations of simple name definitions and start conditions
      - The 2nd section holds pattern-action pairs
      - The 3rd section is copied directly to lex.yy.c
        - C code and comments

  17. Definitions
      - Definitions are of the form: name definition
        - A name begins with a letter or underscore, followed by 0 or more letters, digits, '-', or '_'
        - You access it with {name}
      - Example definitions:
          Digit          [0-9]
          Char           [A-Z]
          AlphaNum       [a-zA-Z0-9]
          ws             [ \n\t]
          IntegerConst   [0-9]+

  18. Definitions Example
          Digit      [0-9]
          Char       [a-zA-Z]
          AlphaNum   [a-zA-Z0-9]
          %%
          {Digit}+"."{Digit}+
          ({Char}|_)({AlphaNum}|[_-])*    {printf("A name '%s'\n", yytext);}
          %%

  19. Rules
      - Rules are of the form: pattern action
        - pattern is the RE to match, and action is what to do when it is matched
      - The default rule is to echo the input
      - Lex matches the longest string possible
        - If there is a tie, it matches the 1st rule in the spec
      - Actions can be empty (do nothing)
      - Actions can be complex
        - Use {} if multi-lined
        - Don't forget the ';'s
      - yytext contains the string matched
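
The two matching rules (longest match first, then earliest rule on a tie) can be sketched with hand-written matchers. This is a model of the selection policy only, under the assumption of three hypothetical rules (`if`, `[a-z]+`, `[0-9]+`); lex itself makes the same choice with a single DFA pass rather than by trying each rule in turn.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical "rule" returns how many characters it matches at the
   start of s (0 means no match). */
typedef size_t (*matcher_fn)(const char *s);

static size_t match_if(const char *s) {        /* pattern: if     */
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_ident(const char *s) {     /* pattern: [a-z]+ */
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z') n++;
    return n;
}

static size_t match_number(const char *s) {    /* pattern: [0-9]+ */
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Rules in specification order: earlier rules win ties. */
static const matcher_fn rules[] = { match_if, match_ident, match_number };
enum { N_RULES = sizeof rules / sizeof rules[0] };

/* Returns the index of the rule chosen for the text at s, or -1 if no
   rule matches. *len receives the length of the chosen match. */
int pick_rule(const char *s, size_t *len) {
    int best = -1;
    *len = 0;
    for (int i = 0; i < N_RULES; i++) {
        size_t n = rules[i](s);
        if (n > *len) {      /* strictly longer only, so ties keep the earlier rule */
            *len = n;
            best = i;
        }
    }
    return best;
}
```

With these rules the input `if` ties between rule 0 and rule 1 and goes to rule 0, while `iffy` goes to rule 1 because the identifier match is longer. This is why keyword rules are usually listed before the identifier rule.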

  20. Example Rules
          \n          linecount++;
          [0-9]+      sum += atoi(yytext);
          {ws}+
          a|an|the    printf("found an article\n");
          [aeiou]+    {
                        printf("A string of vowels\n");
                        vcnt++;
                      }

  21. Predefined Rules
      - ECHO: copy yytext to output
          [a-z]+    ECHO;
      - REJECT: go to the next alternative, that is, the second-choice rule is selected and its action taken
          she    s++;
          he     h++;
        Won't count the embedded he
          she    {s++; REJECT;}
          he     {h++; REJECT;}
          \n
        But this will
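
The difference between the two rule sets can be modelled in plain C. The sketch below is an illustration of the observable counts, not of how lex implements REJECT, and it assumes a catch-all rule that consumes one character whenever nothing else applies (the slide does not show one). The names `scan_consuming` and `scan_rejecting` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct counts { unsigned she, he; };

/* Without REJECT: the matched text is consumed, so the "he" inside
   "she" is never examined on its own. */
struct counts scan_consuming(const char *text) {
    struct counts c = { 0, 0 };
    const char *p = text;
    while (*p != '\0') {
        if (strncmp(p, "she", 3) == 0)     { c.she++; p += 3; }
        else if (strncmp(p, "he", 2) == 0) { c.he++;  p += 2; }
        else                               { p++; }
    }
    return c;
}

/* With REJECT on both rules: after counting, the match is given back and
   the assumed catch-all consumes a single character, so every position
   is tested against both words. */
struct counts scan_rejecting(const char *text) {
    struct counts c = { 0, 0 };
    for (const char *p = text; *p != '\0'; p++) {
        if (strncmp(p, "she", 3) == 0) c.she++;
        if (strncmp(p, "he", 2) == 0)  c.he++;
    }
    return c;
}
```

On the input `she he`, the consuming scan counts one `she` and one `he`, while the rejecting scan counts one `she` and two `he`.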

  22. Rules Example
      ex1.l:
          %%
          a*b    printf("Token 1 found\n");
          c+     printf("Token 2 found\n");
          %%
          int main(void) {
            yylex();
            return 0;
          }
      The commands:
      - lex ex1.l
        - produces lex.yy.c
      - cc -o ex1 lex.yy.c -ll
        - creates the executable
        - may need -lfl if using flex
      - ./ex1
        - to execute
        - default is stdin and stdout, so type aaaaaaaabbccd <return>
      Session:
          aaaaaaaabbccd
          Token 1 found
          Token 1 found
          Token 2 found
          d

  23. An Example
      Count chars, words, lines
          %{
          unsigned ccnt = 0, wcnt = 0, lcnt = 0;
          %}
          word    [^ \t\n]+
          eol     \n
          %%
          {word}  {wcnt++; ccnt += yyleng;}
          {eol}   {ccnt++; lcnt++;}
          .       ccnt++;
          %%
          int main(void) { yylex(); return 0; }
      - The %{ %} pair allows you to make declarations for your lexer
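
To make clear what the three rules compute, here is a hand-written C equivalent of the same counting logic. It is a sketch that mirrors the rules one branch at a time; the struct `wc` and the function `count_text` are hypothetical names, and the input is taken from a string rather than from stdin.

```c
#include <stddef.h>

struct wc { unsigned ccnt, wcnt, lcnt; };

/* Counts characters, words and lines the way the lex rules would. */
struct wc count_text(const char *s) {
    struct wc c = { 0, 0, 0 };
    size_t i = 0;
    while (s[i] != '\0') {
        if (s[i] == '\n') {                        /* {eol} rule */
            c.ccnt++;
            c.lcnt++;
            i++;
        } else if (s[i] == ' ' || s[i] == '\t') {  /* "." rule: a lone blank or tab */
            c.ccnt++;
            i++;
        } else {                                   /* {word} rule: longest run */
            size_t start = i;
            while (s[i] != '\0' && s[i] != ' ' && s[i] != '\t' && s[i] != '\n') i++;
            c.wcnt++;
            c.ccnt += (unsigned)(i - start);       /* plays the role of yyleng */
        }
    }
    return c;
}
```

The word branch advances over the whole run at once, which is the longest-match behaviour of `{word}`; the `.` rule only ever sees the blanks and tabs left between words.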

  24. About lex
      - Lex uses some predefined functions stored in the lex library (link with -ll or -lfl)
      - By default lex copies input to output
      - By default lex reads stdin and writes stdout
      - Lex reads its input (a lex script) and produces lex.yy.c
      - Use %{ and %} in the definitions section to declare globals and put #includes
      - You can use flex instead
      - Not all 'lex'es are equal!
      - The man page has more info!

  25. Example 1: The Simplest Example
      - The simplest example of a lex program is a scanner that acts like the UNIX `cat` program
          %%
          .|\n    ECHO;
          %%
      - Or it could be written as...
          %%
          .       ECHO;
          \n      ECHO;
          %%

  26. Lex Predefined Variables

  27. Flex Internal Names

          Lex internal name      Meaning/Use
          lex.yy.c or lexyy.c    Lex output file name
          yylex                  Lex scanning routine
          yytext                 string matched on current action
          yyleng                 length of yytext
          yyin                   Lex input file (default: stdin)
          yyout                  Lex output file (default: stdout)
          input                  Lex buffered input routine
          ECHO                   Lex default action (print yytext to yyout)

      See the Flex documentation for others

  28. Flex Operational Conventions
      - yylex() runs until it is stopped by a return
      - Ambiguity is resolved by order (between equally long matches, the earlier rule wins)
      - Any text not explicitly matched is echoed to stdout
      - EOF is automatically matched and returns 0 from yylex() (unless yywrap() is suitably defined)
      - yylex() returns an int, which can be a token
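
The calling convention in these bullets (return an int token, return 0 at end of input) can be sketched with a hand-written stand-in for yylex(). Everything here is hypothetical: the token codes, the function `next_token`, and the choice to read from a string through a cursor instead of from yyin.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; 0 is reserved for end of input, as with yylex(). */
enum { TOK_EOF = 0, TOK_NUMBER = 256, TOK_IDENT, TOK_OTHER };

/* Stand-in for yylex(): skips blanks, returns the code of the next token
   and advances *cursor past it. Returns 0 at end of input. */
int next_token(const char **cursor) {
    const char *p = *cursor;
    int tok;

    while (*p == ' ' || *p == '\t' || *p == '\n') p++;

    if (*p == '\0') {
        tok = TOK_EOF;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        tok = TOK_NUMBER;
    } else if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p)) p++;
        tok = TOK_IDENT;
    } else {
        p++;                  /* any other single character */
        tok = TOK_OTHER;
    }
    *cursor = p;
    return tok;
}

/* Typical caller loop: keep asking for tokens until 0 comes back. */
size_t count_tokens(const char *text) {
    size_t n = 0;
    while (next_token(&text) != TOK_EOF) n++;
    return n;
}
```

A parser drives a real lex scanner the same way, calling yylex() repeatedly and stopping when it returns 0.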
