
Tokenizing (OSU CSE, 19 March 2019)



  1. Tokenizing (OSU CSE, 19 March 2019)

  2. BL Compiler Structure
     string of characters (source code) → Tokenizer → string of tokens (“words”) → Parser → abstract program → Code Generator → string of integers (object code)
     The tokenizer is relatively easy.

  3. Aside: Characters vs. Tokens
     • In the examples of CFGs, we dealt with languages over the alphabet of individual characters (e.g., Java’s char values): Σ = character
     • Now, we deal with languages over an alphabet of tokens, each of which is a unit that you want to consider as a single entity in the language
       – Choice of tokens is a design decision

  4. Example: Expression CFG
     expr → expr add-op term | term
     term → term mult-op factor | factor
     factor → ( expr ) | digit-seq
     add-op → + | -
     mult-op → * | DIV | REM
     digit-seq → digit digit-seq | digit
     digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  5. Example: Expression CFG (same grammar as slide 4)
     Appropriate tokens for this CFG are “words” consisting of strings of consecutive terminal symbols (characters) that “belong together”, e.g., "+", "DIV", "5".

  6. Tokenizer
     • The job of the tokenizer is to transform a string of characters into a string of tokens
     • Example:
       – Input: "4 + (7 DIV 3) REM 5"

  7. Tokenizer (continued)
     • In the input "4 + (7 DIV 3) REM 5", the non-whitespace characters are the characters used as terminal symbols of the language

  8. Tokenizer (continued)
     • In the input "4 + (7 DIV 3) REM 5", the spaces between them are whitespace characters

  9. Tokenizer (continued)
     • Mathematically, the input "4 + (7 DIV 3) REM 5" is a string of character

  10. Tokenizer (continued)
     • Example:
       – Input: "4 + (7 DIV 3) REM 5"
       – Output: <"4", "+", "(", "7", "DIV", "3", ")", "REM", "5">
     • Mathematically, output is a string of string of character
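The input/output example above can be sketched in Java. This is a minimal illustration, not the course's tokenizer: the class name `ExpressionTokenizer` and the method `tokens` are hypothetical, and the sketch assumes that a token is a maximal run of digits, a maximal run of letters (such as DIV or REM), or any other single non-whitespace character. It does not check that each token is actually a legal terminal of the expression grammar.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a tokenizer for the expression example. */
final class ExpressionTokenizer {

    private ExpressionTokenizer() {
    }

    /**
     * Splits source into tokens: maximal digit runs, maximal letter runs,
     * and single-character symbols. Whitespace only separates tokens.
     * Assumption: no check that a token is a legal terminal of the grammar.
     */
    static List<String> tokens(String source) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace is never part of a token
                continue;
            }
            int start = i;
            if (Character.isDigit(c)) {
                // consume the whole digit sequence, e.g. "42"
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
            } else if (Character.isLetter(c)) {
                // consume the whole word, e.g. "DIV" or "REM"
                while (i < source.length() && Character.isLetter(source.charAt(i))) {
                    i++;
                }
            } else {
                i++; // single-character symbol such as "+", "(" or ")"
            }
            result.add(source.substring(start, i));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [4, +, (, 7, DIV, 3, ), REM, 5]
        System.out.println(tokens("4 + (7 DIV 3) REM 5"));
    }
}
```

The result type `List<String>` mirrors the mathematical view on the slide: a string of string of character.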

  11. Another Example: BL
     • In BL, tokens can be the “words” such as "IF", "next-is-empty", etc.
     • A BL tokenizer is then easy: it can simply treat strings of consecutive whitespace characters as separators between tokens
       – This makes it easy for the language to allow line separators, extra spaces and tabs used for indentation, etc., to have no impact on the legality of a program
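A whitespace-separated tokenizer of the kind described for BL might look like the following sketch. The class name `WhitespaceTokenizer` and the method `tokens` are hypothetical, and the sample program in `main` is only illustrative input. The sketch assumes that every maximal run of non-whitespace characters is one token.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a whitespace-separated tokenizer. */
final class WhitespaceTokenizer {

    private WhitespaceTokenizer() {
    }

    /**
     * Returns the "words" of source, treating each run of consecutive
     * whitespace characters (spaces, tabs, line separators) as a separator.
     */
    static List<String> tokens(String source) {
        List<String> result = new ArrayList<>();
        // "\\s+" matches one or more consecutive whitespace characters
        for (String word : source.trim().split("\\s+")) {
            if (!word.isEmpty()) { // blank input yields one empty piece; skip it
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative input only: layout differs, tokens do not.
        String oneLine = "IF next-is-empty THEN move END IF";
        String indented = "IF next-is-empty THEN\n\tmove\nEND IF\n";
        System.out.println(tokens(oneLine).equals(tokens(indented))); // prints true
    }
}
```

Because line separators and indentation all collapse into separators, two programs that differ only in layout produce the same token sequence, which is why layout has no impact on legality.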

  12. Resources
     • Wikipedia: Lexical Analysis
       – http://en.wikipedia.org/wiki/Lexical_analysis
