
Context-Free Grammars (OSU CSE, 19 March 2019)



  1. Context-Free Grammars, 19 March 2019, OSU CSE

  2. BL Compiler Structure
     string of characters (source code) → Tokenizer → string of tokens (“words”) → Parser → abstract program → Code Generator → string of integers (object code)
     The parser is arguably the most interesting, and most difficult, piece of the BL compiler.

  3. Plan for the BL Parser
     • Design a context-free grammar (CFG) to specify syntactically valid BL programs
     • Use the grammar to implement a recursive-descent parser (i.e., an algorithm to parse a BL program and construct the corresponding Program object)

  4. Plan for the BL Parser (note on “grammar”)
     A grammar is a set of formation rules for strings in a language.

  5. Plan for the BL Parser (note on “context-free”)
     A grammar is context-free if it satisfies certain technical conditions described in these slides.

  6. Languages
     • A language is a set of strings over some alphabet Σ
     • If L is a language, then mathematically it is a set of strings over Σ
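As a small illustration of "a language is a set of strings over Σ", the following sketch enumerates all short strings over a toy alphabet and then picks out a subset as a language. The alphabet `{"a", "b"}`, the length bound, and the "starts with a" condition are all assumptions made up for this example; they are not part of the slides.

```python
from itertools import product

def strings_over(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len."""
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(sorted(alphabet), repeat=n)
    }

SIGMA = {"a", "b"}                 # hypothetical alphabet
universe = strings_over(SIGMA, 2)  # finite slice of all strings over SIGMA

# One possible language over SIGMA: the strings that start with "a".
language = {s for s in universe if s.startswith("a")}
```

Any subset of the strings over Σ counts as a language; a grammar is one way to say which subset is meant.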

  7. Aside: Characters vs. Tokens
     • In the following examples of CFGs, we deal with languages over the alphabet of individual characters (e.g., Java’s char values): Σ = character
     • In the BL project, we deal with languages over an alphabet of tokens (to be explained later)

  8. Example: Real-Number Constants
     • Some syntactically valid real-number constants (i.e., some strings in the “language of valid real-number constants”):
         37.044
         615.22E16
         99241.
         18.E-93

  9. CFG Rewrite Rules
     real-const → digit-seq . digit-seq
                | digit-seq . digit-seq exponent
                | digit-seq .
                | digit-seq . exponent
     exponent   → E digit-seq
                | E + digit-seq
                | E - digit-seq
     digit-seq  → digit digit-seq
                | digit
     digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  10. CFG Rewrite Rules (annotation)
      Each entry of the form “name → ...” in the grammar on slide 9 is a rewrite rule (a replacement rule), which describes how strings in the language may be formed.

  11. CFG Rewrite Rules (annotation)
      A name on the left of a rewrite rule is called a non-terminal symbol.

  12. CFG Rewrite Rules (annotation)
      The special CFG symbol → means “can be rewritten as” or “can be replaced by”.

  13. CFG Rewrite Rules (annotation)
      The special CFG symbol | means “or”, i.e., there are multiple possible “rewrites” for the same non-terminal.

  14. CFG Rewrite Rules
      So this ...
      real-const → digit-seq . digit-seq
                 | digit-seq . digit-seq exponent
                 | digit-seq .
                 | digit-seq . exponent

  15. CFG Rewrite Rules
      ... means exactly the same thing as these four separate rewrite rules:
      real-const → digit-seq . digit-seq
      real-const → digit-seq . digit-seq exponent
      real-const → digit-seq .
      real-const → digit-seq . exponent
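The equivalence between one rule with | alternatives and several separate rules can be shown mechanically. This is a minimal sketch, assuming rules are written as plain text with "→" and "|" as separators and that "|" never appears as a terminal symbol; the helper name `expand_alternatives` is hypothetical.

```python
def expand_alternatives(rule):
    """Split 'lhs → alt1 | alt2 | ...' into one 'lhs → alt' rule per alternative."""
    lhs, rhs = rule.split("→")
    return [f"{lhs.strip()} → {alt.strip()}" for alt in rhs.split("|")]

rule = (
    "real-const → digit-seq . digit-seq | digit-seq . digit-seq exponent"
    " | digit-seq . | digit-seq . exponent"
)
separate = expand_alternatives(rule)
for r in separate:
    print(r)
```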

  16. CFG Rewrite Rules (annotation)
      One non-terminal symbol (normally in the first rewrite rule) is called the start symbol.

  17. CFG Rewrite Rules (annotation)
      A symbol from the alphabet on the right-hand side of a rewrite rule is called a terminal symbol.

  18. CFG Rewrite Rules (annotation)
      To remember the name: terminal symbols are what you end up with when generating strings in the language (see below).

  19. Four Components of a CFG
      • Non-terminal symbols for this CFG:
        – real-const, exponent, digit-seq, digit
      • Terminal symbols for this CFG:
        – . , E , + , - , 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9
      • Start symbol for this CFG:
        – real-const
      • Rewrite rules for this CFG:
        – (see previous slides)
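The four components listed above can be written down as plain data. The sketch below is one possible encoding, not the course's representation: the names `GRAMMAR`, `START`, `NON_TERMINALS`, and `TERMINALS` are assumptions, each rewrite is stored as a list of symbols, and the minus sign is taken to be the ASCII character "-".

```python
# Rewrite rules: each non-terminal maps to its list of alternatives,
# and each alternative is a list of symbols.
GRAMMAR = {
    "real-const": [
        ["digit-seq", ".", "digit-seq"],
        ["digit-seq", ".", "digit-seq", "exponent"],
        ["digit-seq", "."],
        ["digit-seq", ".", "exponent"],
    ],
    "exponent": [
        ["E", "digit-seq"],
        ["E", "+", "digit-seq"],
        ["E", "-", "digit-seq"],
    ],
    "digit-seq": [["digit", "digit-seq"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}

START = "real-const"

# Non-terminals are the names on the left of some rule.
NON_TERMINALS = set(GRAMMAR)

# Terminals are the right-hand-side symbols that are not non-terminals.
TERMINALS = {
    sym for alts in GRAMMAR.values() for alt in alts for sym in alt
} - NON_TERMINALS
```

Computing the terminal set from the rules, rather than listing it by hand, keeps it consistent with the rewrite rules if the grammar changes.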

  20. Derivations
      • A derivation of a string of terminal symbols consists of a sequence of specific rewrite-rule applications that begins with the start symbol and continues until only terminal symbols remain
        – A string is in the language of the CFG iff there is a derivation that leads to it
      • The symbol ⇒ indicates a derivation step, i.e., a specific rewrite-rule application
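A single derivation step can be sketched as a function that replaces one non-terminal in the current string of symbols with one of its rewrites. To keep the example short, it uses only the digit-seq and digit rules and treats digit-seq as the start symbol; that choice, and the names `RULES` and `step`, are assumptions made for illustration.

```python
RULES = {
    "digit-seq": [["digit", "digit-seq"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}

def step(form, index, replacement):
    """Apply one derivation step (⇒): rewrite the non-terminal at `index`."""
    symbol = form[index]
    if replacement not in RULES.get(symbol, []):
        raise ValueError(f"no rule {symbol} → {' '.join(replacement)}")
    return form[:index] + replacement + form[index + 1:]

form = ["digit-seq"]                          # start symbol of this sketch
form = step(form, 0, ["digit", "digit-seq"])  # ⇒ digit digit-seq
form = step(form, 0, ["4"])                   # ⇒ 4 digit-seq
form = step(form, 1, ["digit"])               # ⇒ 4 digit
form = step(form, 1, ["2"])                   # ⇒ 4 2

# The derivation is complete once only terminal symbols remain.
is_done = all(sym not in RULES for sym in form)
```

Because `step` rejects any replacement that is not a rule for the chosen symbol, every sequence of successful calls is a valid derivation prefix.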

  21. Example: Derivation of 5.6E10
      • Begin with the start symbol:
          real-const ⇒

  22. Example: Derivation of 5.6E10
      • Begin with the start symbol:
          real-const ⇒
      • ... and pick one possible rewrite:
          real-const → digit-seq . digit-seq
                     | digit-seq . digit-seq exponent
                     | digit-seq .
                     | digit-seq . exponent
        Which rewrite is appropriate to derive 5.6E10?

  23. Example: Derivation of 5.6E10
      • This is the first step of the derivation:
          real-const ⇒ digit-seq . digit-seq exponent

  24. Example: Derivation of 5.6E10
      • Choose a non-terminal to rewrite:
          real-const ⇒ digit-seq . digit-seq exponent

  25. Example: Derivation of 5.6E10
      • Choose a non-terminal to rewrite:
          real-const ⇒ digit-seq . digit-seq exponent
      • ... and pick one possible rewrite:
          digit-seq → digit digit-seq
                    | digit
        Which rewrite is appropriate to derive 5.6E10?

  26. Example: Derivation of 5.6E10
      • This is the second step of the derivation:
          real-const ⇒ digit-seq . digit-seq exponent
                     ⇒ digit . digit-seq exponent

  27. Example: Derivation of 5.6E10
      • Choose a non-terminal to rewrite:
          real-const ⇒ digit-seq . digit-seq exponent
                     ⇒ digit . digit-seq exponent

  28. Example: Derivation of 5.6E10
      • Choose a non-terminal to rewrite:
          real-const ⇒ digit-seq . digit-seq exponent
                     ⇒ digit . digit-seq exponent
      • ... and pick one possible rewrite:
          digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  29. Example: Derivation of 5.6E10
      • This is the third step of the derivation:
          real-const ⇒ digit-seq . digit-seq exponent
                     ⇒ digit . digit-seq exponent
                     ⇒ 5 . digit-seq exponent

  30. Example: Derivation of 5.6E10
      • Choose a non-terminal to rewrite:
          real-const ⇒ digit-seq . digit-seq exponent
                     ⇒ digit . digit-seq exponent
                     ⇒ 5 . digit-seq exponent

  31. Example: Derivation of 5.6E10
      • Choose a non-terminal to rewrite:
          real-const ⇒ digit-seq . digit-seq exponent
                     ⇒ digit . digit-seq exponent
                     ⇒ 5 . digit-seq exponent
      • ... and pick one possible rewrite:
          digit-seq → digit digit-seq
                    | digit

  32. One Derivation of 5.6E10
      real-const ⇒ digit-seq . digit-seq exponent
                 ⇒ digit . digit-seq exponent
                 ⇒ 5 . digit-seq exponent
                 ⇒ 5 . digit exponent
                 ⇒ 5 . 6 exponent
                 ⇒ 5 . 6 E digit-seq
                 ⇒ 5 . 6 E digit digit-seq
                 ⇒ 5 . 6 E 1 digit-seq
                 ⇒ 5 . 6 E 1 digit
                 ⇒ 5 . 6 E 1 0
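A derivation generates a string from the grammar; a recursive-descent recognizer goes the other way and checks whether a given string could be derived. The following is a rough sketch for the real-const grammar only, not the BL parser itself. The function names are hypothetical, each function corresponds to one non-terminal, the recursion in digit-seq is replaced by a loop, and the minus sign is assumed to be the ASCII character "-".

```python
DIGITS = "0123456789"

def parse_digit_seq(s, i):
    """digit-seq → digit digit-seq | digit.
    Return the index just past the digits starting at i, or -1 if none."""
    j = i
    while j < len(s) and s[j] in DIGITS:
        j += 1
    return j if j > i else -1

def parse_exponent(s, i):
    """exponent → E digit-seq | E + digit-seq | E - digit-seq.
    Return the index just past the exponent, or -1 on failure."""
    if i >= len(s) or s[i] != "E":
        return -1
    i += 1
    if i < len(s) and s[i] in "+-":
        i += 1
    return parse_digit_seq(s, i)

def is_real_const(s):
    """Return True iff s is in the language generated by real-const."""
    i = parse_digit_seq(s, 0)            # leading digit-seq is required
    if i < 0 or i >= len(s) or s[i] != ".":
        return False
    i += 1
    j = parse_digit_seq(s, i)            # fractional digit-seq is optional
    if j >= 0:
        i = j
    if i == len(s):                      # no exponent
        return True
    return parse_exponent(s, i) == len(s)

for text in ["5.6E10", "37.044", "615.22E16", "99241.", "18.E-93", "5", ".5"]:
    print(text, is_real_const(text))
```

The four alternatives of real-const collapse into "digits, a point, optional digits, optional exponent", which is why one function with two optional parts covers all of them.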
