
CSE 3341: Principles of Programming Languages, Syntax (Jeremy Morris)



  1. CSE 3341: Principles of Programming Languages. Syntax. Jeremy Morris

  2. Syntax vs. Semantics
     - Syntax: What kinds of symbols are allowed in a language, and how may they be arranged?
     - Semantics: What do the symbols in a language mean?

  3. Language Terminology
     - Alphabet: a finite set of symbols
     - String: a sequence of symbols
     - Language: a set of strings over an alphabet
     - Grammar: rules that define which strings over an alphabet are in the language and which ones are not

  4. Terminology Example
     - Consider the Java programming language
     - Alphabet: the tokens in the Java language
       - if, else, while, do, >, <, String, variable names, etc.
       - Note: not the individual characters
       - Not your intuitive understanding of the term “alphabet”
     - String: a sequence of tokens from the alphabet
     - Language: the set of all syntactically correct Java programs
     - Grammar: the rules for producing syntactically correct Java programs
       - https://docs.oracle.com/javase/specs/jls/se8/html/index.html
       - (It’s a nearly 800-page book; you don’t need to read it)

  5. Language Terminology
     - We typically talk about languages in mathematical terms, as sets
     - Alphabet: a finite set of symbols, often denoted Σ
     - String: a finite sequence of symbols
       - Empty string: ε, a sequence of length 0
       - Σ*: the set of all strings over Σ (including ε)
         - The * represents the “Kleene closure”; we’ll discuss this more later
       - Σ+: the set of all non-empty strings over Σ
         - The + represents “one or more”, where the * represents “zero or more”
     - Language: a set of strings
       - Language L ⊆ Σ*
       - Defined by a grammar
       - Probably will not contain everything in Σ*
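
The sets Σ* and Σ+ are infinite, so they can only be listed up to some length bound. The sketch below does that for a small alphabet; the function name `strings_up_to` and the two-symbol alphabet are assumptions made for illustration, not part of the slides.

```python
from itertools import product

def strings_up_to(alphabet, max_len, include_empty=True):
    """List all strings over `alphabet` of length <= max_len, shortest first."""
    start = 0 if include_empty else 1
    result = []
    for n in range(start, max_len + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the empty string ε
        for symbols in product(sorted(alphabet), repeat=n):
            result.append("".join(symbols))
    return result

sigma = {"a", "b"}
sigma_star = strings_up_to(sigma, 2)          # finite slice of Σ*
sigma_plus = strings_up_to(sigma, 2, False)   # finite slice of Σ+ (no ε)
print(sigma_star)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Any language over Σ is then some subset of the full (unbounded) version of `sigma_star`.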

  6. Syntax - Specification
     - We use syntax rules to specify the syntax of a language
     - Language: the set of all strings the rules can produce
     - Some rules for non-negative integers:
         number → digit digit*
         digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     - With these we can specify any non-negative integer.
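
As a rough illustration, the two rules above translate almost symbol for symbol into a regular expression. The sketch uses Python's `re` module; the names `NUMBER` and `is_number` are chosen here for the example.

```python
import re

# number -> digit digit*   with   digit -> 0 | 1 | ... | 9
# [0-9] plays the role of `digit`; the trailing * is the Kleene star.
NUMBER = re.compile(r"[0-9][0-9]*")

def is_number(text):
    """Return True if the whole of `text` can be derived from `number`."""
    return NUMBER.fullmatch(text) is not None

print(is_number("655"))  # True
print(is_number(""))     # False: at least one digit is required
print(is_number("6a5"))  # False: 'a' is not in the alphabet
```

`fullmatch` is used rather than `match` so that the entire string, not just a prefix, has to fit the rule.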

  7. Syntax Rule Terminology
     - Terminal symbol
       - Any symbol that represents a member of the alphabet for the language
       - i.e. any symbol that is in the set of all possible tokens for the language
       - Will only appear on the right-hand side of a syntax rule
         - (At least for our purposes; not strictly true)
     - Non-terminal symbol
       - Any symbol that represents a rule to be expanded
       - Non-terminal, meaning “we need to keep going”
       - Can appear on either the left- or the right-hand side of a syntax rule
     - Meta-symbols
       - Symbols used to write the rules, but not part of the alphabet or non-terminals
       - →, |, *, etc.

  8. Terminology Example
         number → digit digit*
         digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     - Which of these are terminal symbols? Non-terminal? Meta?

  9. Syntax: Types of Grammars
     - Chomsky Hierarchy
       - Classifies formal languages by how complex the rules needed to define them are
       - Type-0: Unrestricted (aka Recursively enumerable)
       - Type-1: Context-sensitive
       - Type-2: Context-free
       - Type-3: Regular
     - We will focus on those last two

  10. Regular Languages (described by Regular Expressions)
     - The simplest kind of grammar
     - Requires only 3 kinds of rules:
       - Concatenation: join two things together
       - Alternation: select between two choices
       - “Kleene closure”: repeat something zero or more times
     - No recursion is allowed
       - If we allow recursion, then we get Context-free grammars

  11. Regular Languages (described by Regular Expressions)
     - Assume an alphabet Σ. A regular expression over Σ is:
       - ∅: the empty set
       - ε: the empty string
       - Any member of Σ (i.e. for each r ∈ Σ, the one-string set { r })
       - Concatenation
         - If R and S are both regular expressions over Σ, then so is RS
         - RS = { r.s | r ∈ R and s ∈ S }
       - Alternation
         - If R and S are both regular expressions over Σ, then so is R ∪ S
         - Written as R|S: choose between R or S
       - “Kleene closure”
         - If R is a regular expression over Σ, then so is R*
         - R repeated 0 or more times: R concatenated with itself
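
Treating each regular expression as the set of strings it denotes, the three operations can be sketched with Python sets. The function names are illustrative, and because R* is infinite whenever R contains a non-empty string, `star` takes an assumed `max_reps` bound and returns only a finite slice of it.

```python
def concat(R, S):
    """RS: every string of R followed by every string of S."""
    return {r + s for r in R for s in S}

def alternate(R, S):
    """R|S: the union of the two sets."""
    return R | S

def star(R, max_reps):
    """Finite slice of R*: concatenations of 0..max_reps strings from R."""
    result = {""}      # zero repetitions gives the empty string ε
    layer = {""}
    for _ in range(max_reps):
        layer = concat(layer, R)
        result |= layer
    return result

R, S = {"a"}, {"b"}
print(concat(R, S))                   # {'ab'}
print(sorted(alternate(R, S)))        # ['a', 'b']
print(sorted(star(concat(R, S), 2)))  # ['', 'ab', 'abab']
```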

  12. Regular Languages
     - In syntax rules we can define a regular language like this:
         number → digit digit*
         digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     - Another way of saying:
         Σ = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
         number = { d d* | d ∈ Σ }
     - (There might be a problem with this definition of a natural number. Can you spot it?)

  13. Regular Languages
     - Another example (from the textbook): numeric constants
         number → integer | real
         integer → digit digit*
         real → integer exp | decimal (exp | ε)
         decimal → digit* (. digit | digit .) digit*
         exp → (e | E) (+ | - | ε) integer
         digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0

  14. Derivations
     - Using syntax rules we can derive strings that are in our language
     - Using the previous set of rules, can we show that 655 is in our language of “numeric constants”?
         number ⇒ integer
                ⇒ digit digit*
                ⇒ 6 digit*
                ⇒ 6 5 digit*
                ⇒ 6 5 5 digit*
                ⇒ 6 5 5

  15. Derivations Example
     - Using the rules on the previous slide, determine if the following strings are in the language for numeric constants:
       - 10e5
       - .65e30
       - .65e0.30
       - 10.0e5.0
       - 10.0e-5
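
One way to check hand derivations for this exercise is to transcribe the numeric-constant rules into a regular expression, one pattern per rule. This is only a sketch: the variable names mirror the non-terminals but are chosen here, and it tests membership without producing the derivation itself.

```python
import re

DIGIT = r"[0-9]"
INTEGER = rf"{DIGIT}{DIGIT}*"                           # integer -> digit digit*
EXP = rf"[eE][+-]?{INTEGER}"                            # exp -> (e|E)(+|-|ε) integer
DECIMAL = rf"{DIGIT}*(?:\.{DIGIT}|{DIGIT}\.){DIGIT}*"   # decimal rule
REAL = rf"(?:{INTEGER}{EXP}|{DECIMAL}(?:{EXP})?)"       # real -> integer exp | decimal (exp|ε)
NUMBER = re.compile(rf"(?:{INTEGER}|{REAL})")           # number -> integer | real

def is_numeric_constant(text):
    """True if the whole string matches the `number` rule."""
    return NUMBER.fullmatch(text) is not None

candidates = ["10e5", ".65e30", ".65e0.30", "10.0e5.0", "10.0e-5"]
results = {s: is_numeric_constant(s) for s in candidates}
for s, ok in results.items():
    print(f"{s!r}: {'in the language' if ok else 'not in the language'}")
```

The two rejected strings fail for the same reason: `exp` must end in an `integer`, which cannot contain a decimal point.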

  16. Context-Free Languages
     - The Chomsky Hierarchy mentioned above is a true hierarchy: each class contains the classes below it
       - All Regular Languages are also Context-Free, but not all Context-Free Languages are Regular
     - Consider the language L = { a^n b^n | n ≥ 0 }
       - Empty string, ab, aabb, aaabbb, etc. are all in this language
       - aabbb, aaabb, a, etc. are not.
     - Can we derive the rules for this language using only the rules set out for regular languages?
       - No, as it turns out.
       - You can prove this mathematically using a theorem known as the pumping lemma, but that’s outside the scope of this class (see CSE 3321, Formal Languages and Automata)
     - But if we allow recursion we can do it easily
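
To make "allow recursion" concrete: one standard grammar for this language (not given on the slide) is S → a S b | ε. A minimal recursive recognizer that follows that rule might look like this; the function name `in_anbn` is made up for the example.

```python
def in_anbn(s):
    """Recognize { a^n b^n | n >= 0 } using the rule S -> a S b | ε."""
    if s == "":
        return True                 # S -> ε
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        return in_anbn(s[1:-1])     # S -> a S b: strip one a and one b, recurse
    return False

for s in ["", "ab", "aabb", "aaabbb", "aabbb", "aaabb", "a"]:
    print(repr(s), in_anbn(s))
```

The recursive call is what keeps the number of a's and b's equal, which concatenation, alternation, and Kleene closure alone cannot do.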

  17. Context-Free Grammars (CFGs)
     - A grammar that defines a Context-Free language has the same properties as a Regular grammar…
       - Concatenation, Alternation, Kleene Closure
     - …but allows for recursion in its rules
       - Either immediate recursion: the same non-terminal appears on both the left- and right-hand side of one rule
         - We’ll see an example of this on the next slide
       - Or mutual recursion: a non-terminal on the left expands to a rule that eventually expands back to that non-terminal
         - We’ll see an example of this in a moment, so hang in there

  18. Context-Free Grammars (CFGs)
     - The following grammar is not Regular, but is Context-Free:
         expr → number | expr op expr | ( expr )
         op → + | - | / | *
         number → digit digit*
         digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     - Note the recursion in the rule for expanding expr
     - This grammar is problematic…
     - Let’s derive 1+3*2 using the previous rules

  19. Context-Free Grammars
     - We can represent a derivation graphically as a parse tree or syntax tree
     - The root of the tree is the start symbol for the grammar
     - The internal nodes are non-terminal symbols
     - The leaf nodes are terminal symbols
     - Parse tree for 1+3*2 (indentation shows parent and child):
         expr
           expr
             number: 1
           op: +
           expr
             expr
               number: 3
             op: *
             expr
               number: 2

  20. Context-Free Grammars
     - Consider these two trees, both derived from the above expr grammar for 1+3*2:
     - Tree A (the + is grouped first):
         expr
           expr
             expr
               number: 1
             op: +
             expr
               number: 3
           op: *
           expr
             number: 2
     - Tree B (the * is grouped first):
         expr
           expr
             number: 1
           op: +
           expr
             expr
               number: 3
             op: *
             expr
               number: 2
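
Why two trees for one string is a problem becomes clearer once each tree is evaluated. In the sketch below the trees are written as nested tuples `(left, op, right)`; that encoding and the `evaluate` helper are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(tree):
    """Evaluate a tree that is either an int or a (left, op, right) tuple."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))

tree_plus_first = ((1, "+", 3), "*", 2)    # + grouped first: (1+3)*2
tree_times_first = (1, "+", (3, "*", 2))   # * grouped first: 1+(3*2)

print(evaluate(tree_plus_first))   # 8
print(evaluate(tree_times_first))  # 7
```

The same string 1+3*2 yields two different values depending on which tree the parser builds, which is what makes the grammar ambiguous in a way that matters.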

  21. Context-Free Grammars
     - A better, unambiguous grammar:
         expr → term | expr add_op term
         term → factor | term mult_op factor
         factor → number | ( expr )
         mult_op → * | /
         add_op → + | -
         number → digit digit*
         digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     - Still not Regular, but Context-Free
     - Recursion is still there
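
A small recursive-descent parser shows how this grammar fixes precedence. This is a sketch under stated assumptions, not the course's implementation: the left-recursive rules (expr → expr add_op term) are rewritten as loops, which is a common technique for hand-written parsers, and the tuple tree format, the `Parser` class, and the simplistic `tokenize` helper are all invented here.

```python
import re

def tokenize(text):
    # Simplified: keeps numbers, operators, parentheses; silently skips anything else.
    return re.findall(r"[0-9]+|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr -> term | expr add_op term, written as: term (add_op term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (node, op, self.term())   # builds a left-leaning tree
        return node

    def term(self):
        # term -> factor | term mult_op factor, written as: factor (mult_op factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (node, op, self.factor())
        return node

    def factor(self):
        # factor -> number | ( expr )
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("1+3*2"))    # (1, '+', (3, '*', 2))
print(parse("(1+3)*2"))  # ((1, '+', 3), '*', 2)
```

Because `term` sits below `expr` in the grammar, multiplication always ends up deeper in the tree than addition, so 1+3*2 has exactly one parse.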

  22. Languages in Compilers & Interpreters
     - Stream of Characters → Tokenizer/Scanner → Stream of tokens → Parser → Parse Tree → Next Steps
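
The first stage of that pipeline, turning a stream of characters into a stream of tokens, can be sketched with regular expressions. The token kinds in `TOKEN_SPEC` and the function name `scan` are assumptions for a toy arithmetic language, not something the slides define.

```python
import re

# Each token kind is described by a regular expression (a regular language).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("OP", r"[-+*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),      # whitespace is matched but not emitted
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Turn a string of characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(scan("12 + (3*4)"))
```

The parser then works over these tokens rather than over raw characters, which matches the earlier point that the "alphabet" of a programming language is its set of tokens.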

  23. Syntax - Specification
     - The previous syntax rules are one type of notation for a syntax.
         number → digit digit*
         digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     - Here’s another:
         <number> ::= <digit> | <digit> <number>
         <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     - Backus-Naur Form (aka Backus normal form, aka BNF)
     - Note that pure BNF does not use Kleene-star or Kleene-plus
       - Other extensions provide shorthand to allow these, but leaving them out does not change the expressiveness (the <number> rule above replaces the Kleene star with recursion)
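
The claim that recursion can stand in for the Kleene star can be checked on small inputs. The sketch below writes one recognizer per notation and compares them; both function names are invented for this example.

```python
import re

DIGITS = set("0123456789")

def is_number_bnf(s):
    """<number> ::= <digit> | <digit> <number>, followed literally."""
    if len(s) == 0 or s[0] not in DIGITS:
        return False                 # every alternative starts with a <digit>
    if len(s) == 1:
        return True                  # <number> ::= <digit>
    return is_number_bnf(s[1:])      # <number> ::= <digit> <number>

def is_number_star(s):
    """number -> digit digit*, using the Kleene star directly."""
    return re.fullmatch(r"[0-9][0-9]*", s) is not None

samples = ["0", "655", "", "6a5", "12345678901"]
agreement = [is_number_bnf(s) == is_number_star(s) for s in samples]
print(agreement)  # [True, True, True, True, True]
```

Agreement on a handful of samples is not a proof, but it illustrates why dropping the star from the notation costs nothing in expressiveness.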

  24. BNF Specification
         <number> ::= <digit> | <digit> <number>
         <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     - Special symbols: <, >, | and ::=
       - Reserved (or ‘meta’) symbols
     - Non-terminals
       - Wrapped in <> tags: <digit> or <number>
       - Indicate rules that need to be expanded
     - Terminals
       - Not wrapped in <> tags
       - Indicate “terminal” symbols: no more expansion
