CSE 3341: Principles of Programming Languages Syntax Jeremy Morris



slide-1
SLIDE 1


CSE 3341: Principles of Programming Languages Syntax

Jeremy Morris

slide-2
SLIDE 2

Syntax vs. Semantics

 Syntax:

 What kinds of symbols are allowed in a language?

 Semantics:

 What do the symbols in a language mean?

slide-3
SLIDE 3

Language Terminology

 Alphabet

 Finite set of symbols

 String

 Sequence of symbols

 Language

 Set of strings over an alphabet

 Grammar

 Rules that define which strings over an alphabet are in the language and which ones are not

slide-4
SLIDE 4

Terminology Example

 Consider the Java programming language

 Alphabet 

The tokens in the Java language.

if, else, while, do, >, <, String, variable names, etc.

Note: Not the individual characters

  • Not your intuitive understanding of the term “alphabet”.

 String 

A sequence of tokens from the alphabet

 Language 

The set of all syntactically correct Java programs.

 Grammar 

The rules for producing syntactically correct Java programs.

https://docs.oracle.com/javase/specs/jls/se8/html/index.html

(It’s a nearly 800 page book – you don’t need to read it)


slide-5
SLIDE 5

Language Terminology

 We typically talk about languages in mathematical terms as sets

 Alphabet – finite set of symbols 

Often denoted as Σ

 String – finite sequence of symbols 

Empty string: ε – a sequence of length 0

Σ* - the set of all strings over Σ (including ε)

The * represents the “Kleene closure” – we’ll discuss this more later

Σ+ - the set of all non-empty strings over Σ

The + represents “one or more” where the * represents “zero or more”

 Language – set of strings 

Language L ⊆ Σ*

Defined by a grammar

Probably will not contain everything in Σ*
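The set definitions above can be made concrete with a short Python sketch. Σ* is infinite, so the helper `strings_up_to` (a hypothetical name chosen for illustration) only enumerates strings up to a length bound. The two-symbol alphabet and the example language are assumptions as well.

```python
from itertools import product

def strings_up_to(alphabet, max_len, include_empty=True):
    """Enumerate the strings over `alphabet` of length <= max_len.

    With include_empty=True this is a finite slice of Σ*;
    with include_empty=False it is the same slice of Σ+.
    """
    start = 0 if include_empty else 1
    result = []
    for length in range(start, max_len + 1):
        for symbols in product(sorted(alphabet), repeat=length):
            result.append("".join(symbols))
    return result

sigma = {"a", "b"}                                    # assumed toy alphabet
star = strings_up_to(sigma, 2)                        # contains "" (ε)
plus = strings_up_to(sigma, 2, include_empty=False)   # does not contain ""

# A language is any subset of Σ*; here, the strings that start with "a".
language = {s for s in star if s.startswith("a")}

print(star)
print(plus)
print(sorted(language))
```

The language built here is a strict subset of the enumerated slice, which mirrors the point that L usually does not contain everything in Σ*.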


slide-6
SLIDE 6

Syntax - Specification

 We use syntax rules to specify the syntax of a language

 Language – the set of all strings the rules can generate

 Some rules for non-negative integers:

number → digit digit*
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

 With these we can specify any non-negative integer.
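As a rough illustration, the two rules can be transcribed into a regular expression using Python's standard `re` module. The names `DIGIT`, `NUMBER`, and `is_number` are assumptions made for this sketch, not part of the course material.

```python
import re

# digit  -> 0 | 1 | ... | 9
DIGIT = "[0-9]"
# number -> digit digit*
NUMBER = re.compile(DIGIT + DIGIT + "*")

def is_number(text):
    """True if the whole of `text` can be derived from `number`."""
    return NUMBER.fullmatch(text) is not None

for candidate in ["0", "42", "", "4a2", "-7"]:
    print(repr(candidate), is_number(candidate))
```

`fullmatch` is used rather than `search` because membership in the language requires the entire string to be derivable, not just some part of it.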


slide-7
SLIDE 7

Syntax Rule Terminology

 Terminal symbol

 Any symbol that represents a member of the alphabet for the language

i.e. any symbol that is in the set of all possible tokens for the language

Will only appear on the right hand side of a syntax rule

(At least for our purposes – not strictly true)

 Non-terminal symbol

 Any symbol that represents a rule to be expanded 

Non-terminal – meaning “we need to keep going”

Can appear on either the left or the right hand side of a syntax rule

 Meta-symbols

 Symbols used to write the rules, but not part of the alphabet or non-terminals

→, |, *, etc.

slide-8
SLIDE 8

Terminology Example

number → digit digit*
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Which of these are terminal symbols? Non-terminal? Meta?

slide-9
SLIDE 9

Syntax – Types of Grammars

 Chomsky Hierarchy

 Outlines how complex formal languages are based on their rules

 Type-0 – Unrestricted (aka Recursively enumerable)

 Type-1 – Context-sensitive

 Type-2 – Context-free

 Type-3 – Regular

 We will focus on those last two

slide-10
SLIDE 10

Regular Languages (aka Regular Expressions)

 The simplest kind of grammar

 Requires only 3 kinds of rules: 

Concatenation

Join two things together

Alternation

Select between two choices

“Kleene closure”

Repeat something zero or more times.

 No recursion is allowed 

If we allow recursion, then we get Context-free grammars


slide-11
SLIDE 11

Regular Languages (aka Regular Expressions)

 Assume an alphabet Σ. A regular expression over Σ is:

 ∅ – the empty set

 ε – the empty string

 Any member of Σ (i.e. R = {r} for some r ∈ Σ)

 Concatenation 

If R and S are both regular expressions over Σ, then so is RS

RS = {rs | r ∈ R and s ∈ S}

 Alternation 

If R and S are both regular expressions over Σ, then so is R ∪ S

Written as R|S – choose between R or S

 “Kleene closure” 

If R is a regular expression over Σ, then so is R*

R repeated 0 or more times – R concatenated with itself
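The three operations can be sketched on finite sets of Python strings. This is only an approximation under stated assumptions: a true Kleene closure is infinite, so the hypothetical `kleene_star` helper takes a bound on the number of repetitions, and all function names are invented for illustration.

```python
def concat(R, S):
    """RS: every string of R followed by every string of S."""
    return {r + s for r in R for s in S}

def alternate(R, S):
    """R | S: the union of the two languages."""
    return R | S

def kleene_star(R, max_repeats):
    """Finite slice of R*: R concatenated with itself 0..max_repeats times."""
    result = {""}      # zero repetitions gives the empty string ε
    current = {""}
    for _ in range(max_repeats):
        current = concat(current, R)
        result |= current
    return result

R = {"a"}
S = {"b", "c"}
print(sorted(concat(R, S)))        # ['ab', 'ac']
print(sorted(alternate(R, S)))     # ['a', 'b', 'c']
print(sorted(kleene_star(R, 3)))   # ['', 'a', 'aa', 'aaa']
```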


slide-12
SLIDE 12

Regular Languages

 In syntax rules we can define a regular language like this:

number → digit digit*
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

 Another way of saying:

Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
number = {dd* | d ∈ Σ}

(There might be a problem with this definition of a natural number – can you spot it?)


slide-13
SLIDE 13

Regular Languages

 Another example (from the textbook)

 Numeric constants

number → integer | real
integer → digit digit*
real → integer exp | decimal (exp | ε)
decimal → digit* (. digit | digit .) digit*
exp → (e | E) (+ | - | ε) integer
digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
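Because none of these rules is recursive, each non-terminal can be substituted into the rules that use it, which yields a single regular expression. The sketch below does that substitution with Python's `re` module. The constant names mirror the non-terminals and, like the sample inputs, are assumptions for illustration.

```python
import re

DIGIT = r"[0-9]"
INTEGER = rf"{DIGIT}{DIGIT}*"                              # digit digit*
DECIMAL = rf"{DIGIT}*(?:\.{DIGIT}|{DIGIT}\.){DIGIT}*"      # digit* (. digit | digit .) digit*
EXP = rf"[eE][+-]?{INTEGER}"                               # (e | E) (+ | - | ε) integer
REAL = rf"(?:{INTEGER}{EXP}|{DECIMAL}(?:{EXP})?)"          # integer exp | decimal (exp | ε)
NUMBER = re.compile(rf"(?:{INTEGER}|{REAL})")              # integer | real

def is_numeric_constant(text):
    """True if the whole of `text` is derivable from `number`."""
    return NUMBER.fullmatch(text) is not None

for candidate in ["42", "3.14", "6.02E+23", "1e", "."]:
    print(repr(candidate), is_numeric_constant(candidate))
```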


slide-14
SLIDE 14

Derivations

 Using syntax rules we can derive strings that are in our language

 Using the previous set of rules, can we show that 655 is in our language of “numeric constants”?

number ⇒ integer
⇒ digit digit*
⇒ 6 digit*
⇒ 6 5 digit*
⇒ 6 5 5 digit*
⇒ 6 5 5
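The same derivation can be produced mechanically for any string of digits. The function `derive_integer` below is a hypothetical helper written for this illustration. It follows the slide's convention of replacing one `digit` per step and dropping `digit*` at the end.

```python
def derive_integer(text):
    """Return the sentential forms of a derivation of `text` from `number`."""
    if not text or any(ch not in "0123456789" for ch in text):
        raise ValueError(f"{text!r} is not derivable from integer")
    steps = ["number", "integer", "digit digit*"]
    done = []
    for ch in text:
        # Replace the leftmost remaining `digit` with the next character.
        done.append(ch)
        steps.append(" ".join(done) + " digit*")
    # Finally, digit* matches zero further digits and disappears.
    steps.append(" ".join(done))
    return steps

print(" ⇒ ".join(derive_integer("655")))
```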


slide-15
SLIDE 15

Derivations Example

 Using the rules on the previous slide, determine if the following strings are in the language for numeric constants:

 10e5

 .65e30

 .65e0.30

 10.0e5.0

 10.0e-5

slide-16
SLIDE 16

Context-Free Languages

 The Chomsky Hierarchy mentioned above is a hierarchy

 All Regular Languages are also Context-Free, but not all Context-Free Languages are Regular

 Consider the language L = { aⁿbⁿ | n ≥ 0 } 

Empty string, ab, aabb, aaabbb, etc. are all in this language

aabbb, aaabb, a, etc. are not.

Can we derive the rules for this language using only the rules set out for regular languages?

No, as it turns out.

  • You can prove this mathematically using a theorem known as the pumping lemma, but that’s outside the scope of this class

  • see CSE 3321 – Formal Languages and Automata

But if we allow recursion we can do it easily
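One recursive rule that generates this language is S → a S b | ε. The slides do not spell that rule out, so treat it as an assumption of this sketch. The hypothetical recognizer below mirrors it directly: a string is in L if it is empty, or if it starts with a, ends with b, and the middle is itself in L.

```python
def in_anbn(s):
    """Recognize { a^n b^n | n >= 0 } by following S -> a S b | ε."""
    if s == "":
        return True                      # S -> ε
    return (
        len(s) >= 2
        and s[0] == "a"
        and s[-1] == "b"
        and in_anbn(s[1:-1])             # S -> a S b
    )

for candidate in ["", "ab", "aabb", "aabbb", "aaabb", "a"]:
    print(repr(candidate), in_anbn(candidate))
```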


slide-17
SLIDE 17

Context-Free Grammars (CFGs)

 A grammar that defines a Context-Free language has the same properties as a Regular grammar…

 Concatenation, Alternation, Kleene Closure

 …but allows for recursion in its rules

 Either immediate recursion – the non-terminal appears on both the right and left hand side of the same rule

We’ll see an example of this on the next slide

 Or mutual recursion – a non-terminal on the left expands to a rule that eventually expands back to that non-terminal

We’ll see an example of this in a moment – hang in there


slide-18
SLIDE 18

Context Free Grammars (CFGs)

The following grammar is not Regular, but is Context-Free:

expr → number | expr op expr | ( expr )
op → + | - | / | *
number → digit digit*
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

 Note the recursion in the rule for expanding expr

 This grammar is problematic…

 Let’s derive 1+3*2 using the previous rules

slide-19
SLIDE 19

Context-Free Grammars

 We can represent a derivation graphically as a parse tree or syntax tree

 The root of the tree is the start symbol for the grammar

 The internal nodes are non-terminal symbols

 The leaf nodes are terminal symbols

Parse tree for 1+3*2 (children indented under their parent, left to right):

expr
  expr
    number: 1
  op: +
  expr
    expr
      number: 3
    op: *
    expr
      number: 2

slide-20
SLIDE 20

Context-Free Grammars

 Consider these two trees, both derived from the above grammar:

Tree 1 (children indented under their parent, left to right):

expr
  expr
    number: 1
  op: +
  expr
    expr
      number: 3
    op: *
    expr
      number: 2

Tree 2:

expr
  expr
    expr
      number: 1
    op: +
    expr
      number: 3
  op: *
  expr
    number: 2
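To see why two trees for one string is a problem, the sketch below enumerates every parse of a token list under expr → number | expr op expr and evaluates each one. It ignores the parenthesized alternative for brevity. The function names `parses` and `evaluate` and the nested-tuple tree representation are assumptions made for this illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def parses(tokens):
    """Yield every parse tree of `tokens` under expr -> number | expr op expr."""
    if len(tokens) == 1:
        yield tokens[0]                       # expr -> number
        return
    for i in range(1, len(tokens) - 1, 2):    # each operator can be the root
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                yield (left, tokens[i], right)

def evaluate(tree):
    """Evaluate a tree bottom-up, so the tree shape fixes the order of operations."""
    if isinstance(tree, tuple):
        left, op, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree

for tree in parses([1, "+", 3, "*", 2]):
    print(tree, "=", evaluate(tree))
```

The two trees evaluate to 7 and 8 respectively, so the grammar alone does not determine what 1+3*2 means.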

slide-21
SLIDE 21

Context-Free Grammars

 A better, unambiguous grammar:

expr → term | expr add_op term
term → factor | term mult_op factor
factor → number | ( expr )
mult_op → * | /
add_op → + | -
number → digit digit*
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

 Still not Regular, but Context-Free

 Recursion is still there
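A hand-written evaluator shows how this grammar fixes precedence and associativity. The sketch below is one possible approach, not the course's parser. It replaces the left-recursive rules with loops, reading expr as term (add_op term)*, which accepts the same strings and keeps operators left-associative. All function names are hypothetical.

```python
import re

def tokenize(text):
    # Assumption: any character that is not a digit, operator, or parenthesis is skipped.
    return re.findall(r"[0-9]+|[-+*/()]", text)

def parse_expr(tokens, pos):
    # expr -> term | expr add_op term
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        value = value + right if op == "+" else value - right
    return value, pos

def parse_term(tokens, pos):
    # term -> factor | term mult_op factor
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        value = value * right if op == "*" else value / right
    return value, pos

def parse_factor(tokens, pos):
    # factor -> number | ( expr )
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "(":
        value, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return value, pos + 1
    if token.isdigit():
        return int(token), pos + 1
    raise SyntaxError(f"unexpected token {token!r}")

def evaluate(text):
    tokens = tokenize(text)
    value, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value

print(evaluate("1+3*2"))     # 7
print(evaluate("(1+3)*2"))   # 8
print(evaluate("8-3-2"))     # 3
```

Because a term must be complete before an add_op is consumed, multiplication binds tighter than addition, and each input has exactly one parse.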

slide-22
SLIDE 22

Languages in Compilers & Interpreters

Stream of characters → Tokenizer/Scanner → Stream of tokens → Parser → Parse tree → Next steps

slide-23
SLIDE 23

Syntax - Specification

 The previous syntax rules are one type of notation for a syntax.

number → digit digit*
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

 Here’s another:

<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Backus-Naur Form (aka Backus normal form aka BNF)

Note that pure BNF does not use Kleene-star or Kleene-plus

Other extensions provide shorthand to allow these, but it doesn't change the expressiveness to not have them (see above for how to replace Kleene star)
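The claim that recursion can stand in for the Kleene star can be checked on small inputs. In the sketch below, `is_number_bnf` follows the recursive BNF rule and `is_number_star` uses the starred rule via a regular expression. Both names are hypothetical, and the comparison over short strings is a spot check, not a proof.

```python
import re
from itertools import product

DIGITS = "0123456789"

def is_number_bnf(s):
    """<number> ::= <digit> | <digit> <number>"""
    if s == "" or s[0] not in DIGITS:
        return False
    return len(s) == 1 or is_number_bnf(s[1:])

def is_number_star(s):
    """number -> digit digit*"""
    return re.fullmatch("[0-9][0-9]*", s) is not None

# Compare the two definitions on every string of length <= 3 over a small alphabet.
alphabet = "09a"
candidates = ["".join(p) for n in range(4) for p in product(alphabet, repeat=n)]
print(all(is_number_bnf(s) == is_number_star(s) for s in candidates))
```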


slide-24
SLIDE 24

BNF Specification

<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

 Special symbols: <, >, | and ::=

 Reserved (or ‘meta’) symbols

 Non-terminals

 Wrapped in <> tags - <digit> or <number>

 Indicate rules that need to be expanded

 Terminals

 Not wrapped in <> tags

 Indicate “terminal” symbols – no more expansion

slide-25
SLIDE 25

CORE: Imperative Language

<prog> ::= program <decl-seq> begin <stmt-seq> end
<decl-seq> ::= <decl> | <decl> <decl-seq>
<stmt-seq> ::= <stmt> | <stmt> <stmt-seq>
<decl> ::= int <id-list>;
<id-list> ::= <id> | <id>, <id-list>
<stmt> ::= <assign> | <if> | <loop> | <in> | <out>
<assign> ::= <id> = <exp>;
<if> ::= if <cond> then <stmt-seq> end;
       | if <cond> then <stmt-seq> else <stmt-seq> end;
<loop> ::= while <cond> loop <stmt-seq> end;
<in> ::= read <id-list>;
<out> ::= write <id-list>;

slide-26
SLIDE 26

CORE: Imperative Language

<cond> ::= <comp> | !<cond> | [ <cond> and <cond> ] | [ <cond> or <cond> ]
<comp> ::= ( <fac> <comp-op> <fac> )
<exp> ::= <term> | <term> + <exp> | <term> - <exp>
<term> ::= <fac> | <fac> * <term>
<fac> ::= <int> | <id> | ( <exp> )
<comp-op> ::= != | == | < | > | <= | >=
<id> ::= <let-seq> | <let-seq><int>
<let-seq> ::= <let> | <let><let-seq>
<let> ::= A | B | C | ... | X | Y | Z
<int> ::= <digit> | <digit><int>
<digit> ::= 0 | 1 | 2 | 3 | ... | 9

slide-27
SLIDE 27

CORE syntax tree practice

program int X;
begin
  X = 25;
  write X;
end
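Before a syntax tree can be built, the scanner stage turns the program's characters into tokens. The sketch below is a minimal tokenizer for CORE, assuming the keyword list and symbols that can be read off the grammar. The token kind names (`keyword`, `id`, `int`, `sym`) are invented here for illustration.

```python
import re

KEYWORDS = {"program", "begin", "end", "int", "if", "then", "else",
            "while", "loop", "read", "write", "and", "or"}

TOKEN = re.compile(r"""
    (?P<id>[A-Z]+[0-9]*)                       # upper-case letters, optional digits
  | (?P<int>[0-9]+)                            # unsigned integer
  | (?P<word>[a-z]+)                           # candidate keyword
  | (?P<sym>!=|==|<=|>=|[;,=!\[\]()+\-*<>])    # two-character symbols first
  | (?P<space>\s+)                             # whitespace separates tokens
""", re.VERBOSE)

def tokenize(source):
    """Return a list of (kind, text) pairs for a CORE program."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "word":
            if text not in KEYWORDS:
                raise SyntaxError(f"unknown word {text!r}")
            kind = "keyword"
        if kind != "space":
            tokens.append((kind, text))
        pos = match.end()
    return tokens

for token in tokenize("program int X; begin X = 25; write X; end"):
    print(token)
```

Listing `!=`, `==`, `<=`, and `>=` before the single-character class matters, because regular-expression alternation takes the first alternative that matches at a position.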

slide-28
SLIDE 28

Concrete Syntax Tree

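Concrete syntax tree for the program on the previous slide (children indented under their parent, left to right):

<prog>
  program
  <decl-seq>
    <decl>
      int
      <id-list>
        <id> X
      ;
  begin
  <stmt-seq>
    <stmt>
      <assign>
        <id> X
        =
        <expr>
          <term>
            <fac>
              <int> 25
        ;
    <stmt-seq>
      <stmt>
        <out>
          write
          <id-list>
            <id> X
          ;
  end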

slide-29
SLIDE 29

Abstract Syntax Tree

Abstract syntax tree for the same program (keywords and punctuation omitted):

<prog>
  <decl-seq>
    <decl>
      <id-list>
        <id> X
  <stmt-seq>
    <stmt>
      <assign>
        <id> X
        <expr>
          <term>
            <fac>
              <int> 25
    <stmt-seq>
      <stmt>
        <out>
          <id-list>
            <id> X

slide-30
SLIDE 30

CORE parse tree practice

program int Y,Z;
begin
  Y = 20;
  Z = 5;
  Y = Y - Z;
  write Y;
end

slide-31
SLIDE 31

CORE parse tree practice

program int X,Y,Z;
begin
  Y = 20;
  Z = 5;
  X = 21;
  if Y < Z then
    if Y < X then Y=Z; else Y=X; end;
  end;
  write Y;
end

slide-32
SLIDE 32

Takeaways

 Syntax vs. Semantics

 Regular Languages vs. Context Free Languages

 Parsing and ambiguity

 Abstract vs. Concrete Parse Trees

slide-33
SLIDE 33

Readings

 Chapter 2.1 – Syntax

 For next time: Chapter 2.3 – Parsing

 Skim Chapter 2.2 – Scanning