Chapter 6: Syntax Syntax Syntax is the structure of a language. - - PowerPoint PPT Presentation

SLIDE 1

Chapter 6: Syntax

SLIDE 2

Syntax

Syntax is the structure of a language. Earlier, both syntax and semantics were described using lengthy English language explanations. Although semantics are still described in English, syntax is described using a formal system.

SLIDE 3

Syntax

In the 1950s, Noam Chomsky developed the idea of context-free grammars. John Backus, with contributions by Peter Naur, developed a notational system for describing context-free grammars:

Backus-Naur Form (BNF)

SLIDE 4

Syntax

BNF was first used to describe the syntax of Algol 60. It was later used to describe C, Java, and Ada. Every modern programmer and computer scientist must know how to read, interpret, and apply BNF descriptions of language syntax.

SLIDE 5

Syntax

BNFs occur in three basic forms:

Original BNF Extended BNF (EBNF) (Popularized by Niklaus Wirth) Syntax Diagrams

SLIDE 6

Lexical Structure

The lexical structure of a programming language is the structure of its words. Can be considered separate from syntax, but is VERY closely related to it.

SLIDE 7

Lexical Structure

  • Typically, the scanning phase of a translator collects sequences of characters from the input program into tokens.
  • Tokens are then processed by a parsing phase, which determines the syntactic structure.
  • Tokens can be defined using either grammar rules or regular expressions (which describe text patterns).

SLIDE 8

Lexical Structure

  • Tokens fall into several distinct categories:

– Reserved words (Keywords):

  • if, while, else, main

– Literals or constants:

  • 42, 27.5, “Hello”, ‘A’

– Special symbols:

  • > >= < ; , +

– Identifiers

  • X24, var1, balance
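
The four token categories above can be sketched as a small regular-expression scanner. This is only an illustration: the keyword set, the category names, and the `tokenize` helper are assumptions made for this example, not part of any particular language definition.

```python
import re

# Hypothetical keyword set and token patterns, chosen to mirror the slide.
KEYWORDS = {"if", "while", "else"}
TOKEN_SPEC = [
    ("LITERAL",    r'\d+\.\d+|\d+|"[^"]*"'),     # 27.5, 42, "Hello"
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),   # X24, var1, balance
    ("SYMBOL",     r">=|<=|[><;,+]"),            # special symbols
    ("SKIP",       r"\s+"),                      # white space between tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (category, lexeme) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        # Reserved words look like identifiers, so they are reclassified here.
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("if balance >= 42 ;"))
```

Reclassifying keywords after matching them as identifiers is one common design choice; a scanner could equally list each keyword as its own pattern.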

SLIDE 9

Lexical Structure

Java reserved words:

abstract, boolean, break, byte, case, catch, char, class, const, continue, default, do, double, else, extends, final, finally, float, for, goto, if, implements, import, instanceof, int, interface, long, native, new, package, private, protected, public, return, short, static, strictfp, super, switch, synchronized, this, throw, throws, transient, try, void, volatile, while

SLIDE 10

Lexical Structure

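Identifiers may not have the same names as keywords. Keywords are sometimes loosely called predefined identifiers, although strictly a predefined identifier (unlike a reserved word) can be redefined by the program. In some languages, identifiers have a fixed maximum length.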

SLIDE 11

Lexical Structure

Some programming languages allow arbitrary length of identifiers, but only the first six or eight characters may be guaranteed to be significant (very confusing for programmers).

SLIDE 12

Lexical Structure

  • What about:

– doif

  • Is it an identifier called “doif”
  • Or is it the keywords do if?

SLIDE 13

Lexical Structure

Principle of longest substring (Principle of maximum munch):

At each point, the longest possible string of characters is collected into a single token.
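
A minimal sketch of the principle, assuming just two token patterns (the keywords do and if, and identifiers). The names `PATTERNS`, `longest_match`, and `scan` are hypothetical helpers for this example. Every pattern is tried at the current position and the longest lexeme wins, so doif is scanned as one identifier while do if gives two keywords.

```python
import re

# Assumed token patterns for illustration only.
PATTERNS = [
    ("KEYWORD",    re.compile(r"do|if")),
    ("IDENTIFIER", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
]

def longest_match(text, pos=0):
    """Try every pattern at pos and keep the longest lexeme (maximal munch)."""
    best = None
    for kind, pattern in PATTERNS:
        m = pattern.match(text, pos)
        # Strict '>' means the earlier pattern (KEYWORD) wins a tie.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (kind, m.group())
    return best

def scan(text):
    """Split text into tokens, using white space as a token delimiter."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        tok = longest_match(text, pos)
        if tok is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        tokens.append(tok)
        pos += len(tok[1])
    return tokens

print(scan("doif"))   # one identifier
print(scan("do if"))  # two keywords
```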

SLIDE 14

Lexical Structure

The principle of longest substring requires that certain tokens be separated by token delimiters or white space. End of lines may be significant, indentation may also be significant.

SLIDE 15

Lexical Structure

 A free format language is one where the format does not affect the program structure (Except to satisfy the principle of longest substring of course). Example:

Put as many blank lines as you want. Put as many spaces as you want between identifiers.

 Most modern languages are free format.

SLIDE 16

Lexical Structure

  • FORTRAN is a primary example of a language violating the free format conventions.
  • As pre-processing, FORTRAN totally ignores white space: it is removed before processing starts.
  • FORTRAN has no reserved words at all.

SLIDE 17

Lexical Structure

Regular expressions:

Regular expressions are descriptions of patterns of characters. They are composed of three basic operations:

  • Concatenation
  • Repetition
  • Choice (selection)

SLIDE 18

Lexical Structure

 Regular expressions:

Example: describe, using a regular expression, the occurrence of:

  • 0 or more repetitions (repetition) of either a or b (choice)
  • Followed by the single character c (concatenation)

Such as:

  • aaaaabbbbbbc
  • abbbbbbbbbc
  • abaaaabbbbaaaaabc
  • c
  • abaaabbbbc
  • bbbbbbc

SLIDE 19

Lexical Structure

 Regular expressions:

Example: describe, using a regular expression, the occurrence of:

  • 0 or more repetitions of either a or b
  • Followed by the single character c (concatenation)

Examples of rejected strings:

  • bca
  • cabbbb
  • b
  • a
  • aaaabbb

SLIDE 20

Lexical Structure

 Regular expressions:

Example: describe, using a regular expression, the occurrence of:

  • 0 or more repetitions of either a or b
  • Followed by the single character c (concatenation)

The regular expression is:

(a | b)* c

  • The | means OR
  • The * means zero or more occurrences
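
The accepted and rejected strings from this example can be checked with Python's `re` module. This is a small sketch: `accepts` is a hypothetical helper, and `fullmatch` is used so that the whole string must match the pattern, not just a prefix.

```python
import re

# The regular expression from the slide, written without the spaces.
PATTERN = re.compile(r"(a|b)*c")

def accepts(s):
    """True if the entire string matches (a|b)*c."""
    return PATTERN.fullmatch(s) is not None

accepted = ["aaaaabbbbbbc", "abbbbbbbbbc", "abaaaabbbbaaaaabc", "c", "abaaabbbbc", "bbbbbbc"]
rejected = ["bca", "cabbbb", "b", "a", "aaaabbb"]

for s in accepted + rejected:
    print(s, accepts(s))
```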

SLIDE 21

Lexical Structure

  • Regular expressions:

– Regular expression notation is often extended by additional operators such as the “+” operator.

– (a | b)+

  • Means ONE or more occurrences of either a or b
  • Equivalent to (a | b) (a | b)*

SLIDE 22

Lexical Structure

Regular expressions:

Example: write a regular expression for integer constants: i.e. one or more digits. Note [a-b] means a range

SLIDE 23

Lexical Structure

Regular expressions:

Example: write a regular expression for integer constants [0-9]+

SLIDE 24

Lexical Structure

Regular expressions:

Example: write a regular expression for floating point constants: one or more digits, optionally followed by a decimal point and one or more digits.

[0-9]+(\.[0-9]+)?

  • \. is an escape sequence: it matches a literal decimal point.
  • ( ... )? marks the fractional part as optional.
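
Both number patterns can be tried out directly. The sketch below assumes Python's `re` syntax, which agrees with the notation used here; `is_integer` and `is_float` are hypothetical helper names. Note that the floating point pattern, as written, also accepts plain integers because the fractional part is optional.

```python
import re

INTEGER = re.compile(r"[0-9]+")
FLOAT = re.compile(r"[0-9]+(\.[0-9]+)?")

def is_integer(s):
    """True if s consists of one or more digits."""
    return INTEGER.fullmatch(s) is not None

def is_float(s):
    """True if s is digits with an optional '.digits' part."""
    return FLOAT.fullmatch(s) is not None

for s in ["42", "27.5", "27.", ".5"]:
    print(s, is_integer(s), is_float(s))
```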

SLIDE 25

Lexical Structure

Regular expressions:

Most modern text editors allow regular expressions to be used for searching. Search utilities such as UNIX grep also use them. Lex can also be used to turn regular expressions into an automatically generated scanner!

SLIDE 26

Lexical Structure

Regular expressions:

Can you write a small lexical analyzer to recognize certain tokens? Can you write a small scanner to accept a simple expression consisting of the tokens you previously recognized?

SLIDE 27

Parsing Techniques and Tools

  • A scanner that only identifies tokens can be generated automatically from regular expressions.
  • Lex is a famous scanner generator.
  • Its free version is called Flex (Fast Lex).
  • To be covered in detail in a compiler course.

SLIDE 28

Context-Free Grammars and BNFs

Grammar of a Simple English Sentence Example:

sentence -> noun_phrase verb_phrase .
noun_phrase -> article noun
article -> a | the
noun -> girl | dog
verb_phrase -> verb noun_phrase
verb -> sees | pets

SLIDE 29

Context-Free Grammars and BNFs

  • Grammar of a Simple English Sentence

Example:

– One can alternatively use a different notation such as:

  • <sentence> ::= <noun_phrase> <verb_phrase> ‘.’
  • But the quotation marks ‘ ’ used around the full stop then become metasymbols themselves.

SLIDE 30

Context-Free Grammars and BNFs

There is an ISO standard for EBNF notation: ISO/IEC 14977:1996.

SLIDE 31

Context-Free Grammars and BNFs

  • Question: Does the sentence “The girl sees a dog.” belong to the grammar indicated earlier?
  • We go through a process of derivation to see if this sentence is accepted by the grammar or not.

SLIDE 32

Context-Free Grammars and BNFs

Exercise: Is it possible to derive “The girl sees a dog.” from the following grammar?

sentence -> noun_phrase verb_phrase .
noun_phrase -> article noun
article -> a | the
noun -> girl | dog
verb_phrase -> verb noun_phrase
verb -> sees | pets
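
One way to explore the exercise is to let a short program search for a derivation. The sketch below is an illustration under stated assumptions: the grammar is encoded as a Python dictionary, the input is already split into tokens, and `derive` is a hypothetical helper that always expands the leftmost non-terminal. It finds a derivation for the all-lowercase sentence, but not for the capitalized one, because the grammar only contains the lowercase article the.

```python
GRAMMAR = {
    "sentence":    [["noun_phrase", "verb_phrase", "."]],
    "noun_phrase": [["article", "noun"]],
    "article":     [["a"], ["the"]],
    "noun":        [["girl"], ["dog"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "verb":        [["sees"], ["pets"]],
}

def derive(target, form=("sentence",), steps=None):
    """Return the sentential forms of a derivation of target, or None."""
    steps = (steps or []) + [form]
    for i, sym in enumerate(form):
        if sym in GRAMMAR:  # leftmost non-terminal
            # Prune: the terminals produced so far must match the target,
            # and forms never shrink because no rule has an empty right side.
            if list(form[:i]) != target[:i] or len(form) > len(target):
                return None
            for rhs in GRAMMAR[sym]:
                result = derive(target, form[:i] + tuple(rhs) + form[i + 1:], steps)
                if result:
                    return result
            return None
    # No non-terminals left: the form is a string of terminals.
    return steps if list(form) == target else None

for step in derive("the girl sees a dog .".split()):
    print(" ".join(step))
print(derive("The girl sees a dog .".split()))  # no derivation found
```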

SLIDE 33

Context-Free Grammars and BNFs

  • There are two primary problems with the previous grammar:

– thegirlseesadog. is also an acceptable sentence.

  • It is up to the scanner to be insensitive to spaces.

– The grammar does not specify that articles appearing at the beginning of a sentence should be capitalized.

  • Such a “positional” property is often hard to deal with using context-free grammars.

SLIDE 34

Context-Free Grammars and BNFs

Terminology:

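sentence -> noun_phrase verb_phrase .
noun_phrase -> article noun
article -> a | the
noun -> girl | dog
verb_phrase -> verb noun_phrase
verb -> sees | pets

  • Non-terminal: a structure name such as noun_phrase
  • Terminal: a word such as girl
  • Metasymbol: -> and |
  • Production (grammar rule): each line of the grammar
  • Start symbol: sentence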

SLIDE 35

Context-Free Grammars and BNFs

  • Definitions:

– A context-free grammar consists of a series of grammar rules:

  • Each rule has a left-hand side that is a single structure name (a non-terminal).
  • Followed by the metasymbol “->”
  • Followed by a right-hand side consisting of sequences of non-terminals and terminals, with alternative choices separated by |

– Productions are in BNF if they are written using only the metasymbols

  • ->
  • |
  • Sometimes parentheses

SLIDE 36

Context-Free Grammars and BNFs

Definitions:

The language of a context-free grammar:

A context-free grammar defines a language. This language is the set of all strings of terminals for which there exists a derivation beginning with the start symbol and ending with the string of terminals.

SLIDE 37

Context-Free Grammars and BNFs

  • Definitions:

– A grammar is called context-free because:

  • Non-terminals appear singly on the left hand side of productions.
  • Each non-terminal can be replaced by any right-hand side choice.

– Example:

  • In the previous example, we can use any of the given verbs (pets, sees) with the girl subject (context-free).
  • It may make sense to use the verb “pets” only with girls; this would make it context-sensitive!
  • Context-sensitivity is more of a semantic issue!!

SLIDE 38

Context-Free Grammars and BNFs

Definitions:

A grammar is made context-sensitive by adding symbols to the left-hand side of productions, so that a non-terminal can be replaced only in a given context. Anything that is not expressible using a context-free grammar is a semantic, not a syntactic, issue.

SLIDE 39

Context-Free Grammars and BNFs

Example of a context-sensitive grammar: enforce that articles appearing at the beginning of sentences are capitalized.

sentence -> beginning noun_phrase verb_phrase .    (beginning is added to the first rule)
beginning article -> The | A    (newly added production)
noun_phrase -> article noun
article -> a | the
noun -> girl | dog
verb_phrase -> verb noun_phrase
verb -> sees | pets

Two non-terminals on the LHS of the new production!! NOT context-free!

SLIDE 40

Context-Free Grammars and BNFs

Example of a context-sensitive grammar: enforce that articles appearing at the beginning of sentences are capitalized.

sentence -> beginning noun_phrase verb_phrase .
beginning article -> The | A
noun_phrase -> article noun
article -> a | the
noun -> girl | dog
verb_phrase -> verb noun_phrase
verb -> sees | pets

Derivation:

sentence -> beginning noun_phrase verb_phrase .
sentence -> beginning article noun verb_phrase .
sentence -> The noun verb_phrase .

Now we have enforced capital letters at the beginning of sentences!!! (Semantic!)

SLIDE 41

Context-Free Grammars and BNFs

Example: describe arithmetic expressions with addition and multiplication using a CFG.

expr -> expr + expr | expr * expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

We can alternatively say number -> [0-9]+

  • Exercise: Derive 235 + 55
  • Exercise: Derive (2 + 5) * 6

SLIDE 42

Parse Trees and Abstract Syntax Trees

  • Derivations express the structure of syntax, but not very well.
  • There could be multiple derivations at times.
  • A parse tree better expresses the structure inherent in a derivation.
  • The parse tree graphically describes the replacement process in a derivation.

SLIDE 43

Parse Trees and Abstract Syntax Trees

Example: Derive 234 from the following grammar using a parse tree.

number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

SLIDE 44

Parse Trees and Abstract Syntax Trees

Example: Derive 234 from the following grammar using a parse tree.

number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

number
├─ number
│  ├─ number
│  │  └─ digit
│  │     └─ 2
│  └─ digit
│     └─ 3
└─ digit
   └─ 4

SLIDE 45

Parse Trees and Abstract Syntax Trees

Example: Derive (2+3) * 4 from the following grammar using a parse tree.

expr -> expr + expr | expr * expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

SLIDE 46

Parse Trees and Abstract Syntax Trees

Example: Derive (2+3) * 4 from the following grammar using a parse tree.

expr -> expr + expr | expr * expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

expr
├─ expr
│  ├─ (
│  ├─ expr
│  │  ├─ expr
│  │  │  └─ number
│  │  │     └─ digit
│  │  │        └─ 2
│  │  ├─ +
│  │  └─ expr
│  │     └─ number
│  │        └─ digit
│  │           └─ 3
│  └─ )
├─ *
└─ expr
   └─ number
      └─ digit
         └─ 4

SLIDE 47

Parse Trees and Abstract Syntax Trees

Notes:

  • Leaves are terminals (tokens).
  • Interior nodes are non-terminals.
  • Every replacement in a derivation using a grammar rule A -> xyz corresponds to the creation of children at node A:

A
├─ x
├─ y
└─ z
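
These notes can be made concrete with a small sketch that builds the parse tree of a digit string under number -> number digit | digit. The nested-tuple representation and the helper names `parse_number` and `leaves` are assumptions made for this example: each interior node is a tuple whose first element is the non-terminal, and each leaf is a terminal character.

```python
def parse_number(digits):
    """Build the parse tree for number -> number digit | digit."""
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("expected one or more digits")
    # number -> digit
    tree = ("number", ("digit", digits[0]))
    # number -> number digit, applied once per remaining digit
    for ch in digits[1:]:
        tree = ("number", tree, ("digit", ch))
    return tree

def leaves(tree):
    """Collect the terminals (leaves) from left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:  # tree[0] is the non-terminal label
        result.extend(leaves(child))
    return result

tree = parse_number("234")
print(tree)
print(leaves(tree))
```

Reading the leaves from left to right gives back the original string, which is what makes the tree a parse tree of that string.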

SLIDE 48

Parse Trees and Abstract Syntax Trees

Abstract Syntax Trees:

Parse trees are too detailed. An abstract syntax tree condenses a parse tree to its essential structure.

SLIDE 49

Parse Trees and Abstract Syntax Trees

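Abstract Syntax Trees Example:

Original (concrete) syntax tree for 234:

number
├─ number
│  ├─ number
│  │  └─ digit
│  │     └─ 2
│  └─ digit
│     └─ 3
└─ digit
   └─ 4

Abstract syntax tree: only the digits 2, 3, 4 are kept; the number and digit labels are dropped.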

SLIDE 50

Parse Trees and Abstract Syntax Trees

Abstract Syntax Trees Example:

Original (concrete) syntax tree: the full parse tree of (2+3) * 4 from slide 46.

Abstract syntax tree (even parentheses can go):

*
├─ +
│  ├─ 2
│  └─ 3
└─ 4
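
A sketch of why the abstract syntax tree is enough: the tree for (2+3) * 4 can be written as nested tuples and evaluated directly, with no parentheses, number, or digit nodes. The tuple layout `(operator, left, right)` and the `evaluate` helper are assumptions for this illustration.

```python
import operator

# Map each operator symbol to the function that computes it.
OPS = {"+": operator.add, "*": operator.mul}

def evaluate(node):
    """Evaluate an AST given as an int leaf or an (op, left, right) tuple."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

# AST for (2+3) * 4: the grouping is carried by the tree shape itself.
AST = ("*", ("+", 2, 3), 4)
print(evaluate(AST))
```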

SLIDE 51

Parse Trees and Abstract Syntax Trees

Syntax Directed Semantics:

The parse tree and the abstract syntax tree must have a structure that corresponds to the computation being performed. Also called Semantics-Based Syntax.

SLIDE 52

Ambiguity, Associativity, and Precedence

Ambiguity:

Two different derivations can lead to the same parse tree. Different derivations can lead to different parse trees also.

SLIDE 53

Ambiguity, Associativity, and Precedence

Ambiguity:

A grammar is ambiguous if some string has two distinct parse (or abstract syntax) trees. Not necessarily just two distinct derivations!

SLIDE 54

Ambiguity, Associativity, and Precedence

Ambiguity Example:

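Grammar:

expr -> expr + expr | expr * expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derive: 2 + 3 * 4

Two parse trees derive the same expression!!

Tree 1 (addition at the root, so the multiplication is done first):

expr
├─ expr
│  └─ NUMBER (2)
├─ +
└─ expr
   ├─ expr
   │  └─ NUMBER (3)
   ├─ *
   └─ expr
      └─ NUMBER (4)

Tree 2 (multiplication at the root, so the addition is done first):

expr
├─ expr
│  ├─ expr
│  │  └─ NUMBER (2)
│  ├─ +
│  └─ expr
│     └─ NUMBER (3)
├─ *
└─ expr
   └─ NUMBER (4)

Precedence issue (which one first, multiplication or addition?)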
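
The ambiguity can be reproduced mechanically. The sketch below assumes a simplified version of the grammar, expr -> expr + expr | expr * expr | number, over an already tokenized input, and represents trees as `(operator, left, right)` tuples; `all_trees` and `evaluate` are hypothetical helpers. Every operator position is tried as the root, so the function returns every tree the ambiguous grammar allows.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "-": operator.sub}

def all_trees(tokens):
    """Return every tree the ambiguous grammar allows for a token list."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Operators sit at the odd positions; each one may be the root.
    for i in range(1, len(tokens), 2):
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees

def evaluate(node):
    """Evaluate an int leaf or an (op, left, right) tuple."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

for tree in all_trees([2, "+", 3, "*", 4]):
    print(tree, "=", evaluate(tree))
```

The two trees evaluate to different values, which is why the ambiguity matters and cannot be left to chance.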

SLIDE 55

Ambiguity, Associativity, and Precedence

Ambiguity Example:

Grammar (with subtraction now):

expr -> expr + expr | expr - expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derive: 2 - 3 - 4

Two parse trees derive the same expression!! Associativity issue (which subtraction to execute first?)

Tree 1 (the left subtraction is executed first):

expr
├─ expr
│  ├─ expr
│  │  └─ NUMBER (2)
│  ├─ -
│  └─ expr
│     └─ NUMBER (3)
├─ -
└─ expr
   └─ NUMBER (4)

Tree 2 (the right subtraction is executed first):

expr
├─ expr
│  └─ NUMBER (2)
├─ -
└─ expr
   ├─ expr
   │  └─ NUMBER (3)
   ├─ -
   └─ expr
      └─ NUMBER (4)

SLIDE 56

Ambiguity, Associativity, and Precedence

Ambiguity:

Ambiguity must usually be eliminated. Semantics determine which parse tree is correct.

SLIDE 57

Ambiguity, Associativity, and Precedence

Leftmost Derivation:

You can identify the presence of ambiguity by leftmost derivations. When performing a derivation, only replace the leftmost remaining non-terminal. Each parse tree corresponds to exactly one leftmost derivation, so if a string has two different leftmost derivations, the grammar is ambiguous!

SLIDE 58

Ambiguity, Associativity, and Precedence

Leftmost Derivation Example:

Derive 3 + 4 * 5

expr -> expr + expr
expr -> number + expr
expr -> 3 + expr
expr -> 3 + expr * expr
expr -> 3 + number * expr
Etc.

Always replace the leftmost non-terminal first!

SLIDE 59

Ambiguity, Associativity, and Precedence

Another Leftmost Derivation:

Derive 3 + 4 * 5

expr -> expr * expr
expr -> expr + expr * expr
expr -> number + expr * expr
expr -> 3 + expr * expr
expr -> 3 + number * expr
Etc.

The leftmost derivation in this example leads to a different parse tree!! The grammar is ambiguous!!

SLIDE 60

Ambiguity, Associativity, and Precedence

But which of the previously performed leftmost derivations is the correct one for the expression 3 + 4 * 5? Semantics determine that. Which operator has higher precedence? The addition or the multiplication?

SLIDE 61

Ambiguity, Associativity, and Precedence

  • Also, when executing 3 – 4 – 5, do we execute using

  • Left associativity: (3 – 4) – 5

OR

  • Right associativity: 3 – (4 – 5)
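
The two groupings give different results, which a few lines of Python make explicit. This is only an arithmetic illustration; `left_assoc` and `right_assoc` are hypothetical helpers that fold a list of numbers with subtraction from the left and from the right.

```python
from functools import reduce

def left_assoc(nums):
    """Evaluate n1 - n2 - n3 ... grouping from the left: ((n1 - n2) - n3)."""
    return reduce(lambda acc, n: acc - n, nums)

def right_assoc(nums):
    """Evaluate n1 - n2 - n3 ... grouping from the right: (n1 - (n2 - n3))."""
    return reduce(lambda acc, n: n - acc, reversed(nums))

print(left_assoc([3, 4, 5]))   # (3 - 4) - 5
print(right_assoc([3, 4, 5]))  # 3 - (4 - 5)
```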

SLIDE 62

Ambiguity, Associativity, and Precedence

  • Example of ambiguity removal by modifying the grammar.
  • The grammar:

expr -> expr + expr | expr * expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  • Is changed to:

expr -> expr + term | term
term -> term * factor | factor
factor -> (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  • Multiplication is a lower rule. This forces multiplication to occur lower in the parse tree, which gives it higher precedence than addition.
  • expr + term is different from term + expr: the choice controls associativity (left or right).

SLIDE 63

Ambiguity, Associativity, and Precedence

If we say

  • expr -> expr + term (left recursion of expr), it causes left associativity.
  • For 2 + 3 + 4: 2 + 3 is executed first, then + 4.

expr
├─ expr
│  ├─ expr
│  │  └─ NUMBER (2)
│  ├─ +
│  └─ term
│     └─ NUMBER (3)
├─ +
└─ term
   └─ NUMBER (4)

SLIDE 64

Ambiguity, Associativity, and Precedence

If we say

  • expr -> term + expr (right recursion of expr), it causes right associativity.
  • For 2 + 3 + 4: 3 + 4 is executed first, then 2 is added to the result.

expr
├─ term
│  └─ NUMBER (2)
├─ +
└─ expr
   ├─ term
   │  └─ NUMBER (3)
   ├─ +
   └─ expr
      └─ NUMBER (4)

SLIDE 65

Ambiguity, Associativity, and Precedence

  • Is there another way to remove ambiguity?

– Fully parenthesized expressions:

  • expr → ( expr + expr ) | ( expr * expr ) | NUMBER
  • so: ((2 + 3) * 4) and: (2 + (3 * 4))

– Prefix expressions:

  • expr → + expr expr | * expr expr | NUMBER
  • so: + + 2 3 4 and: + 2 * 3 4

– But both alternatives change the language!!!

SLIDE 66

Extended BNF (EBNF)

  • An extension to classical BNF was adopted to simplify grammatical rules.
  • Example:

– number -> number digit | digit

– Generates a number as a sequence of digits:

  • number -> number digit
  • number -> number digit digit
  • number -> digit digit digit

– Using EBNF, we can express it as

  • number -> digit {digit}, where the braces express repetition (0 or more occurrences)

SLIDE 67

Extended BNF (EBNF)

  • Example:

– expr -> expr + term | term

– Generates an expression as a sequence of terms separated by +’s:

  • expr -> expr + term
  • expr -> expr + term + term
  • expr -> expr + term + term + term

– It can be written in EBNF as

  • expr -> term {+ term}
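
The EBNF braces translate almost directly into a loop, which is one reason EBNF is convenient for hand-written parsers. The following is a minimal recursive-descent sketch, not a definitive implementation. It assumes the rules expr -> term {+ term}, term -> factor {* factor}, and factor -> ( expr ) | number; only the first rule is given above, the other two are EBNF rewrites assumed for this example. The `Parser` class and `parse` function are hypothetical names, and the simple tokenizer silently skips characters it does not recognize.

```python
import re

def tokenize(text):
    """Split the input into numbers and the symbols + * ( )."""
    return re.findall(r"[0-9]+|[+*()]", text)

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expr(self):
        # expr -> term {+ term}: the braces become a while loop.
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())  # grows to the left: left associative
        return node

    def term(self):
        # term -> factor {* factor}
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # factor -> ( expr ) | number
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            node = self.expr()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
            return node
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    """Parse a complete expression and return its abstract syntax tree."""
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("2 + 3 * 4"))
print(parse("(2 + 3) * 4"))
print(parse("2 + 3 + 4"))
```

Because the tree is extended inside the loop with the previous result as the left child, repeated + operators group to the left, matching the left-recursive BNF rule.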

SLIDE 68

Extended BNF (EBNF)

  • We can use EBNF to express optional features.
  • if_stmt -> if ( expr ) stmt | if ( expr ) stmt else stmt
  • Can be written using EBNF as:
  • if_stmt -> if ( expr ) stmt [ else stmt ]

SLIDE 69

Syntax Diagrams

  • Syntax diagrams are a useful graphical representation of grammar rules.
  • A diagram indicates the sequence of terminals and non-terminals encountered in the right-hand side of the rule.
  • EBNF is usually more compact than syntax diagrams.

SLIDE 70

Syntax Diagrams

  • Example: The syntax diagram of the following EBNF rule:

– if-stmt → if ( expr ) stmt [ else stmt ]

[Syntax diagram for if-statement: if, (, expression, ), statement, followed by an optional branch through else and statement]

  • Circles or ovals denote terminals.
  • Squares or rectangles denote non-terminals.

SLIDE 71

Parsing Techniques and Tools

  • A grammar written in BNF, EBNF, or as syntax diagrams describes the strings of tokens that are syntactically legal in a programming language.
  • The simplest form of a parser is a recognizer:

– A program that accepts or rejects strings, based on whether they are legal in the language or not.

  • More general parsers:

– Build parse trees (or abstract syntax trees).
– Carry out other operations, such as calculating values for expressions.

SLIDE 72

Parsing Techniques and Tools

Parsers can be:

  • Bottom-Up (Shift Reduce) Parsers.
  • Top-Down Parsers.

SLIDE 73

Parsing Techniques and Tools

 Bottom-Up (Shift Reduce) Parsers:

  • Match an input such as 234 with the right-hand sides of grammatical rules.
  • When a match occurs, the right-hand side is replaced by, or reduced to, the non-terminal on the left.
  • They construct derivations and parse trees from the leaves to the roots.
  • They are also called shift reduce parsers because they shift tokens onto a stack prior to reducing strings to non-terminals.
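
The shift and reduce steps can be traced on the input 234 with a hand-written sketch. It is hard-coded for the grammar number -> number digit | digit and is only an illustration of the idea: real shift-reduce parsers are table-driven, and the `shift_reduce` function and its trace format are assumptions made for this example.

```python
DIGITS = set("0123456789")

def shift_reduce(text):
    """Recognize number -> number digit | digit; return (accepted, trace)."""
    stack, trace = [], []
    for ch in text:
        stack.append(ch)  # shift the next input symbol onto the stack
        trace.append(("shift " + ch, list(stack)))
        if ch in DIGITS:  # reduce by digit -> 0 | 1 | ... | 9
            stack[-1] = "digit"
            trace.append(("reduce digit -> " + ch, list(stack)))
        if stack[-2:] == ["number", "digit"]:  # reduce by number -> number digit
            stack[-2:] = ["number"]
            trace.append(("reduce number -> number digit", list(stack)))
        elif stack == ["digit"]:  # reduce by number -> digit
            stack[:] = ["number"]
            trace.append(("reduce number -> digit", list(stack)))
    # Accept only if the whole input was reduced to the start symbol.
    return stack == ["number"], trace

accepted, trace = shift_reduce("234")
for action, stack in trace:
    print(f"{action:32} {stack}")
print("accepted:", accepted)
```

Reading the trace from top to bottom shows the parse tree being built from the leaves (the digits) up to the root (number).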

SLIDE 74

Parsing Techniques and Tools

 Top-Down Parsers:

 Non-terminals are expanded to match incoming tokens and directly construct a derivation.

SLIDE 75

Parsing Techniques and Tools

  • Programs can be written that automatically translate a BNF description into a parser.
  • Bottom-up parsing is usually more powerful than top-down parsing, and is the preferred method for such parser generators.
  • Parser generators are also called compiler compilers.
  • YACC (Yet Another Compiler Compiler) is a famous parser generator. Its free version is called Bison.
  • To be covered in detail in a compiler course.

SLIDE 76

Lexics vs. Syntax vs. Semantics

  • A number can be defined by a regular expression.
  • A number can also be defined using a grammatical rule!
  • How do we define a number: using a regular expression or a BNF rule?
  • A scanner operating on regular expressions is definitely faster; there is no need to use the extensive recursive power of a parser operating on BNF.

SLIDE 77

Lexics vs. Syntax vs. Semantics

  • Example:

– Lexics: tokens exist such as:

  • A, the, girl, dog, sees, pets, .

– Syntax:

  • How do we arrange the tokens above according to a language grammar?
  • Which comes where: the noun, the verb, the article, etc.?

– Semantics:

  • Articles such as “a” and “the” need to be upper case if at the beginning of the sentence.

SLIDE 78

Lexics vs. Syntax vs. Semantics

  • Rule:

If it is not the grammar or the disambiguating rules, it’s semantics!
