Formal Languages and Grammars Chapter 2: Sections 2.1 and 2.2

Outline
• Languages and grammars
• Why?
• Regular languages (for scanning)
• Context-free languages (for parsing)
  – Derivation trees (a.k.a. parse trees)
  – Ambiguity
• The Core language
  – A scanner for Core

Formal Languages
• Basis for the design and implementation of programming languages
• Alphabet: finite set Σ of symbols
• String: finite sequence of symbols
  – Empty string ε: sequence of length zero
  – Σ*: set of all strings over Σ (incl. ε)
  – Σ⁺: set of all non-empty strings over Σ
• Language: set of strings L ⊆ Σ*
  – E.g., for Java, Σ is Unicode, a string is a program, and L is defined by a grammar in the language spec

Formal Grammars
• G = (N, T, S, P)
  – Finite set of non-terminal symbols N
  – Finite set of terminal symbols T
  – Starting non-terminal symbol S ∈ N
  – Finite set of productions P
  – Describes a language L ⊆ T*
• Production: x → y
  – x is a non-empty sequence of terminals and non-terminals; y is a sequence of terminals and non-terminals
• Applying a production: uxv ⇒ uyv

Example: Non-negative Integers
• N = { I, D }
• T = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
• S = I
• P = { I → D, I → DI, D → 0, D → 1, …, D → 9 }

More Common Notation
I → D | DI (two production alternatives)
D → 0 | 1 | … | 9 (ten production alternatives)
• Terminals: 0 … 9
• Starting non-terminal: I
  – Shown first in the list of productions
• Examples of production applications:
  – I ⇒ DI
  – DI ⇒ DDI
  – DDI ⇒ D6I
  – D6I ⇒ D6D
  – D6D ⇒ 36D
  – 36D ⇒ 361
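The production applications listed on this slide can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the names `PRODUCTIONS` and `apply_production` are hypothetical, and sentential forms are represented as plain strings in which every symbol is one character.

```python
# Productions of the non-negative-integer grammar: I -> D | DI, D -> 0 | ... | 9.
PRODUCTIONS = {
    "I": ["D", "DI"],
    "D": [str(d) for d in range(10)],
}

def apply_production(form, position, replacement):
    """Rewrite the non-terminal at `position` in `form` with `replacement`."""
    nonterminal = form[position]
    if replacement not in PRODUCTIONS.get(nonterminal, []):
        raise ValueError(f"no production {nonterminal} -> {replacement}")
    return form[:position] + replacement + form[position + 1:]

# Each step names the position to rewrite and the right-hand side to use.
steps = [(0, "DI"), (1, "DI"), (1, "6"), (2, "D"), (0, "3"), (2, "1")]
form = "I"
trace = [form]
for position, replacement in steps:
    form = apply_production(form, position, replacement)
    trace.append(form)
print(" => ".join(trace))
```

Trying to rewrite a terminal, or using a right-hand side that is not in P, raises an error, which mirrors the rule that only productions of the grammar may be applied.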

Languages and Grammars
• String derivation
  – w₁ ⇒ w₂ ⇒ … ⇒ wₙ; denoted w₁ ⇒* wₙ
  – If n > 1, non-empty derivation sequence: w₁ ⇒⁺ wₙ
• Language generated by a grammar
  – L(G) = { w ∈ T* | S ⇒⁺ w }
• Fundamental theoretical characterization: Chomsky hierarchy (Noam Chomsky, MIT)
  – Regular languages ⊂ Context-free languages ⊂ Context-sensitive languages ⊂ Unrestricted languages
  – Regular languages in PL: for lexical analysis
  – Context-free languages in PL: for syntax analysis
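To make the definition of L(G) concrete, here is a small Python sketch that enumerates the strings of L(G) up to a length bound for the integer grammar. It is an illustration under stated assumptions: `generate` is a hypothetical helper, symbols are single characters, and the length pruning is only valid because no production of this grammar shrinks a sentential form.

```python
from collections import deque

# I -> D | DI, D -> 0 | 1 | ... | 9
PRODUCTIONS = {"I": ["D", "DI"], "D": [str(d) for d in range(10)]}

def generate(start, max_length):
    """Return every terminal string w with start =>+ w and len(w) <= max_length."""
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal; a form without one is a terminal string.
        index = next((i for i, s in enumerate(form) if s in PRODUCTIONS), None)
        if index is None:
            language.add(form)
            continue
        for rhs in PRODUCTIONS[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            # No production here shrinks a form, so longer forms can be pruned.
            if len(new_form) <= max_length and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return language

short_strings = generate("I", 2)
print(len(short_strings))
```

Expanding only the leftmost non-terminal is enough for a context-free grammar, because every derivable terminal string has a leftmost derivation.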

Regular Languages in Compilers & Interpreters
stream of characters: w,h,i,l,e,(,a,1,5,>,b,b,),d,o,…
  ↓ Scanner (uses a regular grammar to perform lexical analysis)
stream of tokens: keyword [while], leftparen, id [a15], op [>], id [bb], rightparen, keyword [do], …
  ↓ Parser (uses a context-free grammar to perform syntax analysis)
parse tree (each token is a leaf in the parse tree)
  ↓ … more compiler/interpreter components

Overview of Compilation

Source Code for Euclid’s GCD Algorithm
• This is code in Pascal, but you should have no problem reading it

    program gcd(input, output);
    var i, j: integer;
    begin
      read(i, j);
      while i <> j do
        if i > j then i := i - j
        else j := j - i;
      writeln(j);
    end.

Tokens (After Lexical Analysis)
PROGRAM, (IDENT, “gcd”), LPAREN, (IDENT, “input”), COMMA, (IDENT, “output”), RPAREN, SEM, VAR, (IDENT, “i”), COMMA, (IDENT, “j”), COLON, INTEGER, SEM, BEGIN, ...
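A token stream like this one can be produced by a scanner driven by regular expressions. The sketch below is a hypothetical Python illustration, not the scanner from the course: it covers only the tokens that appear in the first lines of the GCD program, the token name `RPAREN` is an assumption chosen by analogy with `LPAREN`, and identifiers are assumed to be a letter followed by letters or digits.

```python
import re

# Keywords are scanned as identifiers first and then reclassified.
KEYWORDS = {"program": "PROGRAM", "var": "VAR", "integer": "INTEGER", "begin": "BEGIN"}

TOKEN_PATTERN = re.compile(r"""
    (?P<IDENT>[A-Za-z][A-Za-z0-9]*)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))      # token name assumed by analogy with LPAREN
  | (?P<COMMA>,)
  | (?P<SEM>;)
  | (?P<COLON>:)
  | (?P<SKIP>\s+)       # whitespace separates tokens and is discarded
""", re.VERBOSE)

def scan(text):
    """Turn a character stream into a list of tokens."""
    tokens = []
    position = 0
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"unexpected character {text[position]!r}")
        position = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "IDENT":
            # Pascal keywords are case-insensitive.
            keyword = KEYWORDS.get(lexeme.lower())
            tokens.append(keyword if keyword else ("IDENT", lexeme))
        else:
            tokens.append(kind)
    return tokens

print(scan("program gcd(input, output); var i, j: integer; begin"))
```

Operators such as `:=` and `<>` are left out on purpose; extending the pattern to them would require listing longer alternatives before shorter ones so that `:=` is not split into `:` and `=`.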

Parse Tree (After Syntax Analysis)

Abstract Syntax Tree and Symbol Table

Assembly (Target Language)

Regular Languages (1/5)
• Operations on languages
  – Union: L ∪ M = all strings in L or in M
  – Concatenation: LM = all ab where a in L and b in M
  – L⁰ = { ε } and Lⁱ = Lⁱ⁻¹L
  – Closure: L* = L⁰ ∪ L¹ ∪ L² ∪ …
  – Positive closure: L⁺ = L¹ ∪ L² ∪ …
• Regular expressions: notation to express languages constructed with the help of such operations
  – Example: (0|1|2|3|4|5|6|7|8|9)+
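For finite languages these operations map directly onto Python sets of strings. The sketch below is illustrative only: the function names are hypothetical, and because L* is infinite whenever L contains a non-empty string, `closure_up_to` computes only the finite prefix L⁰ ∪ L¹ ∪ … ∪ Lⁿ.

```python
def concat(L, M):
    """LM: every string ab with a in L and b in M."""
    return {a + b for a in L for b in M}

def power(L, i):
    """L^i, built from L^0 = { ε } and L^i = L^(i-1) L."""
    result = {""}  # the empty string plays the role of ε
    for _ in range(i):
        result = concat(result, L)
    return result

def closure_up_to(L, n):
    """Finite approximation of L*: L^0 ∪ L^1 ∪ … ∪ L^n."""
    result = set()
    for i in range(n + 1):
        result |= power(L, i)
    return result

L = {"a", "b"}
M = {"c"}
print(L | M)                 # union
print(concat(L, M))          # concatenation
print(sorted(closure_up_to(L, 2), key=lambda s: (len(s), s)))
```

Dropping L⁰ from the union gives the corresponding approximation of the positive closure L⁺.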

Regular Languages (2/5)
• Given some alphabet, a regular expression is
  – The empty string ε
  – Any symbol from the alphabet
  – If r and s are regular expressions, so are r|s, rs, r*, r+, r?, and (r)
  – *, +, and ? have higher precedence than concatenation, which has higher precedence than |
  – All are left-associative

Regular Languages (3/5)
• Each regular expression r defines a language L(r)
  – L(ε) = { ε }
  – L(a) = { a } for alphabet symbol a
  – L(r|s) = L(r) ∪ L(s)
  – L(rs) = L(r)L(s)
  – L(r*) = (L(r))*
  – L(r+) = (L(r))⁺
  – L(r?) = { ε } ∪ L(r)
  – L((r)) = L(r)
• Example: what is the language defined by 0(x|X)(0|1|…|9|a|b|…|f|A|B|…|F)+ ?
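One way to explore the example is to try it on a few strings. The Python sketch below assumes that the character class `[0-9a-fA-F]` is an acceptable shorthand for the long alternation on the slide; `is_hex_literal` is a hypothetical helper name. The strings it accepts look like C-style hexadecimal literals: `0x` or `0X` followed by at least one hexadecimal digit.

```python
import re

# [0-9a-fA-F] abbreviates the alternation (0|1|…|9|a|…|f|A|…|F).
HEX_LITERAL = re.compile(r"0(x|X)[0-9a-fA-F]+")

def is_hex_literal(text):
    """True if the whole string belongs to the language of the expression."""
    return HEX_LITERAL.fullmatch(text) is not None

for candidate in ["0x1F", "0Xabc", "0x", "x1F", "0x1G"]:
    print(candidate, is_hex_literal(candidate))
```

`fullmatch` is used instead of `search` because membership in L(r) concerns the entire string, not a substring.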

Regular Languages (4/5)
• Regular grammars
  – All productions are A → wB and A → w
    • A and B are non-terminals; w is a sequence of terminals
    • This is a right-regular grammar
  – Or all productions are A → Bw and A → w
    • Left-regular grammar
• Example: L = { aⁿb | n > 0 } is a regular language
  – S → Ab and A → a | Aa
• I → D | DI and D → 0 | 1 | … | 9: is this a regular grammar?
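The shape restrictions can be checked mechanically. The following Python sketch is a hypothetical helper, not course material. It assumes single-character symbols, treats every key of the dictionary as a non-terminal, and allows w to be empty, so a production of the form A → B is accepted.

```python
def is_right_regular(productions):
    """Every right-hand side is w or wB: only the last symbol may be a non-terminal."""
    nonterminals = set(productions)
    return all(
        symbol not in nonterminals
        for alternatives in productions.values()
        for rhs in alternatives
        for symbol in rhs[:-1]
    )

def is_left_regular(productions):
    """Every right-hand side is w or Bw: only the first symbol may be a non-terminal."""
    nonterminals = set(productions)
    return all(
        symbol not in nonterminals
        for alternatives in productions.values()
        for rhs in alternatives
        for symbol in rhs[1:]
    )

digits = [str(d) for d in range(10)]
integers = {"I": ["D", "DI"], "D": digits}
a_n_b = {"S": ["Ab"], "A": ["a", "Aa"]}
# A rewriting of the integer grammar with the digits substituted for D.
integers_regular = {"I": digits + [d + "I" for d in digits]}

print(is_right_regular(integers), is_left_regular(integers))
print(is_left_regular(a_n_b))
print(is_right_regular(integers_regular))
```

Under these assumptions the grammar I → D | DI fails both checks because of the production I → DI, even though the language it generates is regular; the rewritten grammar `integers_regular` generates the same strings in right-regular form.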

Regular Languages (5/5)
• Equivalent formalisms for regular languages
  – Regular grammars
  – Regular expressions
  – Nondeterministic finite automata (NFA)
  – Deterministic finite automata (DFA)
  – Additional details: Sections 2.2 and 2.4
• What does this have to do with PLs?
  – Foundation for lexical analysis done by a scanner
  – You will have to implement a scanner for your interpreter project; Section 2.2 provides useful guidelines

Uses of Regular Languages
• Lexical analysis in compilers
  – E.g., an identifier token is a string from the regular language letter (letter|digit)*
  – Each token is a terminal symbol for the context-free grammar of the parser
• Pattern matching
  – stdlinux> grep "a\+b" foo.txt
  – Find every line from foo.txt that contains a string from the language L = { aⁿb | n > 0 }
    • i.e., the language for reg. expr. a+b
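Both uses can be tried with Python's `re` module. This is a sketch under assumptions: "letter" is taken to mean an ASCII letter and "digit" an ASCII digit, and `is_identifier` and `matching_lines` are hypothetical names. Note that grep's basic syntax writes the operator as `\+`, while Python writes it as `+`.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # letter (letter|digit)*
A_PLUS_B = re.compile(r"a+b")                      # same language as grep's a\+b

def is_identifier(text):
    """Lexical-analysis view: the whole lexeme must be in the language."""
    return IDENTIFIER.fullmatch(text) is not None

def matching_lines(lines):
    """grep view: keep each line that contains some string of the language."""
    return [line for line in lines if A_PLUS_B.search(line)]

print(is_identifier("a15"), is_identifier("15a"))
print(matching_lines(["xaab", "b", "ab c", "ba"]))
```

The two functions differ in the same way the two bullet points do: a scanner asks whether an entire lexeme is in the language, whereas grep asks whether any substring of a line is.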

Context-Free Languages
• They subsume regular languages
  – Every regular language is a c.f. language
  – L = { aⁿbⁿ | n > 0 } is c.f. but not regular
• Generated by a context-free grammar
  – Each production: A → w
  – A is a non-terminal, w is a sequence of terminals and non-terminals
• BNF (Backus-Naur Form): traditional alternative notation for context-free grammars
  – John Backus and Peter Naur, for Algol-58 and Algol-60
    • Backus was also one of the creators of Fortran
  – Both are recipients of the ACM Turing Award
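The language { aⁿbⁿ | n > 0 } can be generated by a two-production context-free grammar such as S → ab | aSb. That grammar is not given on the slide; it is an assumption made for this illustration. The Python sketch below follows it directly, with one recursive call per use of S → aSb.

```python
def derives_anbn(s):
    """Recognise { a^n b^n | n > 0 } by following S -> ab | aSb (grammar assumed)."""
    if s == "ab":                      # S -> ab
        return True
    return (                           # S -> aSb
        len(s) > 2
        and s[0] == "a"
        and s[-1] == "b"
        and derives_anbn(s[1:-1])
    )

for candidate in ["ab", "aabb", "aaabbb", "", "aab", "abab"]:
    print(repr(candidate), derives_anbn(candidate))
```

The recursion matches each leading a with a trailing b, which is the kind of unbounded counting a finite automaton cannot do.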

Example: Non-negative Integers
• I → D | DI and D → 0 | 1 | … | 9
• BNF
  – <integer> ::= <digit> | <digit><integer>
  – <digit> ::= 0 | 1 | … | 9
• What if we wanted to disallow zeroes at the beginning?
  – e.g. 509 is OK, but 059 is not
    • Possible motivation: in C, leading 0 means an octal constant
  – Propose a context-free grammar that achieves this
    • Is this grammar regular? If not, can you change it to make it regular?

Derivation Tree for a String
• Also called parse tree or concrete syntax tree
  – Leaf nodes: terminals
  – Inner nodes: non-terminals
  – Root: starting non-terminal of the grammar
• Describes a particular way to derive a string based on a context-free grammar
  – Leaf nodes from left to right are the string
  – To get this string: depth-first traversal of the tree, always visiting the leftmost unexplored branch

Example of a Derivation Tree
<expr> ::= <term> | <expr> + <term>
<term> ::= id | ( <expr> )

Parse tree for (x+y)+z:

    <expr>
    ├── <expr>
    │   └── <term>
    │       ├── (
    │       ├── <expr>
    │       │   ├── <expr>
    │       │   │   └── <term>
    │       │   │       └── x
    │       │   ├── +
    │       │   └── <term>
    │       │       └── y
    │       └── )
    ├── +
    └── <term>
        └── z
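A parse tree for this grammar can be built by a small hand-written parser. The Python sketch below is one possible approach, not the parser used in the course. It assumes that every token is a single character and that an id is a single letter, and it represents a tree node as a `(label, children)` tuple. Because `<expr> ::= <expr> + <term>` is left-recursive, the sketch replaces the recursion with a loop that keeps wrapping the tree built so far.

```python
def parse_expr(tokens, pos=0):
    """Parse <expr>; return (tree, next_pos). Trees are (label, children) tuples."""
    term, pos = parse_term(tokens, pos)
    tree = ("<expr>", [term])                      # <expr> ::= <term>
    # <expr> ::= <expr> + <term>: wrap the tree so far, giving left-nested trees.
    while pos < len(tokens) and tokens[pos] == "+":
        term, pos = parse_term(tokens, pos + 1)
        tree = ("<expr>", [tree, "+", term])
    return tree, pos

def parse_term(tokens, pos):
    """Parse <term> ::= id | ( <expr> )."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "(":
        inner, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return ("<term>", ["(", inner, ")"]), pos + 1
    if token.isalpha():
        return ("<term>", [token]), pos + 1
    raise SyntaxError(f"unexpected token {token!r}")

def leaves(tree):
    """Leaves from left to right, i.e. the derived string."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [leaf for child in children for leaf in leaves(child)]

tree, end = parse_expr(list("(x+y)+z"))
print("".join(leaves(tree)))
```

Reading the leaves from left to right recovers the input string, which is the property of derivation trees stated on the previous slide.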

Equivalent Derivation Sequences
The set of string derivations that are represented by the same parse tree

One derivation:
    <expr> ⇒ <expr> + <term>
           ⇒ <expr> + z
           ⇒ <term> + z
           ⇒ (<expr>) + z
           ⇒ (<expr> + <term>) + z
           ⇒ (<expr> + y) + z
           ⇒ (<term> + y) + z
           ⇒ (x + y) + z

Another derivation:
    <expr> ⇒ <expr> + <term>
           ⇒ <term> + <term>
           ⇒ (<expr>) + <term>
           ⇒ (<expr> + <term>) + <term>
           ⇒ (<term> + <term>) + <term>
           ⇒ (x + <term>) + <term>
           ⇒ (x + y) + <term>
           ⇒ (x + y) + z

Many more …
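The two derivations differ only in which non-terminal is expanded at each step: the first always picks the rightmost one, the second the leftmost one. The Python sketch below reads both orders off a single parse tree. It is an illustration with hypothetical names (`derivation`, `render`), it represents nodes as `(label, children)` tuples, and it uses the smaller string x + y to stay short.

```python
def render(form):
    """Show a sentential form: node labels for non-terminals, text for terminals."""
    return " ".join(node[0] if isinstance(node, tuple) else node for node in form)

def derivation(tree, leftmost=True):
    """List the sentential forms obtained by expanding the tree's nodes in order."""
    form = [tree]
    steps = [render(form)]
    while True:
        # Tuple nodes are non-terminals that still need to be expanded.
        pending = [i for i, node in enumerate(form) if isinstance(node, tuple)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]     # replace the node by its children
        steps.append(render(form))

# Parse tree for x + y under <expr> ::= <term> | <expr> + <term>, <term> ::= id | ( <expr> ).
tree = ("<expr>", [("<expr>", [("<term>", ["x"])]), "+", ("<term>", ["y"])])

print(" => ".join(derivation(tree, leftmost=True)))
print(" => ".join(derivation(tree, leftmost=False)))
```

Both calls walk the same tree and end in the same string; any other order of choosing the next non-terminal yields one of the "many more" equivalent derivations.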
