Grammars and Trees, Dr. Vadim Zaytsev aka @grammarware, 2015


slide-1
SLIDE 1

Grammars and Trees

  • Dr. Vadim Zaytsev aka @grammarware

2015

slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6

Recap

✓ Lexical analysis
✓ Syntactic analysis
✓ Semantic analysis
✓ Intermediate representation
✓ Code generation
✓ Optimisation
✓ . . .

slide-7
SLIDE 7

WHY

✓ Formats everywhere
✓ DSLs are easy
✓ SLs have many faces
✓ 90% automated, 10% hard work

slide-8
SLIDE 8

Models of Languages

✓ How can a language be defined?

slide-9
SLIDE 9

Models of Languages

✓ Actual (in)finite set
  ✓ {“a”, “b”, “c”}
  ✓ {0ⁱ1ⁿ…}
  ✓ English
✓ set arithmetic works
  ✓ concatenation, union, difference, intersection, complement, closure
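A minimal sketch of "set arithmetic works", assuming finite languages are represented as Python sets of strings. The helper names `concat` and `closure` are illustrative, and the closure is cut off at a maximum word length because the real Kleene closure is infinite. Complement is left out, since it needs a fixed universe of words.

```python
from itertools import product

def concat(l1, l2):
    """Concatenation: every word of l1 followed by every word of l2."""
    return {u + v for u, v in product(l1, l2)}

def closure(lang, max_len):
    """Kleene closure of lang, truncated to words of at most max_len characters."""
    result = {""}
    frontier = {""}
    while frontier:
        longer = {w for w in concat(frontier, lang) if len(w) <= max_len}
        frontier = longer - result      # keep only words not seen before
        result |= frontier
    return result

A = {"a", "b", "c"}
B = {"b", "d"}

print(A | B)                      # union
print(A & B)                      # intersection
print(A - B)                      # difference
print(concat(A, B))               # concatenation
print(sorted(closure({"ab"}, 6))) # truncated closure
```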

slide-10
SLIDE 10

Models of Languages

✓ Formal grammar
  ✓ term rewriting system
  ✓ “semi-Thue”
  ✓ all about rewriting rules
  ✓ α → β
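The rewriting view can be sketched as follows, assuming words are plain strings and a rule α → β is a pair of strings. The function names and the example rules (S → aSb, S → ab) are illustrative and do not come from the slides. The search is bounded by word length so that it always terminates.

```python
def rewrite_once(word, rules):
    """All words reachable from `word` by applying one rule alpha -> beta once."""
    results = set()
    for alpha, beta in rules:
        start = word.find(alpha)
        while start != -1:
            results.add(word[:start] + beta + word[start + len(alpha):])
            start = word.find(alpha, start + 1)
    return results

def derivable(start, target, rules, max_len):
    """Breadth-first search for a derivation start =>* target.

    Intermediate words longer than max_len are discarded, so this is
    only a bounded approximation of derivability.
    """
    seen = {start}
    frontier = [start]
    while frontier:
        next_frontier = []
        for word in frontier:
            for successor in rewrite_once(word, rules):
                if len(successor) <= max_len and successor not in seen:
                    seen.add(successor)
                    next_frontier.append(successor)
        frontier = next_frontier
    return target in seen

RULES = [("S", "aSb"), ("S", "ab")]   # hypothetical example grammar
print(derivable("S", "aaabbb", RULES, 8))
print(derivable("S", "aab", RULES, 8))
```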

slide-11
SLIDE 11

Models of Languages

✓ Recognising automaton
  ✓ states
  ✓ transitions
  ✓ extra stuff
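A small sketch of a recognising automaton with only states and transitions (no "extra stuff" such as a stack). The table `EVEN_ONES` is a made-up example that accepts binary words with an even number of 1s.

```python
def accepts(transitions, start, accepting, word):
    """Run a deterministic finite automaton over `word`.

    `transitions` maps (state, symbol) to the next state; a missing
    entry means the automaton gets stuck and rejects.
    """
    state = start
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting

EVEN_ONES = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

print(accepts(EVEN_ONES, "even", {"even"}, "1001"))
print(accepts(EVEN_ONES, "even", {"even"}, "10"))
```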

slide-12
SLIDE 12

Models of Languages

✓ Declarative
  ✓ enumeration / description
  ✓ characteristic function
✓ Analytic
  ✓ recogniser / parser
  ✓ analytic grammar
✓ Generative
  ✓ term rewriting system
  ✓ generative grammar

slide-13
SLIDE 13

[Diagram: Program is an “instance of” Language]

slide-14
SLIDE 14

[Diagram: nodes Sentences, Grammar, Automaton, Program, Language; three edges labelled “modelled by”]

slide-15
SLIDE 15

[Diagram: nodes Sentences, Grammar, Automaton, Program, Language; edges labelled “generates” and “accepts”, plus three edges labelled “modelled by”]

slide-16
SLIDE 16

[Diagram: nodes Sentences, Grammar, Automaton, Program, Language; edges labelled “element of”, “generates”, “accepts”, “conforms to” and “parseable by”, plus three edges labelled “modelled by”]

slide-17
SLIDE 17

[Diagram: nodes Language, Program, Grammar, with a second Program and Grammar pair; edges labelled “defined by” and “conforms to” for each pair]

slide-18
SLIDE 18

[Diagram: nodes Language, Program, Grammar, with a second Program and Grammar pair; edges labelled “defined by” and “conforms to” for each pair, plus one additional “defined by” edge]

slide-19
SLIDE 19

Example: XML

✓ X ::= ![<>]+ | '<' ![>]+ '>' X* '<' '/' ![>]+ '>'
✓ X ::= D | '<' T A* '>' X* '<' '/' T '>'
✓ <!ELEMENT dir (#PCDATA)>
  <!ATTLIST dir xml:space (def|preserve) 'preserve'>
✓ <xsd:element name="tag"> <xsd:complexType> . . .
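A hand-written recogniser for the first rule above, X ::= ![<>]+ | '<' ![>]+ '>' X* '<' '/' ![>]+ '>', might look like the sketch below. Two points are assumptions and not taken from the slide: a tag that starts with "</" is always read as a closing tag, and the function names are invented. As in the rule itself, opening and closing tag names are not required to match.

```python
def parse_x(text, pos=0):
    """Try to recognise one X at text[pos:]; return the end position or None."""
    if pos < len(text) and text[pos] not in "<>":
        # ![<>]+ : a non-empty run of character data
        while pos < len(text) and text[pos] not in "<>":
            pos += 1
        return pos
    if text.startswith("<", pos) and not text.startswith("</", pos):
        # '<' ![>]+ '>' : opening tag with a non-empty name
        close = text.find(">", pos)
        if close == -1 or close == pos + 1:
            return None
        pos = close + 1
        # X* : zero or more nested elements or data runs
        while True:
            after = parse_x(text, pos)
            if after is None:
                break
            pos = after
        # '<' '/' ![>]+ '>' : closing tag with a non-empty name
        if not text.startswith("</", pos):
            return None
        close = text.find(">", pos)
        if close == -1 or close == pos + 2:
            return None
        return close + 1
    return None

def is_x(text):
    """True if the whole text is exactly one X."""
    return parse_x(text) == len(text)

print(is_x("<a>hi<b>x</b></a>"))
print(is_x("<a>hi"))
```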

slide-20
SLIDE 20

Conclusion

✓ “Language” is intangible
✓ Grammars hide in:
  ✓ data types
  ✓ API and libraries
  ✓ protocols and formats
  ✓ structural commitments
  ✓ . . .
✓ Not all grammars are equally “good”

slide-21
SLIDE 21

Rose by Arwen Grune; p.58 of Grune/Jacobs’ “Parsing Techniques”, 2008

slide-22
SLIDE 22

Unrestricted grammars: α → β
Context-sensitive grammars: α X β → α γ β
Context-free grammars: X → γ
Regular grammars: X → a, X → a B

Duncan Rawlinson, Chomsky.jpg, 2004, CC-BY.

Noam Chomsky. On Certain Formal Properties of Grammars, Information & Control 2(2):137–167, 1959.

Noam Chomsky (b.1928)
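The four rule shapes can be checked mechanically. The sketch below assumes a toy encoding that is not in the slides: each symbol is one character, uppercase letters are nonterminals, lowercase letters are terminals, and γ in the context-sensitive shape must be non-empty. It reports the most restrictive class whose shape fits a single rule.

```python
def classify(lhs, rhs):
    """Most restrictive Chomsky class whose rule shape fits lhs -> rhs."""
    if len(lhs) == 1 and lhs.isupper():
        # X -> a  or  X -> a B
        if len(rhs) == 1 and rhs.islower():
            return "regular"
        if len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper():
            return "regular"
        # X -> gamma
        return "context-free"
    # alpha X beta -> alpha gamma beta, with gamma non-empty
    for i, symbol in enumerate(lhs):
        if symbol.isupper():
            alpha, beta = lhs[:i], lhs[i + 1:]
            long_enough = len(rhs) > len(alpha) + len(beta)
            if long_enough and rhs.startswith(alpha) and rhs.endswith(beta):
                return "context-sensitive"
    # alpha -> beta with no further restriction
    return "unrestricted"

for rule in [("X", "a"), ("X", "aB"), ("X", "aXb"), ("aXb", "aYZb"), ("XY", "YX")]:
    print(rule, classify(*rule))
```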

slide-23
SLIDE 23

Unrestricted grammars: α → β
Decidable grammars
Context-sensitive grammars: α X β → α γ β
Indexed grammars: A[σ] → α[σ], A[σ] → B[fσ], A[fσ] → α[σ]
Context-free grammars: X → γ
Deterministic CFG
Nested word
Regular grammars: X → a, X → a B
Non-recursive grammars

Duncan Rawlinson, Chomsky.jpg, 2004, CC-BY.

Noam Chomsky. On Certain Formal Properties of Grammars, Information & Control 2(2):137–167, 1959.

Noam Chomsky (b.1928)

slide-24
SLIDE 24

Grammars | Languages | Automata
Unrestricted grammars | Recursively enumerable languages | Turing machine
Decidable grammars | Recursive languages | Terminating automata
Context-sensitive grammars | Context-sensitive languages | Linear-bounded automata
Indexed grammars | Languages with macros | Nested stack automata
Context-free grammars | Context-free languages | Pushdown automata
Deterministic CFG | Deterministic CFL | Deterministic PDA
Nested word | Nested word | Visibly PDA
Regular grammars | Regular languages | FSMs
Non-recursive grammars | Finite languages | FSMs without cycles

slide-25
SLIDE 25

Finite languages

✓ Examples:
  ✓ Boolean values
  ✓ languages
  ✓ countries
  ✓ cities
  ✓ postcodes

slide-26
SLIDE 26

Regular languages

✓ Regular sets by Stephen Kleene in 1956
  ✓ ∅, ε, letters from Σ
  ✓ concatenation
  ✓ iteration
  ✓ alternation
✓ Precisely fit the regular class

Stephen Cole Kleene (1909–1994)

  • S. C. Kleene, Representation of Events in Nerve Nets and Finite Automata. In Automata Studies, pp. 3–42, 1956.

photo from: Konrad Jacobs, S. C. Kleene, 1978, MFO.
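Kleene's building blocks (∅, ε, letters, concatenation, iteration, alternation) can be mimicked with Python's `re` module, as in the sketch below. The combinator names are invented for illustration, and `(?!)` is used as a pattern that never matches, to stand in for ∅.

```python
import re

EMPTY = "(?!)"     # the empty set: a lookahead that always fails
EPSILON = ""       # the empty word

def letter(a):
    """A single letter from the alphabet, escaped for safety."""
    return re.escape(a)

def concat(r, s):
    return f"(?:{r})(?:{s})"

def alt(r, s):
    return f"(?:{r}|{s})"

def star(r):
    return f"(?:{r})*"

def matches(regex, word):
    """True if the whole word belongs to the language of regex."""
    return re.fullmatch(regex, word) is not None

# (a|b)* a b b, built only from the primitives above
a, b = letter("a"), letter("b")
ENDS_IN_ABB = concat(star(alt(a, b)), concat(a, concat(b, b)))

print(matches(ENDS_IN_ABB, "babb"))
print(matches(ENDS_IN_ABB, "ab"))
```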

slide-27
SLIDE 27

Regular languages

✓ PCRE
  ✓ “Perl-compatible regular expressions”
  ✓ (not compatible with Perl)
  ✓ (not regular)
  ✓ C library
  ✓ (backrefs, recursion, assertions…)
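Backreferences are one reason such pattern languages are "not regular". The sketch below uses Python's `re` module, which is not PCRE but also supports backreferences, to recognise aⁿbaⁿ, a language that no finite-state machine can accept. The names `SAME_RUN` and `same_run` are illustrative.

```python
import re

# The group captures a run of a's; \1 demands exactly the same run again.
SAME_RUN = re.compile(r"(a+)b\1")

def same_run(word):
    """True if word has the form a^n b a^n with n >= 1."""
    return SAME_RUN.fullmatch(word) is not None

print(same_run("aabaa"))
print(same_run("aabaaa"))
```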

slide-28
SLIDE 28

Context-free

✓ FSM + memory (stack)
✓ Modular composition
  ✓ A ::= "[" B "]" ;
  ✓ B ::= A? ;
✓ Not closed under intersection and difference
✓ Closed under substitution

John Backus (1924–2007)
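The two rules A ::= "[" B "]" and B ::= A? describe nested brackets, and a recursive-descent recogniser follows them one function per nonterminal. This is a sketch with invented function names; the call stack plays the role of the "memory (stack)".

```python
def parse_a(text, pos):
    """A ::= "[" B "]" ; returns the position after A, or None on failure."""
    if pos < len(text) and text[pos] == "[":
        pos = parse_b(text, pos + 1)
        if pos < len(text) and text[pos] == "]":
            return pos + 1
    return None

def parse_b(text, pos):
    """B ::= A? ; always succeeds, consuming an A if one is present."""
    after = parse_a(text, pos)
    return pos if after is None else after

def is_a(text):
    """True if the whole text is derivable from A."""
    return parse_a(text, 0) == len(text)

print(is_a("[[[]]]"))
print(is_a("[[]"))
```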

slide-29
SLIDE 29

Context-sensitive

✓ Explainable only in context
  ✓ Sentence → List End
  ✓ List → Name;
  ✓ List → List “,” Name;
  ✓ “,” Name End → “and” Name
✓ Parsing in exponential time
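A derivation with these four rules can be replayed step by step. In the sketch below, sentential forms are lists of symbols and each step rewrites the leftmost occurrence of a rule's left-hand side. The rule labels r1 to r4 and the function names are assumptions made for illustration.

```python
RULES = {
    "r1": (["Sentence"], ["List", "End"]),
    "r2": (["List"], ["Name"]),
    "r3": (["List"], ["List", ",", "Name"]),
    "r4": ([",", "Name", "End"], ["and", "Name"]),   # needs context to apply
}

def apply(form, rule):
    """Rewrite the leftmost occurrence of the rule's left-hand side."""
    lhs, rhs = RULES[rule]
    for i in range(len(form) - len(lhs) + 1):
        if form[i:i + len(lhs)] == lhs:
            return form[:i] + rhs + form[i + len(lhs):]
    raise ValueError(f"rule {rule} does not apply to {form}")

def derive(steps):
    """Apply the given rule labels in order, starting from Sentence."""
    form = ["Sentence"]
    for rule in steps:
        form = apply(form, rule)
    return form

print(" ".join(derive(["r1", "r3", "r3", "r2", "r4"])))
```

With only these four rules, a one-name list ends in "Name End", and no rule can remove the End marker.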

slide-30
SLIDE 30

Unbounded

✓ (almost) anything
✓ recognising is impossible
✓ parsing is impossible

slide-31
SLIDE 31

Which is which?

✓ Substring search
  ✓ grep, contains(), find(), substring(), …
✓ Substring replacement
  ✓ sed, awk, perl, vim, replace(), replaceAll(), …
✓ Pretty-printing
  ✓ VS.NET, Sublime, TextMate, …

slide-32
SLIDE 32

Which is which?

✓ Counting [non-empty] lines in a file
  ✓ wc -l, grep -c ""
  ✓ grep -v "^$", sed -n /./p | wc -l
✓ Parsing HTML
  ✓ <BODY><TABLE><P><A HREF=…
✓ Parsing a postcode
  ✓ 1098 XG, …
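For comparison, the two line-counting tasks can be written as single passes over the text with no nesting to track. The function names below are invented; the first roughly mirrors `wc -l` (it counts newline characters) and the second mirrors `sed -n /./p | wc -l` (it counts lines with at least one character).

```python
def count_lines(text):
    """Count newline characters, as `wc -l` does."""
    return text.count("\n")

def count_non_empty_lines(text):
    """Count lines that contain at least one character."""
    return sum(1 for line in text.split("\n") if line != "")

sample = "first\n\nsecond\n   \nthird\n"
print(count_lines(sample))
print(count_non_empty_lines(sample))
```

A line holding only spaces still counts as non-empty here, which matches the `/./` pattern.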

slide-33
SLIDE 33

Popular languages

✓ {aⁱbⁿ…}
  ✓ 0 counters
  ✓ 1 counter
  ✓ n counters
  ✓ ∞ counters
✓ Dyck language
  ✓ parentheses
  ✓ named parentheses

Walther von Dyck (1856–1934)

Zeitlupe, https://en.wikipedia.org/wiki/File:Grabstaette_Walther_von_Dyck.jpg, CC-BY-SA, 2012
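The difference between plain and named parentheses shows up directly in code: one counter is enough for a single kind of bracket, while several kinds need a stack to remember which one was opened last. This is a sketch; the function names and the choice of bracket characters are assumptions.

```python
def balanced_one_kind(word):
    """Dyck language over ( and ): a single counter suffices."""
    depth = 0
    for ch in word:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # closed more than was opened
                return False
        else:
            return False
    return depth == 0

PAIRS = {")": "(", "]": "[", "}": "{"}

def balanced_named(word):
    """Dyck language with several bracket kinds: needs a stack."""
    stack = []
    for ch in word:
        if ch in PAIRS.values():
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            return False
    return not stack

print(balanced_one_kind("(()())"))
print(balanced_named("([)]"))
```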

slide-34
SLIDE 34

Popular parsers

✓ Bottom-up
  ✓ Reduce the input back to the start symbol
  ✓ Recognise terminals
  ✓ Replace terminals by nonterminals
  ✓ Replace terminals and nonterminals by left-hand side of rule
  ✓ LR, LR(0), LR(1), LR(k), LALR, SLR, GLR, SGLR, CYK, …
✓ Top-down
  ✓ Imitate the production process by rederivation
  ✓ Each nonterminal is a goal
  ✓ Replace each goal by subgoals (= elements of its rule)
  ✓ Parse tree is built from top to bottom
  ✓ LL, LL(1), LL(k), LL(*), GLL, DCG, RD, Packrat, Earley

slide-35
SLIDE 35

Popular parsers

YACC / bison, Beaver, SableCC, GDK, Tom, ASF+SDF, Spoofax, JavaCC, ANTLR, ModelCC, Rascal, TXL, Rats!, PetitParser

slide-36
SLIDE 36

Popular data structures

✓ Lists (of tokens)
✓ Trees (hierarchy!)
✓ Forests (many trees)
✓ Graphs (loops!)
✓ Relations (tables)

slide-37
SLIDE 37

Conclusion

✓ Parsing recognises structure
✓ A language can have many models
✓ Hierarchy of classes
✓ 90% automated, 10% hard work

slide-38
SLIDE 38
slide-39
SLIDE 39

Lexical syntax

✓ Terminal symbols
  ✓ finite sublanguage
  ✓ regular sublanguage
✓ Keywords
✓ Layout
  ✓ whitespace
  ✓ comments

slide-40
SLIDE 40

Lexical syntax

layout L = (WS|Cm)* !>> [\ \t\n\r] !>> "--";
lexical Boolean = "True" | "False";
lexical Id = [a-z]+ !>> [a-z];
keyword Reserved = "if" | "while";
lexical Id = [a-z]+ \ Reserved !>> [a-z];
lexical WS = [\ \t\n\r];
lexical Cm = "--" ... $;

slide-41
SLIDE 41

Lexical syntax: XML

layout L = [\ \t\n\r]* !>> [\ \t\n\r];
lexical D = ![\<\>]* !>> ![\<\>];
lexical T = [a-z][a-z0-9]* !>> [a-z0-9];
lexical A = [a-z]+ [=] [\"] ![\"]* [\"];
lexical X = D | "\<" T A* "\>" X+ "\<" "/" T "\>";

slide-42
SLIDE 42

Beyond lexical: XML

layout L = [\ \t\n\r]* !>> [\ \t\n\r];
lexical D = ![\<\>]* !>> ![\<\>];
lexical T = [a-z][a-z0-9]* !>> [a-z0-9];
lexical A = [a-z]+ [=] [\"] ![\"]* [\"];
lexical X = D | "\<" T L {A L}* "\>" X+ "\<" "/" T "\>";

slide-43
SLIDE 43

Beyond lexical XML

lexical → syntax

slide-44
SLIDE 44

Beyond lexical: XML

layout L = [\ \t\n\r]* !>> [\ \t\n\r];
syntax D = W+;
lexical W = ![\ \t\n\r\<\>]+ !>> ![\ \t\n\r\<\>];
lexical T = [a-z][a-z0-9]* !>> [a-z0-9];
lexical A = [a-z]+ [=] [\"] ![\"]* [\"];
syntax X = D | "\<" T A* "\>" X* "\<" "/" T "\>";

slide-45
SLIDE 45

Recap: lexical

✓ Terminal: "if"
✓ Character class: [a-z]
✓ Inverse: ![a-z]
✓ Kleene closures: [a-z]+, [a-z]*
✓ Optionals: [a-z]?
✓ Reserve: [a-z]+ \ Keywords
✓ Follow: [a-z]+ !>> [a-z]
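As a rough analogy outside Rascal, the "Follow" and "Reserve" restrictions can be imitated with Python's `re` module: a negative lookahead plays the part of `!>> [a-z]`, and a set lookup plays the part of `\ Keywords`. This is only an approximation under those assumptions. The names `RESERVED`, `ID` and `match_id` are invented, and the reserved words are borrowed from the earlier `keyword Reserved` example.

```python
import re

RESERVED = {"if", "while"}

# [a-z]+ that is not followed by another [a-z] (longest match)
ID = re.compile(r"[a-z]+(?![a-z])")

def match_id(text, pos=0):
    """Return the identifier starting at text[pos], or None.

    Reserved words are rejected, mirroring `[a-z]+ \\ Reserved`.
    """
    m = ID.match(text, pos)
    if m is None or m.group() in RESERVED:
        return None
    return m.group()

print(match_id("whilex"))
print(match_id("while"))
```

A greedy regex engine already takes the longest run of letters, so the lookahead mainly documents the intent. In a generalised parser the follow restriction is what rules out the shorter matches.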

slide-46
SLIDE 46

Beyond lexical

✓ Choice: |
✓ Priority: >
✓ Associativity: left, right, non-assoc
✓ Named alternatives: foo: x
✓ Named symbols: E left "+" E right
✓ Regular combinators: X*, X+, X?

slide-47
SLIDE 47

Useful

✓ parse(#N, s)
✓ try parse(#N, s) catch: . . .
✓ vis::ParseTree::renderParsetree(t)
✓ /amb(_) !:= t
✓ t is foo
✓ t.x
✓ if (pattern := tree) . . .
✓ (E)`<E e1> + <E e2>`
✓ /regexp/