Generic Language Technology (2IS15)

Syntaxes

Prof.dr. Mark van den Brand

Book

  • Software Language Engineering by Anneke Kleppe (Addison Wesley)

/ Faculteit Wiskunde en Informatica, 13-9-2011

Signatures and grammars

  • Definition of a (programming) language involves:
  • abstract syntax, so-called signature
  • concrete syntax:
− textual syntax
− graphical syntax
  • semantics:
− static semantics
− dynamic semantics

Signatures and grammars

Grammar world

  • The 4-layer architecture
  • M3 (E)BNF/SDF grammar
− defines the structure of (E)BNF in (E)BNF
  • M2 Java grammar
− defines the structure of Java in (E)BNF
  • M1 Java program
− describes the manipulation (algorithm) of objects in the object layer
  • M0 Object layer
− objects we wish to manipulate

Signatures and grammars

  • Abstract syntax:
  • defines the basic structure of the language (skeleton)
  • is the starting point for defining:
− concrete syntax
− static semantics
− dynamic semantics

Signatures and grammars

  • Abstract syntax is a collection of constructors/functions
  • No information about keywords, priorities, associativities, etc.

Abstract syntax definition of Booleans:

    "true"() -> BoolCon
    "false"() -> BoolCon
    "con"(BoolCon) -> Bool
    "and"(Bool, Bool) -> Bool
    "or"(Bool, Bool) -> Bool
    "not"(Bool) -> Bool

In each rule, the quoted name is the constructor; BoolCon and Bool are the nonterminals.
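To make the idea of "constructors without concrete syntax" tangible, here is a minimal sketch that models the Boolean signature as Python data classes. The class names (TrueCon, FalseCon, Con, And, Or, Not) are illustrative choices, not part of the slides; only the constructor names and their argument sorts come from the signature.

```python
from dataclasses import dataclass
from typing import Union


# Sort BoolCon: the constructors "true" and "false" take no arguments.
@dataclass(frozen=True)
class TrueCon:
    pass


@dataclass(frozen=True)
class FalseCon:
    pass


BoolCon = Union[TrueCon, FalseCon]


# Sort Bool: the constructors "con", "and", "or" and "not".
@dataclass(frozen=True)
class Con:
    value: "BoolCon"


@dataclass(frozen=True)
class And:
    left: "Bool"
    right: "Bool"


@dataclass(frozen=True)
class Or:
    left: "Bool"
    right: "Bool"


@dataclass(frozen=True)
class Not:
    arg: "Bool"


Bool = Union[Con, And, Or, Not]

# The term and(con(true), not(con(false))): pure structure,
# with no keywords, priorities or associativities involved.
example = And(Con(TrueCon()), Not(Con(FalseCon())))
```

The term is built directly from constructors, so questions such as "which operator binds tighter?" never arise at this level.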

Signatures and grammars

  • There is no standardized way of defining abstract syntax
  • SSL (specification formalism of the Synthesizer Generator)
  • Signature-like
  • (Meta-modeling)

Signatures and grammars

  • SSL (grammar specification formalism of the Synthesizer Generator) describes it as follows:
  • A collection of rules that define phyla and operators
  • A phylum is a nonempty set of terms
  • A term is the application of a k-ary operator to k terms of the appropriate phylum
  • A k-ary operator is a constructor function mapping k terms to a term
  • A phylum can be considered a nonterminal

    phyl0 : op(phyl1 phyl2 … phylk)

Signatures and grammars

  • SSL notation of the definition of the abstract syntax of Booleans:

    boolcon : True() | False()
    bool : Con(boolcon) | And(bool bool) | Or(bool bool)

Signatures and grammars

  • A signature describes it as follows:
  • A collection of functions that define sorts and operators
  • A sort represents a nonempty set of terms
  • A term is the application of a k-ary operator to k terms of the appropriate sort
  • A k-ary operator is a constructor function mapping k terms to a term
  • A sort can be considered a nonterminal

    op(sort1, sort2, …, sortk) -> sort0
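The rule "a term is the application of a k-ary operator to k terms of the appropriate sort" can be checked mechanically. The sketch below is one possible encoding, assuming terms are written as nested Python tuples (operator name followed by its arguments); the table SIGNATURE and the function sort_of are hypothetical names chosen for this illustration.

```python
# Signature of the Booleans: operator name -> (argument sorts, result sort).
SIGNATURE = {
    "true": ((), "BoolCon"),
    "false": ((), "BoolCon"),
    "con": (("BoolCon",), "Bool"),
    "and": (("Bool", "Bool"), "Bool"),
    "or": (("Bool", "Bool"), "Bool"),
    "not": (("Bool",), "Bool"),
}


def sort_of(term):
    """Return the sort of a term (op, arg1, ..., argk) or raise ValueError."""
    op, *args = term
    if op not in SIGNATURE:
        raise ValueError(f"unknown operator: {op}")
    arg_sorts, result_sort = SIGNATURE[op]
    # A k-ary operator must be applied to exactly k terms ...
    if len(args) != len(arg_sorts):
        raise ValueError(f"{op} expects {len(arg_sorts)} arguments, got {len(args)}")
    # ... and each argument must be a term of the appropriate sort.
    for arg, expected in zip(args, arg_sorts):
        actual = sort_of(arg)
        if actual != expected:
            raise ValueError(f"{op} expects {expected}, got {actual}")
    return result_sort
```

For example, sort_of(("and", ("con", ("true",)), ("not", ("con", ("false",))))) yields "Bool", whereas ("and", ("true",), ("true",)) is rejected because "and" needs arguments of sort Bool, not BoolCon.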

Signatures and grammars

  • Signature notation of the definition of the abstract syntax of Booleans:

    "true"() -> BoolCon
    "false"() -> BoolCon
    "con"(BoolCon) -> Bool
    "and"(Bool, Bool) -> Bool
    "or"(Bool, Bool) -> Bool
    "not"(Bool) -> Bool

Signatures and grammars

  • Given signatures it is possible to generate APIs
  • Tooling for defining signatures and generating APIs:
  • GOM part of TOM (http://tom.loria.fr/wiki/index.php5/Documentation:Gom)
  • ApiGen part of SDF (see later)
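As a rough illustration of what "generating an API from a signature" can mean, the sketch below derives one constructor function per operator from a signature table. It does not reproduce the output of GOM or ApiGen; the function generate_api, the make_ prefix and the tuple representation of terms are all assumptions made for this example.

```python
# Signature of the Booleans: operator name -> (argument sorts, result sort).
SIGNATURE = {
    "true": ((), "BoolCon"),
    "false": ((), "BoolCon"),
    "con": (("BoolCon",), "Bool"),
    "and": (("Bool", "Bool"), "Bool"),
    "or": (("Bool", "Bool"), "Bool"),
    "not": (("Bool",), "Bool"),
}


def generate_api(signature):
    """Build one constructor function per operator (hypothetical API shape)."""
    api = {}
    for op, (arg_sorts, _result_sort) in signature.items():
        # Default arguments bind the current operator and arity to this function.
        def make(*args, _op=op, _arity=len(arg_sorts)):
            if len(args) != _arity:
                raise TypeError(f"{_op} expects {_arity} arguments, got {len(args)}")
            return (_op, *args)

        api["make_" + op] = make
    return api


api = generate_api(SIGNATURE)
term = api["make_and"](api["make_con"](api["make_true"]()), api["make_con"](api["make_false"]()))
```

A real generator would typically emit typed classes or functions in the target language; this sketch only checks arities at run time.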


Signatures and grammars

  • Definition of a (programming) language involves:
  • lexical syntax, so-called tokens:
− identifiers, numbers, strings, “if”, “then”, “class” (keywords)
  • context-free syntax, so-called production rules:
− Statement ::= “if” Expression “then” Statements “else” Statements “fi”
  • static semantics:
− identification and scope resolution
− type checking
  • dynamic semantics:
− operational semantics
− interpretation
− compilation

Signatures and grammars

  • Goal: defining languages & manipulating programs
  • SDF: Syntax Definition Formalism
  • lexical & context-free syntax
  • ASF+SDF Meta-Environment: IDE for ASF+SDF
  • manuals/documentation: www.meta-environment.org
  • Spoofax/IMP: Eclipse plugin for SDF
  • manuals/documentation: http://strategoxt.org/Spoofax

Signatures and grammars

  • Anatomy of SDF specifications

    module A ... imports B C ...
    module B ... imports D ...
    module C ...
    module D ...

Signatures and grammars

  • Anatomy of an SDF module

    module ModuleName
    ImportSection*
    ExportOrHiddenSection*

  • module ModuleName: name of this module; may be followed by parameters
  • ImportSection*: names of modules imported by this module; may be followed by renamings
  • ExportOrHiddenSection*: grammar elements that are visible from the outside (exports) or only inside the module (hiddens)
  • Grammar elements: imports, aliases, sorts, lexical syntax, context-free syntax, priorities, variables

Signatures and grammars

  • SDF by examples
  • Boolean language
  • Pico language

Signatures and grammars

Boolean Constants

    module basic/BoolCon
    exports
      sorts BoolCon
      context-free syntax
        "true" -> BoolCon {cons("true")}
        "false" -> BoolCon {cons("false")}

  • BoolCon is the sort of Boolean constants; sorts should always start with a capital letter
  • The constants true and false; literals should always be quoted

Signatures and grammars

Booleans

    module basic/Booleans
    imports basic/BoolCon
    exports
      sorts Boolean
      context-free syntax
        BoolCon -> Boolean {cons("con")}

  • imports basic/BoolCon: import the Boolean constants
  • Boolean is the sort of Boolean expressions
  • Each Boolean constant is a Boolean expression; this is also called an injection rule or chain rule

Signatures and grammars

        Boolean "|" Boolean -> Boolean {cons("or"), left}
        Boolean "&" Boolean -> Boolean {cons("and"), left}
        "not"(Boolean) -> Boolean {cons("not")}
        "(" Boolean ")" -> Boolean {bracket}
      context-free priorities
        Boolean "&" Boolean -> Boolean >
        Boolean "|" Boolean -> Boolean

  • The infix operators & (and) and | (or); both are left-associative (left)
  • The prefix function not
  • ( and ) may be used as brackets in Boolean expressions; they are ignored after parsing
  • & has higher priority than |
  • Example: Bool & Bool | Bool is interpreted as: (Bool & Bool) | Bool
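The effect of the left attribute, the bracket attribute and the priority of & over | can be imitated with a small hand-written recursive-descent parser. This is only a sketch to show the resulting trees; it is not how a parser generated from SDF works, and the function names and the tuple representation of trees are assumptions made for this example.

```python
import re

TOKEN = re.compile(r"\s*(true|false|not|[&|()])")


def tokenize(text):
    """Split the input into tokens, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Parse a Boolean expression into a tree of (constructor, args...) tuples."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    def or_expr():
        # Lowest priority; the loop builds left-associative trees.
        node = and_expr()
        while peek() == "|":
            eat("|")
            node = ("or", node, and_expr())
        return node

    def and_expr():
        # Higher priority than "|", so "&" groups first.
        node = primary()
        while peek() == "&":
            eat("&")
            node = ("and", node, primary())
        return node

    def primary():
        token = peek()
        if token in ("true", "false"):
            eat(token)
            return ("con", (token,))
        if token == "not":
            eat("not")
            eat("(")
            node = or_expr()
            eat(")")
            return ("not", node)
        if token == "(":
            # Brackets only steer parsing; they leave no node in the tree.
            eat("(")
            node = or_expr()
            eat(")")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    tree = or_expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input: {peek()!r}")
    return tree
```

Parsing "true & false | true" gives an "or" node whose left child is the "and" node, matching the interpretation (Bool & Bool) | Bool.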


Signatures and grammars

    hiddens
      context-free start-symbols
        Boolean
      imports
        basic/Comments

  • Boolean is the start symbol of the grammar; without a start symbol the parser does not know how to start parsing an input sentence
  • imports basic/Comments: import the standard comments

Signatures and grammars

  • Summary:
  • Each module defines a language; in this case the language of Booleans (synonym: data type)
  • We can use this language definition to
− Create a (syntax-directed) editor for the Boolean language and create Boolean terms
− Import it in another module; this makes the Boolean language available for the importing module

Signatures and grammars

  • A toy language Pico:
  • Pico has two types: natural number and string
  • Variables have to be declared
  • Statements: assign, if-then-else, while-do
  • Expressions: natural, string, +, - and ||
  • + and - have natural operands; the result is natural
  • || has string operands and the result is string
  • Tests (if, while) should be of type natural

Signatures and grammars

    begin declare input : natural,
                  output : natural,
                  repnr : natural,
                  rep : natural;
      input := 14;
      output := 1;
      while input - 1 do
        rep := output;
        repnr := input;
        while repnr - 1 do
          output := output + rep;
          repnr := repnr - 1
        od;
        input := input - 1
      od
    end

  • input holds the input value (14)
  • output holds the output value
  • What does this program compute?

Signatures and grammars

What does the Pico program above compute?

  • 14! = 14 * 13 * ... * 1

Why is it written in this clumsy style?

  • (a) Pico has no input/output statements
  • (b) Pico has no multiplication operator
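To check the claim that the program computes 14!, here is a line-by-line transcription into Python. It assumes that a Pico test such as "while e do" continues as long as e is not zero; the function name pico_program and the parameter are introduced only for this sketch.

```python
import math


def pico_program(input_value=14):
    """Python transcription of the Pico program; a test e holds while e != 0."""
    inp = input_value
    output = 1
    while inp - 1 != 0:
        rep = output
        repnr = inp
        # Repeated addition stands in for the missing multiplication:
        # after this loop, output == rep * inp.
        while repnr - 1 != 0:
            output = output + rep
            repnr = repnr - 1
        inp = inp - 1
    return output


result = pico_program(14)
```

The inner loop multiplies output by the current value of input through repeated addition, and the outer loop counts input down to 1, so result equals math.factorial(14).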

Signatures and grammars

  • Defining the syntax for Pico

Modules involved (import diagram):

  • languages/pico/syntax/Pico
  • languages/pico/syntax/Identifiers
  • languages/pico/syntax/Types
  • basic/NatCon
  • basic/StrCon
  • basic/Whitespace

Signatures and grammars

    module languages/pico/syntax/Pico
    imports
      languages/pico/syntax/Identifiers
      languages/pico/syntax/Types
      basic/NatCon
      basic/StrCon
    exports
      sorts PROGRAM DECLS ID-TYPE STATEMENT EXP
      context-free start-symbols PROGRAM
      context-free syntax
        "begin" DECLS {STATEMENT ";"}* "end" -> PROGRAM {cons("program")}
        "declare" {ID-TYPE ","}* ";" -> DECLS {cons("decls")}
        PICO-ID ":" TYPE -> ID-TYPE {cons("id-type")}

  • Sorts and syntax rules for program and declarations
  • {STATEMENT ";"}* is a list of zero or more statements separated by ";"
  • * means zero or more; + means one or more

Signatures and grammars

Syntax rules for statements

        PICO-ID ":=" EXP -> STATEMENT {cons("assign")}
        "if" EXP "then" {STATEMENT ";"}* "else" {STATEMENT ";"}* "fi" -> STATEMENT {cons("cond")}
        "while" EXP "do" {STATEMENT ";"}* "od" -> STATEMENT {cons("loop")}

Signatures and grammars

Syntax rules for expressions

        PICO-ID -> EXP {cons("id")}
        NatCon -> EXP {cons("nat")}
        StrCon -> EXP {cons("str")}
        EXP "+" EXP -> EXP {cons("plus"), left}
        EXP "-" EXP -> EXP {cons("min"), left}
        EXP "||" EXP -> EXP {cons("conc"), left}
        "(" EXP ")" -> EXP {bracket}
      context-free priorities
        EXP "||" EXP -> EXP >
        EXP "-" EXP -> EXP >
        EXP "+" EXP -> EXP

  • The sort NatCon is imported from basic/NatCon
  • The sort StrCon is imported from basic/StrCon
  • Binary operators are left-associative
  • The priorities of the binary operators are a disambiguation construct: is 1 - 2 + 3 read as 1 - (2 + 3), or as (1 - 2) + 3?

Signatures and grammars

  • Lexical syntax: Identifiers

    module languages/pico/syntax/Identifiers
    exports
      sorts PICO-ID
      lexical syntax
        [a-z] [a-z0-9]* -> PICO-ID
      lexical restrictions
        PICO-ID -/- [a-z0-9]

  • [a-z] is a character class: a PICO-ID starts with a lowercase letter
  • Repeat zero or more (*) or one or more (+) times
  • A lexical restriction: is aaa three, two or one identifier?
  • -/- can be used to define longest match
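The question "is aaa three, two or one identifier?" can be answered by brute force. The sketch below enumerates every way to cut a string into consecutive PICO-IDs and then applies the follow restriction PICO-ID -/- [a-z0-9]. The helper names splits and respects_restriction are made up for this illustration, and layout between identifiers is deliberately ignored.

```python
import re

PICO_ID = re.compile(r"[a-z][a-z0-9]*")
FOLLOW_FORBIDDEN = re.compile(r"[a-z0-9]")


def splits(text):
    """All ways to cut text into consecutive PICO-IDs (no layout in between)."""
    if text == "":
        return [[]]
    result = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if PICO_ID.fullmatch(head):
            for rest in splits(text[end:]):
                result.append([head] + rest)
    return result


def respects_restriction(parts):
    """PICO-ID -/- [a-z0-9]: no identifier may be directly followed by [a-z0-9]."""
    text = "".join(parts)
    pos = 0
    for part in parts:
        pos += len(part)
        if pos < len(text) and FOLLOW_FORBIDDEN.match(text[pos]):
            return False
    return True


all_readings = splits("aaa")
allowed = [parts for parts in all_readings if respects_restriction(parts)]
```

Without the restriction there are four readings of "aaa"; with it, only the single identifier "aaa" survives, which is exactly the longest match.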

Signatures and grammars

  • Pico-Types

    module languages/pico/syntax/Types
    exports
      sorts TYPE
      context-free syntax
        "natural" -> TYPE {cons("natural")}
        "string" -> TYPE {cons("string")}

  • TYPE is the sort of possible types in a Pico program
  • The constants natural and string represent the types that can be declared in a Pico program

Signatures and grammars

  • Summary
  • The module languages/pico/syntax/Pico defines (together with the imported modules) the syntax for the Pico language
  • This syntax can be used to
− Generate a parser that can parse Pico programs
− Generate a syntax-directed editor for Pico programs

Signatures and grammars

  • An elementary symbol is:
  • Literal: “abc”
  • Sort (non-terminal) names: INT
  • Character classes: [a-z]: one of a, b, ..., z
− ~: complement of character class.
− /: difference of two character classes.
− /\: intersection of two character classes.
− \/: union of two character classes.
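The four character-class operators behave like ordinary set operations. The sketch below models character classes as Python sets; the universe is assumed to be 7-bit ASCII purely for the sake of the example, and the helper char_class is a hypothetical name.

```python
def char_class(*ranges):
    """Build a set of characters from (first, last) ranges, e.g. ("a", "z")."""
    return {chr(code) for first, last in ranges for code in range(ord(first), ord(last) + 1)}


# Assumption for this sketch: the universe of characters is 7-bit ASCII.
ALPHABET = {chr(code) for code in range(128)}

lower = char_class(("a", "z"))
digits = char_class(("0", "9"))
vowels = set("aeiou")

complement = ALPHABET - lower    # ~[a-z]
difference = lower - vowels      # [a-z] / [aeiou]
intersection = lower & vowels    # [a-z] /\ [aeiou]
union = lower | digits           # [a-z] \/ [0-9]
```

Here difference contains the lowercase consonants, intersection is just the vowels, and union is the class [a-z0-9] used for PICO-ID.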

Signatures and grammars

A complex symbol is:

  • Repetition:
− S* zero or more times S
− S+ one or more times S
− {S1 S2}* zero or more times S1 separated by S2
− {S1 S2}+ one or more times S1 separated by S2
  • Optional: S? zero or one occurrence of S
  • Alternative: S | T an S or a T
  • Tuple: <S,T> shorthand for “<” S “,” T “>”
  • Parameterized sorts: S[[ P1, P2 ]]

Signatures and grammars

  • Productions (functions):
  • General form of a production (function):
− S1 S2 ... Sn -> S0 Attributes
  • Lexical syntax and context-free syntax are similar, but
− Between the symbols in a context-free production, optional layout symbols may occur in the input text.
− A context-free production is equivalent to:
− S1 LAYOUT? S2 LAYOUT? ... LAYOUT? Sn -> S0
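The difference between a lexical and a context-free production can be mimicked with regular expressions. This is a simplified model, not SDF itself: each symbol is assumed to be a regular expression, LAYOUT? is assumed to be optional whitespace, and the names lexical and context_free are invented for this sketch.

```python
import re

# Assumption: LAYOUT? is modelled as optional spaces, tabs and newlines.
LAYOUT = r"[ \t\n]*"


def lexical(symbols):
    """Lexical production: the symbols must be directly adjacent."""
    return re.compile("".join(symbols))


def context_free(symbols):
    """Context-free production: optional layout between consecutive symbols."""
    return re.compile(LAYOUT.join(symbols))


# Symbols loosely resembling a Pico assignment: identifier ":=" number.
symbols = [r"[a-z][a-z0-9]*", r":=", r"[0-9]+"]

as_lexical = lexical(symbols)
as_context_free = context_free(symbols)
```

With these definitions as_context_free accepts both "x:=14" and "x := 14", while as_lexical accepts only "x:=14".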

Signatures and grammars

  • Floating point numbers

    sorts UnsignedInt SignedInt UnsignedReal Number
    lexical syntax
      [0] | ([1-9][0-9]*) -> UnsignedInt
      [\+\-]? UnsignedInt -> SignedInt
      UnsignedInt "." [0-9]+ ([eE] SignedInt)? -> UnsignedReal
      UnsignedInt [eE] SignedInt -> UnsignedReal
      UnsignedInt | UnsignedReal -> Number

  • Accepted by this definition: 0 1 14 0.1 3e4 3.014e-7
  • Rejected (leading zero in an integer part or exponent): 00 01 04.1 3e04 3.14e-07
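The lexical syntax for numbers translates almost one-to-one into a regular expression, which makes it easy to try the example strings. The sketch below is such a translation; the constant names mirror the sorts, and is_number is a helper introduced only for this illustration.

```python
import re

# [0] | ([1-9][0-9]*) -> UnsignedInt
UNSIGNED_INT = r"(?:0|[1-9][0-9]*)"
# [\+\-]? UnsignedInt -> SignedInt
SIGNED_INT = rf"(?:[+-]?{UNSIGNED_INT})"
# UnsignedInt "." [0-9]+ ([eE] SignedInt)? -> UnsignedReal
# UnsignedInt [eE] SignedInt -> UnsignedReal
UNSIGNED_REAL = (
    rf"(?:{UNSIGNED_INT}\.[0-9]+(?:[eE]{SIGNED_INT})?"
    rf"|{UNSIGNED_INT}[eE]{SIGNED_INT})"
)
# UnsignedInt | UnsignedReal -> Number
NUMBER = re.compile(rf"(?:{UNSIGNED_INT}|{UNSIGNED_REAL})")


def is_number(text):
    """True if the whole string is a Number according to the rules above."""
    return NUMBER.fullmatch(text) is not None


accepted = [s for s in "0 1 14 0.1 3e4 3.014e-7".split() if is_number(s)]
rejected = [s for s in "00 01 04.1 3e04 3.14e-07".split() if not is_number(s)]
```

All six strings in the first group match, and all five in the second group fail because UnsignedInt forbids a leading zero in front of further digits.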


Signatures and grammars

  • Various ways of constructing lists

Assume: “a” -> A

    A+           a a a
    {A “;”}+     a ; a; a
    (A “;”)+     a ; a; a;
    (A “;”?)+    a a; a ; a
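The four list constructs differ only in where the separator ";" is required, forbidden or optional. The sketch below approximates each construct with a regular expression, assuming that optional whitespace may appear between symbols; the table PATTERNS and the function matches are hypothetical names for this example.

```python
import re

WS = r"\s*"  # assumption: optional layout between symbols
A = "a"      # from the slide: "a" -> A
SEP = ";"

PATTERNS = {
    # One or more A's, no separator.
    'A+': rf"{A}(?:{WS}{A})*",
    # One or more A's separated by ";" (no trailing separator).
    '{A ";"}+': rf"{A}(?:{WS}{SEP}{WS}{A})*",
    # One or more A's, each terminated by ";".
    '(A ";")+': rf"{A}{WS}{SEP}(?:{WS}{A}{WS}{SEP})*",
    # One or more A's, each optionally followed by ";".
    '(A ";"?)+': rf"{A}(?:{WS}{SEP})?(?:{WS}{A}(?:{WS}{SEP})?)*",
}


def matches(construct, text):
    """True if text is a sentence of the given list construct."""
    return re.fullmatch(PATTERNS[construct], text) is not None
```

For instance, "a ; a;" is rejected by {A ";"}+ because of the trailing separator, but accepted by both (A ";")+ and (A ";"?)+.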