Description Given as grammatical rules States what strings are - - PowerPoint PPT Presentation



SLIDE 1

Description of a language

  • Syntax

– Describes the structure of a program
– Given as grammatical rules
– States what strings are legitimate programs of the given language

  • Syntax check (parsing)

– Generating a parse tree
– Does the input string correspond to the grammar of the language?

  • Semantics

– What is the meaning of a given legal program?
– What kind of computation does a legal program produce?
– Static semantics (compile time): "contextual" checking (types, scopes)
– Run-time semantics (program operation)
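The split between syntax and semantics can be sketched with Python's standard `ast` module, which exposes the language's own parser. This is only an illustration: the helper names `check_syntax` and `runs_ok` are made up for this example, and Python performs its type checks at run time, so the "semantic" failure here shows up as a run-time error rather than a compile-time one.

```python
import ast

def check_syntax(source):
    """Return True if `source` is a syntactically legal Python program."""
    try:
        ast.parse(source)  # builds a parse tree or raises SyntaxError
        return True
    except SyntaxError:
        return False

def runs_ok(source):
    """Return True if the program parses and also executes without an error."""
    try:
        exec(compile(source, "<demo>", "exec"), {})
        return True
    except Exception:
        return False

syntax_error = "x = = 1"        # not a legitimate string of the language
semantic_error = "x = 1 + 'a'"  # legal syntax, but the types are incompatible
fine = "x = 1 + 2"

for program in (syntax_error, semantic_error, fine):
    print(repr(program), check_syntax(program), runs_ok(program))
```

The second program passes the syntax check and only fails when its meaning is evaluated, which is the distinction the slide draws.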

SLIDE 2

Phases of compilation

  • Compilation is usually divided into separate phases: easier, simpler, clearer
  • The output of the previous phase is the input of the next
  • The grammar of each phase defines the programming language
  • The symbol table collects info on user-defined constructs (variables, functions, types, ...)
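A symbol table can be sketched as a stack of dictionaries, one per scope. The class name `SymbolTable`, its methods, and the stored fields (`kind`, `type`) are assumptions made for this illustration, not a structure given in the slides.

```python
class SymbolTable:
    """Minimal scoped symbol table: the innermost scope is the last list item."""

    def __init__(self):
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, kind, type_name):
        """Record a user-defined construct in the current scope."""
        if name in self.scopes[-1]:
            raise KeyError(f"{name!r} is already declared in this scope")
        self.scopes[-1][name] = {"kind": kind, "type": type_name}

    def lookup(self, name):
        """Search from the innermost scope outwards; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("i", "variable", "integer")
table.declare("j", "variable", "integer")
table.enter_scope()
table.declare("i", "variable", "real")   # shadows the outer i
inner_i = table.lookup("i")
table.exit_scope()
outer_i = table.lookup("i")
```

Later phases can then consult the table, for example to check that both operands of an operator have compatible types.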

SLIDE 3

Phases of code analysis

  • If applied to natural language:

– lexical analysis: "Dog cha5es the c? a"
– syntax analysis: "Man blue drive car."
– semantic analysis: "Car hides moon"

  • Phases of compilation:

Analysis      Process
lexical       scanning
syntactic     parsing
contextual    type compatibility etc.
semantic      code generation

SLIDE 4

Compilation

[Figure: compilation pipeline. Stages: source code, lexical analysis, syntactic analysis, semantic analysis, intermediate codegen, optimisation, code generation, executable. Data passed between stages: characters, tokens, parse tree, intermediate code, target language (assembly, machine). A symbol table is also shown. Annotations: "(maybe)", "(maybe same)".]

SLIDE 5

Interpretation

[Figure: the interpreter takes source code and inputs and produces results. A symbol table is also shown.]

SLIDE 6

Hybrid process

[Figure: source -> lexical analysis -> (tokens) -> syntax analysis -> (parse tree) -> intermediate code gen. -> (byte code) -> interpreter. The interpreter also takes input and produces output.]
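CPython itself follows this hybrid scheme, so the stages can be walked through with standard-library calls. The sketch below is illustrative only: the one-line program and the variable names are invented for the example, and `ast.parse` does its own tokenizing internally, so the explicit `tokenize` step is there just to make the token stage visible.

```python
import ast
import io
import tokenize

source = "result = 2 * count\n"

# Lexical analysis: characters -> tokens (layout-only tokens are dropped)
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()
]

# Syntax analysis: source -> parse tree
tree = ast.parse(source)

# Intermediate code generation: parse tree -> byte code
code = compile(tree, "<demo>", "exec")

# Interpretation: byte code + input -> output
env = {"count": 21}
exec(code, env)

print(tokens)
print(env["result"])
```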

SLIDE 7

Precompiler

Example on macros

C-code:

#define MAX_LOOP 100
#define INCR(a) (a)++
#define FOR_LOOP(var, from, to) \
  for (var = from; var <= to; INCR(var)) {
#define END_FOR }
#define NULL

/* Expands to: for (n = 1; n <= 100; (n)++) { ; }; */
FOR_LOOP(n, 1, MAX_LOOP)
  NULL;
END_FOR;

SLIDE 8

Lexical analysis

characters (source code) -> Lexer -> list of tokens

  • grammar: regular
  • format: regular expressions
  • (implementation: finite state machine)
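To make the "finite state machine" implementation concrete, here is a minimal hand-written sketch that recognises two token classes, integers and identifiers. The function name `classify` and the state names are assumptions for this example. A real lexer would usually generate such a machine from the regular expressions instead of writing it by hand.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

def classify(lexeme):
    """Run a small deterministic state machine over `lexeme`.

    Returns "integer", "identifier", or "error".
    """
    state = "start"
    for ch in lexeme:
        if state == "start":
            if ch in DIGITS:
                state = "integer"
            elif ch in LETTERS:
                state = "identifier"
            else:
                return "error"
        elif state == "integer":
            if ch not in DIGITS:
                return "error"
        else:  # state == "identifier"
            if ch not in LETTERS and ch not in DIGITS:
                return "error"
    # An empty input never leaves the start state, so it is not a token.
    return "error" if state == "start" else state

for text in ("123", "count2", "2count", ""):
    print(repr(text), classify(text))
```

Each state corresponds to a position in a regular expression: "integer" accepts further digits, "identifier" accepts further letters or digits.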

SLIDE 9

Lexical analysis

  • Grouping of input strings

– lexeme
    • Some item in the input text
– token
    • Classification of lexemes, output of lexical analysis
    • A "name/type" given to a lexeme
    • Tokens have a type (keyword IF, number literal, operator, etc.) and a value (IF, 123, +, i.e. the interpreted lexeme)

  • Token

– A terminal symbol in the language syntax; further syntactic structures are built on tokens

Example: index = 2 * count;

Lexeme    Token
index     identifier: index
=         assignment
2         integer: 2
*         mult. operator
count     identifier: count
;         semicolon
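The lexeme/token pairs for `index = 2 * count;` can be reproduced with a small regex-driven scanner. This is a sketch under assumptions: the token names in `TOKEN_SPEC` and the function name `tokenize` are chosen for the example, and only the handful of token types needed for this one statement are defined.

```python
import re

# One named group per token type; "skip" covers white space.
TOKEN_SPEC = [
    ("integer",       r"[0-9]+"),
    ("identifier",    r"[A-Za-z][A-Za-z0-9]*"),
    ("assignment",    r"="),
    ("mult_operator", r"\*"),
    ("semicolon",     r";"),
    ("skip",          r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token type, lexeme) pairs for `text`."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

pairs = tokenize("index = 2 * count;")
for token_type, lexeme in pairs:
    print(f"{lexeme:8} {token_type}")
```

White space is matched but not emitted, so the parser only ever sees meaningful tokens.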

SLIDE 10

Tokens

  • Keywords: Reserved words
  • Identifiers: Names chosen by the programmer
  • Literals: Values for constants
  • Operators: Arithmetic and similar operations
  • Separators: Symbols and strings that separate language constructs
  • Other things to consider in lexical analysis: comments, white space, indentation
  • Grammar for tokens: regular expressions

– [0-9][0-9]*
– [A-Za-z][A-Za-z0-9]*
– ".*"
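The three token patterns can be tried out directly with Python's `re` module. The variable names are invented for this sketch. One thing the experiment shows is that `".*"` is greedy: on a line with two string literals it matches from the first quote to the last one. The stricter pattern `"[^"]*"` used for comparison is an addition for illustration, not part of the slides.

```python
import re

INTEGER = re.compile(r"[0-9][0-9]*")
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
STRING = re.compile(r'".*"')

# fullmatch requires the whole lexeme to fit the pattern
is_integer = INTEGER.fullmatch("2024") is not None
is_identifier = IDENTIFIER.fullmatch("count2") is not None
bad_identifier = IDENTIFIER.fullmatch("2count") is not None
is_string = STRING.fullmatch('"hello"') is not None

# Greedy matching: the slide's pattern swallows both literals at once
line = 'print("a", "b")'
greedy = STRING.search(line).group()
stricter = re.search(r'"[^"]*"', line).group()

print(greedy)
print(stricter)
```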

SLIDE 11

Regular expressions ("regexes")

  • "normal" characters: keyword
  • . = any (1) char: k..word
  • [abc] = set of chars: k[aeu]yword
  • [a-z] = range of chars: k[a-e]yword
  • Above can be combined: k[a-e0-9]y
  • * = any number of previous (incl. zero): ke*yw[o-u]*rd.*

  • () = group chars together: key(word)*
  • ? = 0 or 1 of previous: keyword[0-9]?
  • + = at least 1 of prev: key(word)+
  • | = alternative regexes: key(word|phrase)
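Each operator above can be checked against a few sample strings. The sketch below uses Python's `re.fullmatch`, which requires the whole string to match. The sample strings and the names `EXAMPLES` and `full` are made up for this illustration.

```python
import re

# (pattern from the list above, strings that match, strings that do not)
EXAMPLES = [
    (r"keyword",          ["keyword"],                         ["keywords"]),
    (r"k..word",          ["keyword", "kxxword"],              ["kword"]),
    (r"k[aeu]yword",      ["keyword", "kuyword"],              ["kiyword"]),
    (r"k[a-e]yword",      ["kbyword", "keyword"],              ["kzyword"]),
    (r"k[a-e0-9]y",       ["key", "k5y"],                      ["kzy"]),
    (r"ke*yw[o-u]*rd.*",  ["kywrd", "keeyword", "keywords!"],  ["keywxrd"]),
    (r"key(word)*",       ["key", "keywordword"],              ["keywor"]),
    (r"keyword[0-9]?",    ["keyword", "keyword7"],             ["keyword77"]),
    (r"key(word)+",       ["keyword", "keywordword"],          ["key"]),
    (r"key(word|phrase)", ["keyword", "keyphrase"],            ["key", "keywordphrase"]),
]

def full(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

for pattern, accepted, rejected in EXAMPLES:
    ok = all(full(pattern, s) for s in accepted)
    clean = not any(full(pattern, s) for s in rejected)
    print(f"{pattern:20} accepts={ok} rejects={clean}")
```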

SLIDE 12

Lexical analysis

Identifying the lexemes

Pascal-code:

program gcd ( input, output );
var i, j: integer;
begin
  read ( i, j );
  while i <> j do
    if i > j then i := i - j
    else j := j - i;
  writeln ( i )
end.

kw-program, ident(gcd), lparen, ident(input), comma, ident(output), rparen, semicolon, kw-var, ident(i), comma, ident(j), colon, ident(integer), semicolon, kw-begin, ident(read), lparen, ident(i), comma, ident(j), rparen, semicolon, kw-while, ident(i), noteq, ident(j), kw-do, kw-if, ident(i), greater, ident(j), kw-then, ident(i), assign, ident(i), minus, ident(j), kw-else, ident(j), assign, ident(j), minus, ident(i), semicolon, ident(writeln), lparen, ident(i), rparen, kw-end, fullstop
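The token list above can be reproduced with a short scanner. This is a sketch under assumptions: the keyword set contains only the reserved words that occur in this program, the names `KEYWORDS`, `SYMBOLS` and `scan` are invented for the example, and predefined names such as `integer`, `read` and `writeln` are treated as plain identifiers, as in the list above.

```python
import re

KEYWORDS = {"program", "var", "begin", "end", "while", "do", "if", "then", "else"}

# Two-character symbols come first so ":=" is not read as ":" followed by "=".
SYMBOLS = [
    (":=", "assign"), ("<>", "noteq"),
    ("(", "lparen"), (")", "rparen"), (",", "comma"), (";", "semicolon"),
    (":", "colon"), (">", "greater"), ("-", "minus"), (".", "fullstop"),
]
WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def scan(source):
    """Return the token names for a small subset of Pascal."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = WORD.match(source, pos)
        if match:
            word = match.group().lower()  # Pascal is case-insensitive
            tokens.append(f"kw-{word}" if word in KEYWORDS else f"ident({word})")
            pos = match.end()
            continue
        for text, name in SYMBOLS:
            if source.startswith(text, pos):
                tokens.append(name)
                pos += len(text)
                break
        else:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
    return tokens

PROGRAM = """program gcd ( input, output );
var i, j: integer;
begin
  read ( i, j );
  while i <> j do
    if i > j then i := i - j
    else j := j - i;
  writeln ( i )
end."""

tokens = scan(PROGRAM)
print(", ".join(tokens))
```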