Syntax and ANTLR

SLIDE 1

CS152 – Programming Language Paradigms

  • Prof. Tom Austin

Syntax and ANTLR

SLIDE 2

Syntax vs. Semantics

  • Semantics:

– What does a program mean?
– Defined by an interpreter or compiler?

  • Syntax:

– How is a program structured?
– Defined by a lexer and parser
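Here is a minimal Python sketch of the distinction. It assumes a hypothetical tuple-based AST, where a node is either an `int` leaf or an `(op, left, right)` tuple. Parsing `7 / 2 + 1` yields one tree, and the meaning of `/` is decided only when an interpreter walks that tree. The `evaluate` function and its `divide` parameter are illustrative names, not part of any course code.

```python
from fractions import Fraction
from typing import Callable, Union

# Hypothetical AST: an int leaf, or a tuple (op, left, right).
Node = Union[int, tuple]
Number = Union[int, Fraction]


def evaluate(node: Node, divide: Callable[[Number, Number], Number]) -> Number:
    """Walk the AST; the caller decides what "/" means."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = evaluate(left, divide), evaluate(right, divide)
    if op == "+":
        return a + b
    if op == "/":
        return divide(a, b)
    raise ValueError(f"unknown operator {op!r}")


tree = ("+", ("/", 7, 2), 1)  # one parse of "7 / 2 + 1"

print(evaluate(tree, lambda a, b: a // b))          # floor division: 4
print(evaluate(tree, lambda a, b: Fraction(a, b)))  # exact division: 9/2
```

The tree produces two answers. The syntax did not change; only the semantics supplied by the interpreter did.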

SLIDE 3

Review: Overview of Compilation

[Figure: source code → Lexer/Tokenizer → tokens → Parser → Abstract Syntax Tree (AST) → either Compiler → machine code, or Interpreter → commands]

SLIDE 4

Tokenization

[Figure: source code → Lexer/Tokenizer → tokens → Parser → Abstract Syntax Tree (AST) → either Compiler → machine code, or Interpreter → commands]

SLIDE 5

Tokenization

  • Process of converting characters into the words of the language.
  • Generally handled through regular expressions.
  • A variety of lexer generators exist:

– Lex/Flex are old and well-established
– ANTLR & JavaCC both handle lexing and parsing

  • Sample lexing rule for integers (in Antlr)

INT : [0-9]+ ;
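Here is a rough Python equivalent of that rule, written with the standard `re` module. Only `INT` comes from the slide. The `PLUS`, `MINUS`, parenthesis, and whitespace rules are assumptions added so the sketch can tokenize a small arithmetic expression. Dropping `WS` matches imitates what an ANTLR `-> skip` command would do.

```python
import re

# Lexer rules, tried in order. INT mirrors the ANTLR rule `INT : [0-9]+ ;`;
# the remaining rules are assumptions added for this example.
TOKEN_RULES = [
    ("INT", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WS", r"[ \t\r\n]+"),  # matched but dropped, like ANTLR's `-> skip`
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(text: str) -> list[tuple[str, str]]:
    """Convert a string of characters into (token type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("12 + (3 - 45)"))
# [('INT', '12'), ('PLUS', '+'), ('LPAREN', '('), ('INT', '3'),
#  ('MINUS', '-'), ('INT', '45'), ('RPAREN', ')')]
```

This sketch differs from ANTLR in one respect. ANTLR's lexer prefers the longest match and breaks ties by rule order. A Python regex alternation instead takes the first alternative that matches, so rule order carries more weight here.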

SLIDE 6

Categories of Tokens

  • Reserved words or keywords

– e.g. if, while

  • Literals or constants

– e.g. 123, "hello"

  • Special symbols

– e.g. ";", "<=", "+"

  • Identifiers

– e.g. balance, tyrionLannister
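Keywords and identifiers usually share one lexical shape, so a lexer has to decide between them. The Python sketch below assumes a tiny hypothetical keyword set, `{"if", "while"}`. It first matches a name and then looks the name up in that set. Numbers and quoted strings become literals, and operators and punctuation become special symbols.

```python
import re

KEYWORDS = {"if", "while"}  # hypothetical, deliberately tiny keyword set

# "<=" is listed before single-character symbols so it is matched whole.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<STRING>"[^"]*")
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<SYMBOL><=|>=|==|[;+\-*/<>=(){}])
  | (?P<SKIP>\s+)
""", re.VERBOSE)


def categorize(text: str) -> list[tuple[str, str]]:
    """Return (category, lexeme) pairs using the four categories above."""
    result = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "NAME":
            # Same pattern for both; the keyword table decides.
            category = "keyword" if lexeme in KEYWORDS else "identifier"
        elif kind in ("NUMBER", "STRING"):
            category = "literal"
        else:
            category = "symbol"
        result.append((category, lexeme))
    return result


print(categorize('while balance <= 123; "hello"'))
# [('keyword', 'while'), ('identifier', 'balance'), ('symbol', '<='),
#  ('literal', '123'), ('symbol', ';'), ('literal', '"hello"')]
```

ANTLR typically gets the same effect a different way. Keyword rules such as `'if'` are listed before the identifier rule, and when two rules match the same longest input, the earlier rule wins.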

SLIDE 7

Lexing in ANTLR (v. 4)

(in class)

SLIDE 8

Parsing

[Figure: source code → Lexer/Tokenizer → tokens → Parser → Abstract Syntax Tree (AST) → either Compiler → machine code, or Interpreter → commands]

SLIDE 9

Parsing

  • Parsers take the tokens of the language and combine them into abstract syntax trees (ASTs).

  • The rules for parsers are defined by context-free grammars (CFGs).

  • Parsers can be divided into:

– bottom-up/shift-reduce parsers
– top-down parsers

SLIDE 10

Context-Free Grammars

  • Grammars specify a language
  • Backus-Naur form is a common format

Expr -> Number | Number + Expr

  • Terminals cannot be broken down further.
  • Non-terminals can be broken down into further phrases.
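The Python sketch below makes terminals and non-terminals concrete by performing a leftmost derivation with the grammar above. It repeatedly replaces the leftmost `Expr` until only terminals (`Number` and `+`) remain. It assumes the input is already a list of token names. Both alternatives begin with `Number`, so the sketch peeks at the token after the current `Number` to choose a production.

```python
def derive(tokens: list[str]) -> list[str]:
    """Leftmost derivation of `tokens` from the grammar
         Expr -> Number | Number + Expr
    Terminals: "Number" and "+". Non-terminal: "Expr".
    Returns every sentential form, from "Expr" to the input itself."""
    form = ["Expr"]
    steps = [" ".join(form)]
    pos = 0  # index of the next input token not yet matched
    while "Expr" in form:
        k = form.index("Expr")  # leftmost non-terminal
        if pos >= len(tokens) or tokens[pos] != "Number":
            raise SyntaxError(f"expected Number at token {pos}")
        # Both alternatives start with Number, so look one token further.
        if pos + 1 < len(tokens) and tokens[pos + 1] == "+":
            rhs, pos = ["Number", "+", "Expr"], pos + 2
        else:
            rhs, pos = ["Number"], pos + 1
        form = form[:k] + rhs + form[k + 1:]
        steps.append(" ".join(form))
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return steps


for step in derive(["Number", "+", "Number", "+", "Number"]):
    print(step)
# Expr
# Number + Expr
# Number + Number + Expr
# Number + Number + Number
```

Because the parser needs that second token, this grammar is LL(2) but not LL(1). Left-factoring it into `Expr -> Number Rest` and `Rest -> + Expr | ε` (ε is the empty alternative) lets one token of lookahead suffice.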

SLIDE 11

Sample grammar

expr -> expr + expr | expr - expr | ( expr ) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | … | 9
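This grammar is ambiguous. The rule `expr -> expr - expr` can split `1 - 2 - 3` at either minus sign, which gives two parse trees with different values. The brute-force Python sketch below enumerates the value of every parse tree for a token list. It assumes a lexer has already turned each `number` into a single token, and it is meant as an illustration, not a practical parser.

```python
from functools import lru_cache


def all_values(tokens: list[str]) -> list[int]:
    """Value of every parse tree of `tokens` under
         expr -> expr + expr | expr - expr | ( expr ) | number
    (numbers are assumed to arrive as whole tokens such as "12")."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def parses(i: int, j: int) -> tuple[int, ...]:
        # Values of all derivations of expr that span toks[i:j].
        results = []
        if j - i == 1 and toks[i].isdigit():               # expr -> number
            results.append(int(toks[i]))
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            results.extend(parses(i + 1, j - 1))           # expr -> ( expr )
        for k in range(i + 1, j - 1):                      # expr -> expr op expr
            if toks[k] in ("+", "-"):
                for a in parses(i, k):
                    for b in parses(k + 1, j):
                        results.append(a + b if toks[k] == "+" else a - b)
        return tuple(results)

    return list(parses(0, len(toks)))


print(all_values(["1", "-", "2", "-", "3"]))  # [2, -4]
```

The two results correspond to `1 - (2 - 3)` and `(1 - 2) - 3`. Parser tools resolve this kind of ambiguity either by rewriting the grammar or by declaring precedence and associativity.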

SLIDE 12

Bottom-up Parsers

  • Also known as shift-reduce parsers

– shift tokens onto a stack, then reduce to a non-terminal.

  • LR: left-to-right scan, rightmost derivation (produced in reverse)
  • The most common type of bottom-up parser is the Look-Ahead LR (LALR) parser

– YACC/Bison are examples

  • Generally considered to be more powerful (every LL(k) grammar is also LR(k)), though they seem to be fading from popularity.
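Here is a minimal shift-reduce sketch in Python for the hypothetical left-recursive grammar `E -> E + num | num`. It shifts one token at a time and reduces whenever the top of the stack matches a right-hand side. Yacc/Bison output decides when to reduce from generated LALR tables. This sketch uses hard-coded checks instead, so treat it only as an illustration of the stack discipline.

```python
def shift_reduce(tokens: list[str]):
    """Hand-written shift-reduce parser for the hypothetical grammar
         E -> E + num | num
    Returns (ast, trace). Not table-driven like Yacc/Bison output."""
    stack: list[tuple[str, object]] = []  # (grammar symbol, semantic value)
    trace: list[str] = []
    for tok in tokens:
        # Shift: push the next input token onto the stack.
        if tok == "+":
            stack.append(("+", tok))
        elif tok.isdigit():
            stack.append(("num", int(tok)))
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        trace.append(f"shift {tok}")
        # Reduce: while the top of the stack matches a right-hand side,
        # replace it with the non-terminal E.
        while True:
            symbols = [sym for sym, _ in stack]
            if symbols[-3:] == ["E", "+", "num"]:
                _, right = stack.pop()
                stack.pop()  # the "+"
                _, left = stack.pop()
                stack.append(("E", ("+", left, right)))
                trace.append("reduce E -> E + num")
            elif symbols == ["num"]:
                _, value = stack.pop()
                stack.append(("E", value))
                trace.append("reduce E -> num")
            else:
                break
    if [sym for sym, _ in stack] != ["E"]:
        raise SyntaxError("input is not a valid E")
    return stack[0][1], trace


ast, trace = shift_reduce(["1", "+", "2", "+", "3"])
print(ast)  # ('+', ('+', 1, 2), 3)
print(trace)
```

Left recursion poses no problem for this parser, and the resulting tree groups `+` to the left.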

SLIDE 13

Top-down parsers

  • Non-terminals are expanded to match incoming tokens.

  • LL: left-to-right, leftmost derivation
  • LL(k) parsers can look ahead k tokens to decide which rule to use.

– example LL(k) parser generator: JavaCC

  • LL(1) parsers (often hand-written as recursive-descent parsers) are of special interest:

– Easy to write/fast execution time
– Some languages are designed to be LL(1)
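Here is a recursive-descent sketch in Python. It has one method per non-terminal, and each method chooses what to do from a single token of lookahead (`peek`). The grammar is a hypothetical LL(1) rewrite of the earlier sample grammar, with the left recursion replaced by a loop. Tokens are assumed to be pre-split strings, and the parser computes values as it goes instead of building an AST.

```python
from typing import Optional


class Parser:
    """Recursive-descent (LL(1)) parser and evaluator for
         expr -> term (("+" | "-") term)*
         term -> NUMBER | "(" expr ")"
    Tokens are plain strings, e.g. ["(", "1", "+", "2", ")"]."""

    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        # The single token of lookahead that LL(1) allows.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self) -> int:
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> int:
        tok = self.peek()
        if tok == "(":
            self.expect("(")
            value = self.expr()
            self.expect(")")
            return value
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(tokens: list[str]) -> int:
    parser = Parser(tokens)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return value


print(evaluate(["1", "-", "2", "-", "3"]))            # -4
print(evaluate(["1", "-", "(", "2", "-", "3", ")"]))  # 2
```

Because `expr` loops over `+` and `-`, subtraction groups to the left, as arithmetic expects.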

SLIDE 14

Antlr

  • Antlr v. 3 was LL(*) (v. 1-2 were LL(k))

– Similar to LL(k), but can look ahead as far as needed.

  • Antlr v. 4 is Adaptive LL(*), or ALL(*)

– Allows us to write directly left-recursive grammars, which LL parsers previously could not handle. http://www.antlr.org/papers/allstar-techreport.pdf
– Sample left-recursive grammar: expr -> expr + expr | num
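This Python sketch, with hypothetical function names, shows why left recursion was a problem for earlier LL tools. Transcribing `expr -> expr + expr` directly into recursive descent makes the function call itself before consuming any token, so it recurses until Python raises `RecursionError`. The hand rewrite `expr -> num ("+" num)*` describes the same language and parses it with a loop instead.

```python
def parse_expr_naive(tokens: list[str], pos: int = 0):
    """Direct transcription of the left-recursive alternative
         expr -> expr + expr
    Its first action is to parse an expr at the same position,
    so it recurses without ever consuming a token."""
    left, pos = parse_expr_naive(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_expr_naive(tokens, pos + 1)
        return ("+", left, right), pos
    return left, pos


def parse_expr_loop(tokens: list[str], pos: int = 0):
    """The same language with left recursion removed by hand:
         expr -> num ("+" num)*
    The loop builds a left-associative tree."""
    def number(i: int) -> int:
        if i >= len(tokens) or not tokens[i].isdigit():
            raise SyntaxError(f"expected a number at token {i}")
        return int(tokens[i])

    tree = number(pos)
    pos += 1
    while pos < len(tokens) and tokens[pos] == "+":
        tree = ("+", tree, number(pos + 1))
        pos += 2
    return tree, pos


try:
    parse_expr_naive(["1", "+", "2"])
except RecursionError:
    print("naive version: RecursionError")

print(parse_expr_loop(["1", "+", "2", "+", "3"]))  # (('+', ('+', 1, 2), 3), 5)
```

Roughly speaking, ANTLR 4 performs a comparable rewrite automatically for directly left-recursive rules. That is why the grammar can keep its natural form there.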

SLIDE 15

Parsing with ANTLR

(in-class)

SLIDE 16

Lab: Getting to know Antlr

Write a calculator using Antlr. Details in Canvas, starter code on course website.