REs, FSMs, Forth, and CFGs, Part 2 of 3



slide-1
SLIDE 1

REs, FSMs, Forth, and CFGs

Part 2 of 3

slide-2
SLIDE 2

Three things today

The foundations of regular expressions

(Don’t need to remember details)

Introduction to grammars

(Important to get concepts)

Intro to FORTH

(You’ll need this for the lab)

slide-3
SLIDE 3

Regular expressions have a nice property…

If you give me a regex and a string, I can check if that string matches the regex in linear time

slide-4
SLIDE 4
slide-5
SLIDE 5

Can I cook up a regular expression that will classify any string? (No…)

slide-6
SLIDE 6

If I could, it would imply I could solve any problem in linear time!

slide-7
SLIDE 7

So what’s an example of a regular expression I couldn’t write? “The set of strings P such that P…?”

slide-8
SLIDE 8

So what’s an example of a regular expression I couldn’t write? “The set of strings P such that P…?” (Answer: is a program that halts)

slide-9
SLIDE 9

Regular expressions can be implemented using finite state machines

slide-10
SLIDE 10

We won’t talk too much about FSMs in this class
All regexes can be “compiled” (systematically translated) into an FSM

slide-11
SLIDE 11
slide-12
SLIDE 12

Starting state

slide-13
SLIDE 13

Transition on input

slide-14
SLIDE 14

Accepting state (two circles)

slide-15
SLIDE 15

011 S1

slide-16
SLIDE 16

011 S2

slide-17
SLIDE 17

011 S2

Stay!

slide-18
SLIDE 18

011 S2

slide-19
SLIDE 19

011 S2

Reject!

slide-20
SLIDE 20

0110 S1

slide-21
SLIDE 21

0110 S1

Accept!

slide-22
SLIDE 22

(1|01*0)*

Note that I got this wrong in class
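The walkthrough of 011 and 0110 can be replayed with a small table-driven simulator. This is only a sketch: the transition table is an assumption reconstructed from the trace on the slides and from the regex (S1 is both the start state and the accepting state, a 0 flips between S1 and S2, a 1 stays put), and the names `TRANSITIONS` and `accepts` are made up for illustration.

```python
# Transition table assumed from the slide trace: a 0 flips the state, a 1 keeps it.
TRANSITIONS = {
    ("S1", "0"): "S2",
    ("S1", "1"): "S1",
    ("S2", "0"): "S1",
    ("S2", "1"): "S2",
}
START = "S1"
ACCEPTING = {"S1"}


def accepts(text: str) -> bool:
    """Run the machine over text with one table lookup per input character."""
    state = START
    for ch in text:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


print(accepts("011"))   # ends in S2, so the string is rejected
print(accepts("0110"))  # ends in S1, so the string is accepted
```

The loop touches each character once, which is where the linear-time claim for regex matching comes from.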

slide-23
SLIDE 23

“Any number of 1s, followed by an even number of 0s, followed by a single 1”

slide-24
SLIDE 24

1*0(01*0)*1

Note that I got this wrong in class

slide-25
SLIDE 25

Idea: an FSM’s only memory is its current state
It’s kind of like programming with only one register (of fixed, finite width)

slide-26
SLIDE 26

Theorem: for every regex, a corresponding FSM exists, and vice versa

slide-27
SLIDE 27

Q: Why is this useful?
Theoretical A: Bedrock automata theory, useful in proving computational bounds
Practical A: Efficient regex implementation

slide-28
SLIDE 28

Motivating CFGs

slide-29
SLIDE 29

{} {{}} {{{}}} {{{{}}}}

Parentheses are balanced when each left one matches a right one

slide-30
SLIDE 30

Balancing parentheses necessary to check program syntax (e.g., for C++)

slide-31
SLIDE 31

{*}* doesn’t work
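A quick way to see the problem is to try the pattern on a few strings. The sketch below uses Python's `re` module with the braces escaped; `naive_match` is a hypothetical helper name.

```python
import re

# The pattern from the slide, with the braces escaped so they are literals.
NAIVE = re.compile(r"\{*\}*")


def naive_match(text: str) -> bool:
    """True if the whole string is some left braces followed by some right braces."""
    return NAIVE.fullmatch(text) is not None


for s in ["{}", "{{}}", "{{}", "}"]:
    print(repr(s), naive_match(s))
```

The pattern accepts the balanced strings, but it also accepts `{{}` and `}`, because nothing ties the number of left braces to the number of right braces.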

slide-32
SLIDE 32

It turns out that no regex can capture this: matching braces means counting arbitrarily deep nesting, and a finite state machine has only finitely many states
Instead, we will use context-free grammars

slide-33
SLIDE 33

S -> ε
S -> { S }

Here’s a grammar that matches balanced parentheses
We’ll talk more about grammars later today and on Friday
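The two productions translate almost directly into a recursive check. This is a sketch rather than a general parsing technique, and `matches_s` is a made-up name.

```python
def matches_s(text: str) -> bool:
    """Return True if text can be derived from S using S -> ε and S -> { S }."""
    if text == "":
        # S -> ε
        return True
    if len(text) >= 2 and text[0] == "{" and text[-1] == "}":
        # S -> { S }: strip one matching pair and check what is left
        return matches_s(text[1:-1])
    return False


print(matches_s("{{{}}}"))
print(matches_s("{{}"))
print(matches_s("{}{}"))
```

Note that this two-rule grammar only produces fully nested strings, so `{}{}` is rejected even though its braces are balanced in the everyday sense.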

slide-34
SLIDE 34
slide-35
SLIDE 35

CFGs are more expressive than regular expressions, and commensurately more complex to check

slide-36
SLIDE 36

Whereas regular expressions are modeled by finite state machines, CFGs are modeled by state machines that also can push / pop a stack
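To make the push / pop idea concrete, here is a minimal sketch of a brace checker that scans left to right and keeps a stack. The function name `balanced` is hypothetical, and the checker is deliberately a little more permissive than a purely nested grammar: it also accepts sequences such as `{}{}`.

```python
def balanced(text: str) -> bool:
    """Scan once, pushing on '{' and popping on '}'."""
    stack = []
    for ch in text:
        if ch == "{":
            stack.append(ch)
        elif ch == "}":
            if not stack:
                return False  # a right brace with nothing to match
            stack.pop()
        else:
            return False      # only braces are allowed in this sketch
    return not stack          # leftover '{' means something was never closed


print(balanced("{{}}"))
print(balanced("{{}"))
```

The stack is exactly the extra memory a finite state machine lacks: its depth can grow with the input.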

slide-37
SLIDE 37

But what programming languages can we implement right now?

(Without needing to implement CFGs)

slide-38
SLIDE 38
slide-39
SLIDE 39

Forth is a stack-based language

slide-40
SLIDE 40

http://galileo.phys.virginia.edu/classes/551.jvn.fall01/primer.htm

A beginner’s guide to FORTH

slide-41
SLIDE 41

Assembly uses registers and memory, but FORTH uses a stack as its main abstraction

slide-42
SLIDE 42

5

slide-43
SLIDE 43

5 6

slide-44
SLIDE 44

5 6

+

slide-45
SLIDE 45

11

+
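The 5 6 + example can be replayed with a few lines of Python. This is a sketch of the idea only, not a Forth implementation: it understands integers and `+`, and the name `run` is invented here.

```python
def run(program: str) -> list:
    """Evaluate a whitespace-separated program of integers and '+'."""
    stack = []
    for token in program.split():
        if token == "+":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            stack.append(int(token))  # anything else is treated as a number
    return stack


print(run("5 6 +"))
```

Numbers are pushed as they are read, and `+` replaces the top two entries with their sum, leaving 11 on the stack.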

slide-46
SLIDE 46

You have already implemented parts of Forth

slide-47
SLIDE 47

Each command in Forth is called a word

slide-48
SLIDE 48

Words manipulate the stack

slide-49
SLIDE 49

drop

Drops the most recent thing on the stack

( x1 -- )

slide-50
SLIDE 50

swap

( x1 x2 -- x2 x1 )

Top!

slide-51
SLIDE 51

nip

( x1 x2 -- x2 )

slide-52
SLIDE 52

dup

( x1 -- x1 x1 )

slide-53
SLIDE 53
over

( x1 x2 -- x1 x2 x1 )

slide-54
SLIDE 54

tuck

( x1 x2 -- x2 x1 x2 )
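Each stack-effect comment can be read as a small operation on a list whose last element is the top of the stack. The functions below are a sketch under that assumption; they are not taken from any Forth implementation.

```python
def drop(s):
    s.pop()                      # ( x1 -- )


def swap(s):
    s[-2], s[-1] = s[-1], s[-2]  # ( x1 x2 -- x2 x1 )


def nip(s):
    del s[-2]                    # ( x1 x2 -- x2 )


def dup(s):
    s.append(s[-1])              # ( x1 -- x1 x1 )


def over(s):
    s.append(s[-2])              # ( x1 x2 -- x1 x2 x1 )


def tuck(s):
    s.insert(-2, s[-1])          # ( x1 x2 -- x2 x1 x2 )


stack = [1, 2]
over(stack)
print(stack)
```

Every word mutates the list in place, which mirrors how a word changes the shared data stack.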

slide-55
SLIDE 55

You can define your own words (functions)

slide-56
SLIDE 56

: add1 1 + ;
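One way to picture a colon definition is as an entry in a dictionary from word names to token lists. The interpreter below is a minimal sketch under that assumption (only integers, `+`, and `: name ... ;` definitions are supported, and `interpret` is a made-up name); a real Forth compiles definitions rather than storing their text.

```python
def interpret(program, stack=None, words=None):
    """Run a tiny Forth-like program with integers, '+', and colon definitions."""
    stack = [] if stack is None else stack
    words = {} if words is None else words
    tokens = program.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == ":":
            # ": name body... ;" stores the body under the new word's name
            end = tokens.index(";", i)
            words[tokens[i + 1]] = tokens[i + 2:end]
            i = end + 1
            continue
        if tok in words:
            # Calling a user-defined word runs its stored body on the same stack
            interpret(" ".join(words[tok]), stack, words)
        elif tok == "+":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            stack.append(int(tok))
        i += 1
    return stack


print(interpret(": add1 1 + ; 5 add1"))
```

Here `add1` expands to `1 +`, so running it with 5 on the stack leaves 6.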

slide-57
SLIDE 57

Adding two Euclidean points

x1 y1 x2 y2 -> (x1 + x2) (y1 + y2)

Want to define addcartesian word, which does this:

1 2 3 4  ok
addcartesian  ok
.s <2> 4 6  ok

slide-58
SLIDE 58

Adding two Euclidean points

x1 y1 x2 y2 -> (x1 + x2) (y1 + y2)

rot   x1 y1 x2 y2 -> x1 x2 y2 y1
+     x1 x2 y2 y1 -> x1 x2 (y1+y2)

What do I do from here?

slide-59
SLIDE 59

Adding two Euclidean points

x1 y1 x2 y2 -> (x1 + x2) (y1 + y2)

rot   x1 y1 x2 y2     -> x1 x2 y2 y1
+     x1 x2 y2 y1     -> x1 x2 (y1+y2)
rot   x1 x2 (y1+y2)   -> x2 (y1+y2) x1
rot   x2 (y1+y2) x1   -> (y1+y2) x1 x2
+     (y1+y2) x1 x2   -> (y1+y2) (x1+x2)
swap  (y1+y2) (x1+x2) -> (x1+x2) (y1+y2)
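Read in order, the steps on this slide amount to the word sequence rot + rot rot + swap. The Python sketch below replays that sequence on a list (top of stack last) and prints the stack after every word; the helper names are invented for illustration.

```python
def rot(s):
    s.append(s.pop(-3))          # ( a b c -- b c a )


def add(s):
    right = s.pop()
    left = s.pop()
    s.append(left + right)       # ( a b -- a+b )


def swap(s):
    s[-2], s[-1] = s[-1], s[-2]  # ( a b -- b a )


def addcartesian(s):
    """Replay rot + rot rot + swap, printing the stack after each word."""
    for word in (rot, add, rot, rot, add, swap):
        word(s)
        print(f"{word.__name__:5} {s}")
    return s


addcartesian([1, 2, 3, 4])
```

Starting from 1 2 3 4, the stack ends as 4 6, matching the session shown for addcartesian.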

slide-60
SLIDE 60

So that’s Forth, we’ll touch a bit more of it Friday
And you’ll be implementing part of it in Lab 4

slide-61
SLIDE 61

Back to CFGs!

Why? Because most languages use infix operators

slide-62
SLIDE 62

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

Here’s a context free grammar

slide-63
SLIDE 63

Formally, a grammar is…

  • A set of terminals
      • These are the things you can’t rewrite any further
  • A set of nonterminals
      • These are the things you can rewrite further
  • A set of production rules
      • These are a bunch of rewrite rules
  • A start symbol
slide-64
SLIDE 64

Terminals = {number, +, *}
Nonterminals = {Expr}
Productions =
  Expr -> number
  Expr -> Expr + Expr
  Expr -> Expr * Expr
Start symbol = Expr
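The four components map naturally onto a small record type. The sketch below is one possible encoding, not a standard one; the class `Grammar`, the constant `EXPR_GRAMMAR`, and the `well_formed` check are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # pairs of (left-hand side, right-hand side symbols)
    start: str


EXPR_GRAMMAR = Grammar(
    terminals=frozenset({"number", "+", "*"}),
    nonterminals=frozenset({"Expr"}),
    productions=(
        ("Expr", ("number",)),
        ("Expr", ("Expr", "+", "Expr")),
        ("Expr", ("Expr", "*", "Expr")),
    ),
    start="Expr",
)


def well_formed(g: Grammar) -> bool:
    """Check that the four pieces fit together as a context-free grammar."""
    symbols = g.terminals | g.nonterminals
    return (
        g.start in g.nonterminals
        and not (g.terminals & g.nonterminals)
        and all(
            lhs in g.nonterminals and all(sym in symbols for sym in rhs)
            for lhs, rhs in g.productions
        )
    )


print(well_formed(EXPR_GRAMMAR))
```

Requiring a single nonterminal on every left-hand side is what makes the grammar context-free.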

slide-65
SLIDE 65

To determine if a grammar matches an expression, you play a game

slide-66
SLIDE 66

First, start with a nonterminal and write that on the page

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

1 + 2

slide-67
SLIDE 67

First, start with a nonterminal and write that on the page

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

1 + 2

Expr

slide-68
SLIDE 68

First, start with a nonterminal and write that on the page

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

1 + 2

Expr

To play the game: attempt to apply each production so that you arrive at your full expression

slide-69
SLIDE 69

First, start with a nonterminal and write that on the page

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

1 + 2

Expr
-> Expr + Expr

slide-70
SLIDE 70

First, start with a nonterminal and write that on the page

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

1 + 2

Expr
-> Expr + Expr
-> number + Expr
-> number + number
-> 1 + number
-> 1 + 2
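The game can be replayed mechanically: keep the current line as a list of symbols and, on each move, rewrite the leftmost Expr with a chosen production. The sketch below assumes that leftmost strategy and invents the rule labels `num`, `add`, and `mul`; it stops at the token level and does not perform the last two steps that replace number with 1 and 2.

```python
PRODUCTIONS = {
    "num": ("Expr", ["number"]),
    "add": ("Expr", ["Expr", "+", "Expr"]),
    "mul": ("Expr", ["Expr", "*", "Expr"]),
}


def rewrite_leftmost(form, rule):
    """Apply one production to the leftmost occurrence of its left-hand side."""
    lhs, rhs = PRODUCTIONS[rule]
    i = form.index(lhs)  # raises ValueError if there is nothing left to rewrite
    return form[:i] + rhs + form[i + 1:]


def derive(rules, start="Expr"):
    """Return every line of the derivation, starting from the start symbol."""
    form = [start]
    steps = [form]
    for rule in rules:
        form = rewrite_leftmost(form, rule)
        steps.append(form)
    return steps


for step in derive(["add", "num", "num"]):
    print(" ".join(step))
```

Choosing `mul` as the first move instead produces Expr * Expr, from which 1 + 2 can no longer be reached, which is the sense in which some moves lose the game.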
slide-71
SLIDE 71

First, start with a nonterminal and write that on the page

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

1 + 2

Some moves don’t lead you to winning the game.

slide-72
SLIDE 72

First, start with a nonterminal and write that on the page

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

1 + 2

Some moves don’t lead you to winning the game.

Expr
-> Expr * Expr

???

slide-73
SLIDE 73

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

1 + 2 * 3

Expr
-> Expr + Expr

Expr
-> Expr * Expr

This grammar is ambiguous

Exercise: complete the derivations from here
We’ll define this more rigorously on Friday

slide-74
SLIDE 74

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

1 + 2 * 3

Expr
-> Expr + Expr
-> Expr + Expr * Expr
-> number + Expr * Expr
-> number + number * Expr
-> number + number * number

Expr
-> Expr * Expr
-> Expr + Expr * Expr
-> number + Expr * Expr
-> number + number * Expr
-> number + number * number
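Ambiguity can be checked by counting parse trees. In this grammar every tree for a longer string has an operator at its root, so the count is a sum over the possible root operators. The sketch below implements that count; `count_trees` is a made-up name, and the approach is specific to this three-rule grammar.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees of the token list under Expr -> number | Expr + Expr | Expr * Expr."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways Expr derives tokens[lo:hi]
        total = 1 if hi - lo == 1 and tokens[lo] == "number" else 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # tokens[mid] is the root operator; count both sides independently
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_trees(["number", "+", "number"]))
print(count_trees(["number", "+", "number", "*", "number"]))
```

A count above one means the string has more than one parse tree; number + number * number has two, one with + at the root and one with * at the root.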
slide-75
SLIDE 75

if … if … else …

Famous example from C, the “dangling else”

Does the else belong to the first if? Or the second?
Most real languages handle these in hacky one-off ways
(Ans: in C, the second)

slide-76
SLIDE 76

We can turn a derivation into a parse tree

slide-77
SLIDE 77

Expr
-> Expr + Expr
-> number + Expr
-> number + number
-> 1 + number
-> 1 + 2

        Expr
      /  |  \
  Expr   +   Expr
   |          |
 Number     Number
   |          |
   1          2

slide-78
SLIDE 78

This parse tree is a hierarchical representation of the data
A parser is a program that automatically generates a parse tree
A parser will generate an abstract syntax tree for the language

slide-79
SLIDE 79

Parsing is hard
And also boring
But an important problem

slide-80
SLIDE 80

And there are a ton of different parsing algorithms
We will learn one fairly useful and easy-to-code one
(Recursive descent parsing, or LL(1) parsing)

slide-81
SLIDE 81

(define (parse-input) …)

        Expr
      /  |  \
  Expr   +   Expr
   |          |
 Number     Number
   |          |
   1          2

1 + 2

Next week, we’ll see how to write these parsers

slide-82
SLIDE 82

Expr
-> Expr + Expr
-> Expr + Expr * Expr
-> number + Expr * Expr
-> number + number * Expr
-> number + number * number

Expr
-> Expr * Expr
-> Expr + Expr * Expr
-> number + Expr * Expr
-> number + number * Expr
-> number + number * number

Exercise: draw the parse trees for the two derivations above

slide-83
SLIDE 83

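Here’s an example of a grammar that is not ambiguous

Expr -> MExpr
Expr -> Expr + MExpr
MExpr -> number
MExpr -> MExpr * number

Each operator recurses on one side only, so an expression such as 1 + 2 * 3 has exactly one parse tree, with * binding tighter than +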
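A layered grammar of this kind lends itself to a hand-written parser with one function per nonterminal. The sketch below is not the course's parser: it assumes that * binds tighter than + and that both operators group to the left, it handles left recursion with loops, and every name in it (`tokenize`, `parse`, `expr`, `mexpr`, `number`) is hypothetical. The tokenizer silently skips characters it does not recognize.

```python
import re


def tokenize(text):
    """Split the input into integer literals and the operators + and *."""
    return re.findall(r"\d+|[+*]", text)


def parse(text):
    """Parse text into nested tuples such as ('+', 1, ('*', 2, 3))."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def number():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number at token {pos}")
        pos += 1
        return int(tok)

    def mexpr():
        # A product: one or more numbers separated by *
        nonlocal pos
        tree = number()
        while peek() == "*":
            pos += 1
            tree = ("*", tree, number())
        return tree

    def expr():
        # A sum: one or more products separated by +
        nonlocal pos
        tree = mexpr()
        while peek() == "+":
            pos += 1
            tree = ("+", tree, mexpr())
        return tree

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("1 + 2 * 3"))
```

Because products are parsed inside sums, 1 + 2 * 3 can only come out with the multiplication nested under the addition.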

slide-84
SLIDE 84

Generally, we’re going to want our grammar to be unambiguous

slide-85
SLIDE 85

Question: Why are parse trees useful?
Answer: We can use them to define the meaning of programs

slide-86
SLIDE 86

First, we can represent parse trees in our PL:

(define my-tree '(+ 1 (* 2 3)))

slide-87
SLIDE 87

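(define my-tree '(+ 1 (* 2 3)))

;; Evaluate a parse tree by recursively evaluating its subtrees
(define (evaluate-expr e)
  (match e
    [`(+ ,e1 ,e2) (+ (evaluate-expr e1) (evaluate-expr e2))]
    [`(* ,e1 ,e2) (* (evaluate-expr e1) (evaluate-expr e2))]
    [_ e])) ; anything else (a number) evaluates to itself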

This allows us to write interpreters
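The same recursive pattern carries over to other languages. Here is a rough Python counterpart, assuming trees are written as nested tuples of the form (operator, left, right) with plain numbers at the leaves; the tuple encoding and the name `evaluate_expr` are choices made for this sketch.

```python
def evaluate_expr(e):
    """Evaluate a tree of nested (op, left, right) tuples with numbers at the leaves."""
    if isinstance(e, tuple):
        op, e1, e2 = e
        if op == "+":
            return evaluate_expr(e1) + evaluate_expr(e2)
        if op == "*":
            return evaluate_expr(e1) * evaluate_expr(e2)
        raise ValueError(f"unknown operator {op!r}")
    return e  # a leaf is already a value


my_tree = ("+", 1, ("*", 2, 3))
print(evaluate_expr(my_tree))
```

Each case handles one shape of tree and recurses on both subtrees, so the structure of the interpreter follows the structure of the grammar.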

slide-88
SLIDE 88

Next lecture, we’ll dig into grammars even more
Our goal is to write parsers, but to do so, we need more intuition about grammars