Nested Word Automata Jens Stimpfle 30.6.2014 Nested Words Nested - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Nested Word Automata

Jens Stimpfle 30.6.2014

slide-5
SLIDE 5

Nested Words

◮ Theoretically and practically pleasant model for the representation of data with both:
  ◮ a linear ordering
  ◮ a hierarchically nested matching
◮ Applications in software verification and document processing
◮ This is the last list item

slide-6
SLIDE 6

Structure of this talk

  • 1. Motivation
  • 2. Nested words
  • 3. Nested word automata
slide-7
SLIDE 7

Section 1 Motivation

slide-8
SLIDE 8

Subsection 1 Data with both linear ordering and hierarchically nested matching

  • 1. Document trees (e.g. HTML)
  • 2. Executions of structured programs (with call-return semantics)
slide-9
SLIDE 9

Document trees (e.g. HTML)

[Figure: document tree with root html; head contains title ("Hello"); body contains h1 ("Hello") and p ("Hello, World!")]

slide-10
SLIDE 10

Executions of structured programs (with call-return semantics)

[Figure: call tree with the nodes main(), countToZero(1), countToZero(0), printLn("1"), printLn("0")]

slide-11
SLIDE 11

Subsection 2 Formal Languages

◮ Regular Languages
◮ Context-Free Languages

slide-13
SLIDE 13

Regular Languages

Regular language over an alphabet Σ

◮ Most easily explained as generated by a regular expression (RE)
◮ Example RE: 0|[123456789][0123456789]*
◮ Typical implementation: DFA (Deterministic Finite Automaton)
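The example RE can be tried out directly. A minimal sketch in Python, assuming the standard `re` module; the names `NUMBER` and `is_number` are illustrative and not part of the talk:

```python
import re

# The example RE from the slide: decimal integers without leading zeros.
NUMBER = re.compile(r"0|[123456789][0123456789]*")

def is_number(text: str) -> bool:
    """Return True if the whole string belongs to the language of the RE."""
    return NUMBER.fullmatch(text) is not None

for candidate in ["0", "42", "007", "", "10"]:
    print(repr(candidate), is_number(candidate))
```

`fullmatch` is used so that the entire input must match, which corresponds to membership in the language rather than a substring search.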

slide-14
SLIDE 14

“Problems” with Regular Languages

◮ Can’t express arbitrarily deep nesting

slide-18
SLIDE 18

Context-free Languages

Context-free language over Σ

◮ Superset of Regular Languages
◮ Most easily explained as generated by a Context-free Grammar (CFG)
  ◮ terminal symbols Σ and non-terminal symbols V
  ◮ start symbol S ∈ V
  ◮ Productions ⊂ V × (V ∪ Σ)∗
◮ Example for real world usage:

    HTML    : "<html>" BODY "</html>"
    BODY    : "<body>" CONTENT "</body>"
    CONTENT : "Hello, world!" | "Hallo, Welt!"

◮ Typical implementation: Pushdown Automaton
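To see what the toy grammar generates, its productions can be expanded mechanically. A minimal sketch, assuming the BODY production closes with "</body>" (the balanced reading of the rule); `GRAMMAR` and `derive` are illustrative names. It only works because this grammar is non-recursive. A real CFG needs a parser, typically a pushdown automaton.

```python
from itertools import product

# Productions of the toy grammar; dictionary keys are the non-terminals.
# Assumption: BODY closes with "</body>".
GRAMMAR = {
    "HTML": [["<html>", "BODY", "</html>"]],
    "BODY": [["<body>", "CONTENT", "</body>"]],
    "CONTENT": [["Hello, world!"], ["Hallo, Welt!"]],
}

def derive(symbol: str) -> list[str]:
    """Return all terminal strings derivable from symbol.

    Only terminates for non-recursive grammars such as this one.
    """
    if symbol not in GRAMMAR:  # a terminal symbol derives itself
        return [symbol]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every derivation of every right-hand-side symbol.
        for parts in product(*(derive(s) for s in rhs)):
            results.append("".join(parts))
    return results

language = derive("HTML")
for sentence in language:
    print(sentence)
```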

slide-21
SLIDE 21

“Problems” with Context-free Languages

◮ Not closed under intersection
◮ Not closed under complementation
◮ Not closed under difference
◮ Can’t decide inclusion
◮ Can’t decide equivalence
◮ Not determinizable (Deterministic Context-free languages are a strict subset of Context-free languages)

slide-22
SLIDE 22

Nested words

◮ Nested words were constructed to overcome the limitations of Context-free and Regular languages
◮ The class of nested word languages lies properly between deterministic context-free languages and Regular languages

[Figure: inclusion diagram: Regular languages ⊂ Nested word languages ⊂ Deterministic context-free languages ⊂ Context-free languages]

slide-23
SLIDE 23

Section 2 Nested words

slide-24
SLIDE 24

Nested words are ordinary words with extra information: The nesting structure is explicitly contained in the input. ⇒ automata for nested words need not parse the nesting.

slide-25
SLIDE 25

Definition: Nested word

◮ Later! ◮ For now: well-matched nested words

slide-28
SLIDE 28

Definition: Well-matched nested word

A well-matched nested word over an alphabet Σ is a pair (a_1 … a_n, ↝)

◮ a_1 … a_n ∈ Σ∗ is a word over Σ
◮ The matching ↝ matches “start tags” with their “end tags”:
  ◮ ↝ ⊂ [1..n] × [1..n]
  ◮ Given distinct elements (i, j) ≠ (k, l) of ↝ (with i ≤ k, without loss of generality), either i < j < k < l or i < k < l < j

For (i, j) ∈ ↝, i is a call position and j is a return position
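The conditions on the matching can be checked mechanically. A minimal sketch, assuming the matching is given as a set of (call, return) position pairs over positions 1..n; the name `is_well_matched` is illustrative. It also assumes that each pair points forward (i < j), which the call/return reading implies.

```python
def is_well_matched(n: int, matching: set[tuple[int, int]]) -> bool:
    """Check the matching conditions for a word with positions 1..n."""
    pairs = sorted(matching)
    for i, j in pairs:
        # Assumption: a call position precedes its return position.
        if not (1 <= i < j <= n):
            return False
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            (i, j), (k, l) = pairs[a], pairs[b]  # sorted, hence i <= k
            disjoint = i < j < k < l   # the two pairs lie side by side
            nested = i < k < l < j     # the second pair lies inside the first
            if not (disjoint or nested):
                return False
    return True

# Three matchings over a six-letter word such as N E S T E D:
print(is_well_matched(6, {(1, 6), (2, 5), (3, 4)}))  # fully nested arcs
print(is_well_matched(6, {(1, 2), (3, 4)}))          # arcs side by side
print(is_well_matched(6, {(1, 3), (2, 4)}))          # crossing arcs
```

The pairwise comparison is quadratic in the number of pairs. It is written that way to mirror the definition, not for efficiency.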

slide-29
SLIDE 29

Well-matched

[Figure: the word N E S T E D drawn with a well-matched set of matching arcs]

slide-30
SLIDE 30

Not well-matched

[Figure: the word N E S T E D drawn with matching arcs that are not well-matched]

slide-31
SLIDE 31

Not well-matched

[Figure: a second arrangement of arcs over N E S T E D that is not well-matched]

slide-32
SLIDE 32

Example: Simple HTML tree

[Figure: nested word for a simple HTML tree, with the matched pairs HTML … /HTML, HEAD … /HEAD, BODY … /BODY and the text "Hello, world"]

slide-35
SLIDE 35

Example: Process trace

[Figure: process trace as a nested word, with the matched call/return pairs main() … (main), countDown(1) … (countDown), print(1) … (print), countDown(0) … (countDown), print(0) … (print)]

slide-36
SLIDE 36

Section 3 Nested Word Automata (NWA)

slide-38
SLIDE 38

A Nested Word Automaton takes a nested word as input and (as automata do) accepts or rejects it.

Nested word automata have much of the power of Pushdown Automata, but can take advantage of the fact that their inputs carry a “pre-parsed” hierarchical structure.

slide-41
SLIDE 41

Definition: Deterministic Nested word automaton

Definition: A deterministic nested word automaton (DNWA) over an alphabet Σ is a structure

( Q, Q_0, Q_f     // linear states, initial, accepting
, P, P_0, P_f     // hierarchical states, initial, accepting
, δ_c, δ_i, δ_r   // transitions: call, internal, return
)

where Q and P are sets of symbols, Q_0 ∈ Q, P_0 ∈ P, Q_f ⊆ Q, P_f ⊆ P, and the three δ are (partial) transition functions

δ_c : Σ × Q → Q × P
δ_i : Σ × Q → Q
δ_r : Σ × Q × P → Q

slide-45
SLIDE 45

Definition: DNWA: Run

The run of a DNWA over a nested word (a_1 … a_n, ↝) is defined as

◮ A sequence q_i for i ∈ [1, n]
◮ And a sequence p_i for all call positions i

so that, with q_0 = Q_0, for i ∈ [1, n] it holds that:

◮ if i is a call position, then δ_c(a_i, q_{i−1}) = (q_i, p_i)
◮ else if i is an internal position, then δ_i(a_i, q_{i−1}) = q_i
◮ else if i is a return position (let h be its corresponding call position), then δ_r(a_i, q_{i−1}, p_h) = q_i

Informally: q_i is the linear trace and p_i the hierarchical trace.

The run is always unique and well-defined (after adding transitions to a black hole state where the transition functions are undefined).

slide-46
SLIDE 46

Definition: DNWA: Acceptance

A DNWA accepts a nested word if the run over it ends in an accepting linear state:

Let A be a DNWA with accepting linear states Q_f, and let (q_1 … q_n, p_1 … p_m) be the run of A over a nested word w. Then A accepts w iff q_n ∈ Q_f.
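The run and acceptance definitions can be sketched in code. This is a minimal illustration, not a reference implementation. It makes the following assumptions: transition functions are dictionaries, an undefined transition stands for the black hole state, the input is well-matched, and the empty word ends in the initial state. The names `DNWA`, `run`, `accepts` and the tiny demo automaton are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DNWA:
    q0: str          # initial linear state Q_0
    accepting: set   # accepting linear states Q_f
    delta_c: dict    # (symbol, q) -> (q', p)
    delta_i: dict    # (symbol, q) -> q'
    delta_r: dict    # (symbol, q, p) -> q'

def run(a: DNWA, word: str, matching: set):
    """Return the linear trace [q_1, ..., q_n], or None if some transition
    is undefined (the run would enter the black hole state)."""
    call_of = {j: i for (i, j) in matching}  # return position -> call position
    calls = {i for (i, _) in matching}
    q, trace, p_at = a.q0, [], {}
    for pos, sym in enumerate(word, start=1):
        if pos in calls:
            step = a.delta_c.get((sym, q))
            if step is None:
                return None
            q, p_at[pos] = step              # remember p_i for the matching return
        elif pos in call_of:
            q = a.delta_r.get((sym, q, p_at[call_of[pos]]))
        else:
            q = a.delta_i.get((sym, q))
        if q is None:
            return None
        trace.append(q)
    return trace

def accepts(a: DNWA, word: str, matching: set) -> bool:
    trace = run(a, word, matching)
    if trace is None:
        return False
    final = trace[-1] if trace else a.q0     # assumption: empty word ends in Q_0
    return final in a.accepting

# Hypothetical demo automaton over the symbols "<", "x", ">".
demo = DNWA(
    q0="q",
    accepting={"q"},
    delta_c={("<", "q"): ("q", "p")},
    delta_i={("x", "q"): "q"},
    delta_r={(">", "q", "p"): "q"},
)
print(accepts(demo, "<x>", {(1, 3)}))
```

Here the hierarchical state p_h is looked up through the matching relation itself, which mirrors the definition. No explicit stack is needed.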

slide-50
SLIDE 50

Example: Nested word automaton

Task: Given Σ = { "[", "(", "]", ")" }, build an acceptor for the language of properly balanced parentheses.

Q = {q}, Q_0 = q, Q_f = {q}
P = Σ ∪̇ {⊥}, P_0 = ⊥, P_f = {⊥}
δ_c = { ("(", q) → (q, "("), ("[", q) → (q, "[") }
δ_r = { (")", q, "(") → q, ("]", q, "[") → q }
δ_i = ∅
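The example automaton can be written down almost verbatim. In this sketch the nested word is given as a list of (symbol, kind) pairs and the hierarchical states are kept on an explicit stack. For well-matched input this is equivalent to looking up the state of the matching call position. The helper `tag` and the rejection of pending calls and returns are assumptions made for this illustration.

```python
Q0, QF = "q", {"q"}
DELTA_C = {("(", "q"): ("q", "("), ("[", "q"): ("q", "[")}
DELTA_R = {(")", "q", "("): "q", ("]", "q", "["): "q"}
DELTA_I = {}  # the language has no internal positions

def accepts(tagged_word) -> bool:
    """tagged_word: list of (symbol, kind), kind in {"call", "internal", "return"}."""
    q, stack = Q0, []
    for sym, kind in tagged_word:
        if kind == "call":
            step = DELTA_C.get((sym, q))
            if step is None:
                return False
            q, p = step
            stack.append(p)              # push exactly one element per call
        elif kind == "return":
            if not stack:                # assumption: pending returns are rejected
                return False
            q = DELTA_R.get((sym, q, stack.pop()))  # pop exactly one per return
        else:
            q = DELTA_I.get((sym, q))
        if q is None:                    # undefined transition: black hole state
            return False
    # Assumption: pending calls are rejected (input must be well-matched).
    return not stack and q in QF

def tag(text: str):
    """Hypothetical helper: opening brackets are calls, closing ones returns."""
    return [(c, "call" if c in "([" else "return") for c in text]

for text in ["([])", "()[]", "(]", "(("]:
    print(text, accepts(tag(text)))
```

Whether to push or pop is decided by the kind of each position alone, never by the automaton's state.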

slide-57
SLIDE 57

Remarks

The last example is actually a showcase example for Pushdown automata. Important differences of NWAs:

◮ The nesting structure is in the input nested word, not parsed at run-time.
◮ The NWA has only an implicit stack. The stack manipulation type (push, pop, nothing) at each input symbol is only a function of the nesting structure.
  ◮ Push: exactly one element per call position.
  ◮ Pop: exactly one element per return position.
  ◮ Reading the stack only when popping (returning).

With these restrictions, processing a nested word takes place in linear time and space.

slide-58
SLIDE 58

This is the last slide.