SLIDE 1

Applications of Tree Automata Theory Lecture I: Tree Automata

Andreas Maletti

Institute of Computer Science Universität Leipzig, Germany

On leave from: Institute for Natural Language Processing, Universität Stuttgart, Germany (maletti@ims.uni-stuttgart.de)

Yekaterinburg — August 23, 2014

Lecture I: Tree Automata

  • A. Maletti

· 1

SLIDE 2

Roadmap

1 Theory of Tree Automata
2 Parsing: Basics and Evaluation
3 Parsing: Advanced Topics
4 Machine Translation: Basics and Evaluation
5 Theory of Tree Transducers
6 Machine Translation: Advanced Topics

Always ask questions right away!

SLIDE 3

Trees

Motivation and Notation

SLIDE 4

Trees — Parses

We must bear in mind the Community as a whole

S(NP(PRP(We)), VP(MD(must), VP(VB(bear), PP(IN(in), NP(NN(mind))), NP(NP(DT(the), NN(Community)), PP(IN(as), NP(DT(a), NN(whole)))))))

SLIDE 5

Trees — Parses

И вы действительно сможете изменить свою жизнь, воспользовавшись принципом 80 к 20
("And you really can change your life by using the 80/20 principle")

[Figure: dependency tree rooted at изменить; words as extracted: изменить И вы сможете действительно жизнь свою воспользовавшись , принципом 80 к 20; edge labels: conj subj auxs adv acc adj adv misc ins gen card gen]
SLIDE 6

Trees — XML

<library>
  <book>
    <title>The Golden Ticket</title>
    <author>Lance Fortnow</author>
    <publisher>Princeton Univ. Press</publisher>
  </book>
  <book>
    <title>Anna Karenina</title>
    <author>Leo Tolstoy</author>
    <publisher>The Russian Messenger</publisher>
  </book>
</library>

Corresponding tree: library(book(title(The Golden Ticket), author(Lance Fortnow), publisher(Princeton Univ. Press)), book(title(Anna Karenina), author(Leo Tolstoy), publisher(The Russian Messenger)))
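A minimal sketch of how the XML document above becomes the tree shown, assuming Python's standard `xml.etree.ElementTree` module and a hypothetical `(label, children)` tuple encoding of trees (not part of the slides): element tags become inner labels, and non-blank text content becomes a leaf. Attributes and text between child elements are ignored in this sketch.

```python
import xml.etree.ElementTree as ET


def xml_to_tree(elem):
    """Convert an XML element into a (label, children) pair.

    Tags become inner labels; stripped, non-empty text becomes a leaf child.
    Attributes and text between child elements (tails) are ignored.
    """
    children = [xml_to_tree(child) for child in elem]
    text = (elem.text or "").strip()
    if text:
        children.insert(0, (text, ()))
    return (elem.tag, tuple(children))


def show(tree):
    """Render a tree in the label(child, ...) notation used on the slides."""
    label, children = tree
    if not children:
        return label
    return label + "(" + ", ".join(show(c) for c in children) + ")"


DOC = """<library>
  <book><title>The Golden Ticket</title><author>Lance Fortnow</author>
    <publisher>Princeton Univ. Press</publisher></book>
  <book><title>Anna Karenina</title><author>Leo Tolstoy</author>
    <publisher>The Russian Messenger</publisher></book>
</library>"""

tree = xml_to_tree(ET.fromstring(DOC))
print(show(tree))
```

The same `(label, children)` encoding works for parse trees, which is what makes XML documents and parses instances of one notion of tree.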

SLIDE 7

Trees

Definition: Given sets Σ and Q, the set TΣ(Q) of Σ-trees indexed by Q is the smallest set T such that
• q ∈ T for all q ∈ Q, and
• σ(t1, …, tk) ∈ T for all k ∈ ℕ, σ ∈ Σ, and t1, …, tk ∈ T.
We assume Σ ∩ Q = ∅ and write σ instead of σ().

SLIDE 9

Trees

Example: Σ = {σ, α} and Q = {q, p}; then σ(σ(p, q), σ(σ(q, σ))) ∈ TΣ(Q)

Notes: the inductive definition yields an obvious recursion and induction principle for trees

SLIDE 11

Trees — Recursion

Definition (Gorn address): the mapping pos: TΣ(Q) → 2^(ℕ*) assigns positions by pos(q) = {ε} for q ∈ Q and pos(σ(t1, …, tk)) = {ε} ∪ {i.w | 1 ≤ i ≤ k, w ∈ pos(ti)}

Definition: the leaves of t ∈ TΣ(Q) are leaves(t) = {w ∈ pos(t) | w.1 ∉ pos(t)}
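A direct transcription of pos and leaves, as a sketch under an assumed encoding: a tree is a pair `(label, children)`, and a Gorn address is a tuple of 1-based child indices, with ε as the empty tuple `()`. The helper `T` is hypothetical shorthand for building trees.

```python
def T(label, *children):
    """Build a tree as a (label, children) pair; T("q") is a single node."""
    return (label, tuple(children))


def pos(t):
    """pos(t): Gorn addresses of t; () stands for ε, (i,) + w for i.w."""
    _, children = t
    result = {()}
    for i, child in enumerate(children, start=1):
        result |= {(i,) + w for w in pos(child)}
    return result


def leaves(t):
    """leaves(t) = {w in pos(t) | w.1 not in pos(t)}."""
    positions = pos(t)
    return {w for w in positions if w + (1,) not in positions}


# The example tree σ(σ(p, q), σ(σ(q, σ))) from above.
t = T("σ", T("σ", T("p"), T("q")), T("σ", T("σ", T("q"), T("σ"))))
print(sorted(pos(t)))     # 8 positions
print(sorted(leaves(t)))  # [(1, 1), (1, 2), (2, 1, 1), (2, 1, 2)]
```

Both functions follow the recursive definitions literally, so they illustrate the recursion principle on trees.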

SLIDE 14

Trees — Recursion

S(NP(PRP(We)), VP(MD(must), VP(VB(bear), PP(IN(in), NP(NN(mind))), NP(NP(DT(the), NN(Community)), PP(IN(as), NP(DT(a), NN(whole)))))))

Each edge is numbered by its child index (1, 2, 3, …) from left to right.

Address of the marked ‘DT’ (the one above ‘the’): 2.2.3.1.1

SLIDE 15

Trees

Trees t, u ∈ TΣ(Q) and a position w ∈ pos(t)
Notation:
• t(w) = label of t at position w
• t|w = subtree of t rooted at w
• t[u]_w = tree obtained from t by replacing the subtree at w by u
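The three operations can be sketched as follows, assuming the same hypothetical `(label, children)` encoding with positions as tuples of 1-based indices. The tree is the parse of "We must bear in mind the Community as a whole" used throughout this lecture.

```python
def T(label, *children):
    """Build a tree as a (label, children) pair."""
    return (label, tuple(children))


def subtree(t, w):
    """t|w: follow the 1-based child indices in w from the root."""
    for i in w:
        t = t[1][i - 1]
    return t


def label_at(t, w):
    """t(w): the label of the node at position w."""
    return subtree(t, w)[0]


def replace(t, u, w):
    """t[u]_w: a copy of t in which the subtree at position w is u."""
    if not w:
        return u
    label, children = t
    i = w[0]
    new_child = replace(children[i - 1], u, w[1:])
    return (label, children[:i - 1] + (new_child,) + children[i:])


t = T("S",
      T("NP", T("PRP", T("We"))),
      T("VP", T("MD", T("must")),
        T("VP", T("VB", T("bear")),
          T("PP", T("IN", T("in")), T("NP", T("NN", T("mind")))),
          T("NP", T("NP", T("DT", T("the")), T("NN", T("Community"))),
            T("PP", T("IN", T("as")),
              T("NP", T("DT", T("a")), T("NN", T("whole"))))))))

w = (2, 2, 3, 1)
print(label_at(t, (2, 2, 1, 1)))  # bear
assert replace(t, subtree(t, w), w) == t  # t = t[t|w]_w
```

Trees are immutable tuples here, so `replace` rebuilds only the path from the root to w and shares all other subtrees.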

SLIDE 19

Trees

S(NP(PRP(We)), VP(MD(must), VP(VB(bear), PP(IN(in), NP(NN(mind))), NP(NP(DT(the), NN(Community)), PP(IN(as), NP(DT(a), NN(whole)))))))

t(2.2.1.1) = bear
t|2.2.3.1 = NP(DT(the), NN(Community))
t = t′[NP(DT(the), NN(Community))]_2.2.3.1

SLIDE 20

Tree Language

Representations

SLIDE 21

Tree Language

Tree language (or forest) = subset of TΣ(Q)
Motivation:
• parses of a sentence → parse forest
• translations of an input sentence → translation forest
• valid XML documents → XML schema
• …

SLIDE 23

Tree Language

How to represent a set of trees?
• enumerate them
• enumerate them cleverly (e.g., add sharing)

SLIDE 24

Tree Language

L = {σ(σ(p, q), σ(σ(q, σ))), α(σ(q, σ), α)}
Enumeration of L: [Figure: the two trees drawn separately]
Subtree-shared enumeration of L (packed forest): [Figure: both trees drawn as one graph in which the common subtree σ(q, σ) appears only once]
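One common way to realize such sharing is hash-consing: every distinct subtree is stored once and referenced by an id. The sketch below shows that general technique (not necessarily the exact packed-forest format of the slide), assuming the hypothetical `(label, children)` encoding; it compresses the two trees of L.

```python
def T(label, *children):
    """Build a tree as a (label, children) pair."""
    return (label, tuple(children))


def share(trees):
    """Store every distinct subtree once (hash-consing).

    Returns (roots, nodes): nodes[i] = (label, ids of children), and
    roots[j] is the id of the j-th input tree.
    """
    ids = {}    # subtree -> node id
    nodes = []  # node id -> (label, tuple of child ids)

    def intern(t):
        if t not in ids:
            label, children = t
            child_ids = tuple(intern(c) for c in children)
            ids[t] = len(nodes)
            nodes.append((label, child_ids))
        return ids[t]

    return [intern(t) for t in trees], nodes


t1 = T("σ", T("σ", T("p"), T("q")), T("σ", T("σ", T("q"), T("σ"))))
t2 = T("α", T("σ", T("q"), T("σ")), T("α"))
roots, nodes = share([t1, t2])
print(len(nodes))  # 9 shared nodes instead of 8 + 5 = 13
```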

SLIDE 26

Tree Language

How to represent a set of trees?
• enumerate them
• enumerate them cleverly (packed forest)
• parse forest of a CFG

SLIDE 28

Parse Forest of a CFG

Example: S → NP VP, VP → MD VP, NP → NP PP, VP → VB PP NP, MD → must, …

Derivation tree: S(NP(PRP(We)), VP(MD(must), VP(VB(bear), PP(IN(in), NP(NN(mind))), NP(NP(DT(the), NN(Community)), PP(IN(as), NP(DT(a), NN(whole)))))))

SLIDE 34

Local Tree Grammar

Definition: A local tree grammar (LTG) is a CFG G = (N, Q, S, P) with
• a finite set N of nonterminals
• a finite set Q of terminals
• a set S ⊆ N of start nonterminals
• a finite set P ⊆ N × (N ∪ Q)* of productions
It generates the derivation trees of the CFG.

SLIDE 35

Local Tree Grammar

LTG G = (N, Q, S, P)
Definition (Generated tree language): L(G) contains exactly the trees t ∈ TN(Q) such that
• t(ε) ∈ S (the root label is a start nonterminal)
• t(w) → t(w.1) ⋯ t(w.k) ∈ P for every internal node w ∈ pos(t), where {i ∈ ℕ | w.i ∈ pos(t)} = {1, …, k} ≠ ∅ ("label → child labels" is a production of G)
• t(w) ∈ Q for all w ∈ leaves(t) (leaves are labeled by Q)
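The definition translates into a membership test that checks every node locally. This is a sketch under assumed encodings: trees are `(label, children)` pairs, and a production n → x1 ⋯ xk is stored as the pair `(n, (x1, ..., xk))`; the small grammar for "My dog sleeps" is illustrative only.

```python
def T(label, *children):
    """Build a tree as a (label, children) pair."""
    return (label, tuple(children))


def in_ltg(t, Q, S, P):
    """Decide t ∈ L(G) for an LTG G = (N, Q, S, P).

    P is a set of pairs (n, (x1, ..., xk)) encoding productions n → x1 ⋯ xk.
    """
    def ok(node):
        label, children = node
        if not children:  # leaf: must be labeled by a terminal
            return label in Q
        rhs = tuple(child[0] for child in children)  # "label → child labels"
        return (label, rhs) in P and all(ok(child) for child in children)

    return t[0] in S and ok(t)


Q = {"My", "dog", "sleeps"}
P = {("S", ("NP", "VP")), ("NP", ("PRP$", "NN")), ("VP", ("VBZ",)),
     ("PRP$", ("My",)), ("NN", ("dog",)), ("VBZ", ("sleeps",))}
t = T("S", T("NP", T("PRP$", T("My")), T("NN", T("dog"))),
      T("VP", T("VBZ", T("sleeps"))))
print(in_ltg(t, Q, {"S"}, P))  # True
```

Because each check looks only at a node and its children, every tree has at most one explanation, which is the "no ambiguity" property of LTGs.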

SLIDE 45

Local Tree Grammar

Observation: LTGs generate exactly the parse forests of CFGs.
Theorem: Local tree languages (LTL), the tree languages generated by LTGs, have the following closure properties:
closed under | CFL (strings) | LTL (trees)
union | ✓ | ✗
intersection | ✗ | ✓
(rank-bounded) complement | ✗ | ✗
relabeling | ✓ | ✗

SLIDE 46

Local Tree Grammar

Observation: LTGs generate exactly the parse forests of CFGs.
Properties:
✓ simple
✓ no ambiguity (unique explanation for each generated tree)
✗ not closed under (most) Boolean operations (union/intersection/complement: ✗/✓/✗)
✗ not closed under (non-injective) relabelings
✗ …

SLIDE 47

Local Tree Grammar

LTG G = (N, Q, S, P)
No ambiguity: the tree

S(NP(PRP$(My), NN(dog)), VP(VBZ(sleeps)))

is in L(G) if and only if
1 S is a start nonterminal
2 all the productions used in it are in P
3 all leaves are labeled by elements of Q

SLIDE 48

Local Tree Grammar

Observation: Local tree languages are not closed under union.
Proof. The following single-element tree languages are local:

S(NP(PRP$(My), NN(dog)), VP(VBZ(sleeps)))
S(NP(PRP(I)), VP(VBD(scored), ADVP(RB(well))))

But their union is not local: an LTG generating both trees must contain every production used in either tree, so it also generates trees for "My dog scored well" and "*I sleeps".

SLIDE 51

Tree Language

How to represent a set of trees?
• enumerate them
• enumerate them cleverly (packed forest)
• parse forest of a CFG (local tree languages)
• tree substitution grammar

SLIDE 59

Local Tree Grammar

A CFG production L → R1 R2 R3 is represented by the tree L(R1, R2, R3).
"Glue" fragments together to obtain larger trees: [Figure: fragments glued into S(NP(PRP(We)), VP(MD(must), VP(VB, PP, NP)))]
But why use only small tree fragments?

SLIDE 61

Tree Substitution Grammar

Definition (JOSHI 1969): A tree substitution grammar (TSG) is a tuple (N, Q, S, P) with
• a finite set N of nonterminals
• a finite set Q of terminals
• a set S ⊆ N of start nonterminals
• a finite set P ⊆ TN(Q) \ Q of tree fragments
Typical fragments [POST 2011]: [Figure: example fragments with node labels VP VBD NP CD PP / S NP PRP VP / S NP VP TO VP]

SLIDE 63

Tree Substitution Grammar

TSG G = (N, Q, S, P) and sentential forms ξ, ζ ∈ TN(Q)
Definition: ξ ⇒G ζ if there exist a tree fragment t ∈ P and a position w ∈ leaves(ξ) such that ξ = ξ[t(ε)]_w and ζ = ξ[t]_w
Intuition: in a derivation step,
1 find a leaf labeled by some n ∈ N,
2 find a fragment t ∈ P such that t(ε) = n, and
3 replace the selected n by t.
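The three steps can be sketched as a generator of all one-step successors ζ of a sentential form ξ. This assumes the hypothetical `(label, children)` encoding, positions as tuples of 1-based indices, and fragments given as trees; the fragment list mirrors the example fragments used in this lecture.

```python
def T(label, *children):
    """Build a tree as a (label, children) pair."""
    return (label, tuple(children))


def leaf_positions(t, w=()):
    """Yield the positions of all leaves of t (Gorn addresses as tuples)."""
    _, children = t
    if not children:
        yield w
    for i, child in enumerate(children, start=1):
        yield from leaf_positions(child, w + (i,))


def replace(t, u, w):
    """t[u]_w: replace the subtree at position w by u."""
    if not w:
        return u
    label, children = t
    i = w[0]
    new_child = replace(children[i - 1], u, w[1:])
    return (label, children[:i - 1] + (new_child,) + children[i:])


def tsg_steps(xi, N, P):
    """All ζ with ξ ⇒_G ζ: a leaf labeled n ∈ N is replaced by a fragment t with t(ε) = n."""
    for w in leaf_positions(xi):
        node = xi
        for i in w:
            node = node[1][i - 1]
        n = node[0]
        if n in N:
            for fragment in P:
                if fragment[0] == n:
                    yield replace(xi, fragment, w)


N = {"S", "NP", "VP", "PRP", "MD", "VB", "PP"}
P = [T("S", T("NP"), T("VP")), T("NP", T("PRP")), T("VP", T("MD"), T("VP")),
     T("PRP", T("We")), T("VP", T("VB"), T("PP"), T("NP")), T("MD", T("must"))]
successors = list(tsg_steps(T("S", T("NP"), T("VP")), N, P))
print(len(successors))  # 3: expand NP, or VP in one of two ways
```

Exploring these successors repeatedly yields exactly the sentential forms reachable via ⇒*G; the choice among several matching fragments is the source of TSG ambiguity.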

SLIDE 64

Tree Substitution Grammar

Example fragments: S(NP, VP), NP(PRP), VP(MD, VP), PRP(We), VP(VB, PP, NP), MD(must)
Derived sentential form: S(NP(PRP(We)), VP(MD(must), VP(VB, PP, NP)))

SLIDE 85

Tree Substitution Grammar

TSG G = (N, Q, S, P) and TSL = {L(G) | G is a TSG}
Definition: for all n ∈ N, L(G, n) = {t ∈ TN(Q) | t(w) ∈ Q for all w ∈ leaves(t), and n ⇒*G t}, and L(G) = ⋃_{s ∈ S} L(G, s)

Theorem:
1 FIN ⊆ TSL (all finite languages are TSL)
2 LTL ⊆ TSL (all local tree languages are TSL)

SLIDE 90

Tree Substitution Grammar

Theorem: Tree substitution languages (TSL) have the following closure properties:
closed under | CFL | LTL | TSL
union | ✓ | ✗ | ✗
intersection | ✗ | ✓ | ✗
(rank-bounded) complement | ✗ | ✗ | ✗
relabeling | ✓ | ✗ | ✗

SLIDE 91

Tree Substitution Grammar

Properties:
✓ simple
✓ more expressive than local tree grammars
✗ ambiguity (several explanations for a generated tree)
✗ not closed under Boolean operations (union/intersection/complement: ✗/✗/✗)
✗ not closed under (non-injective) relabelings
✗ …

SLIDE 92

Tree Substitution Grammar

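Theorem: Tree substitution languages are not closed under union.
Proof. The counterexample must be infinite (all finite languages are TSL), so we use an artificial example:
L1 = {S(Cⁿ(a), a) | n ∈ ℕ} and L2 = {S(Cⁿ(b), b) | n ∈ ℕ}, where Cⁿ(x) = C(C(⋯C(x)⋯)) contains n occurrences of C.
Both are tree substitution languages, but their union is not. Intuitively, fragments have bounded size, so for large n the chain of C's must be glued together at C-labeled leaves, and such a leaf cannot remember whether the rightmost leaf is a or b; any TSG for the union therefore also generates some tree S(Cᵏ(a), b).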

SLIDE 93

Tree Substitution Grammar

Theorem: Tree substitution languages are not closed under intersection.
Proof. Ideas?

SLIDE 94

Tree Language

How to represent a set of trees?
• enumerate them
• enumerate them cleverly (packed forest)
• parse forest of a CFG (local tree languages)
• tree substitution grammar
• regular tree grammar

SLIDE 95

Regular Tree Grammar

Definition (BRAINERD, 1969): A regular tree grammar (RTG) is a tuple G = (N, Σ, S, P) with
• a finite set N of nonterminals
• a finite set Σ of terminals
• a set S ⊆ N of start nonterminals
• a finite set P ⊆ N × TΣ(N) of productions
Remark: instead of (n, t) we write n → t

SLIDE 96

Regular Tree Grammar

Example: N = {n0, n1, n2, n3, n4, n5, n6}, Σ = {VP, NP, S, …}, S = {n0}, and the following productions (right-hand sides are trees, flattened here in pre-order):

n4 → VP n5 NP n2 n3
n0 → S NP n1 n4
n0 → S n6 VP n2 n4

SLIDE 98

Regular Tree Grammar

RTG G = (N, Σ, S, P) and sentential forms ξ, ζ ∈ TΣ(N)
Definition (Derivation semantics): ξ ⇒G ζ if there exist a production n → t ∈ P and a position w ∈ leaves(ξ) such that ξ = ξ[n]_w and ζ = ξ[t]_w
Definition (Recognized tree language): L(G) = {t ∈ TΣ | ∃s ∈ S : s ⇒*G t}

SLIDE 99

Regular Tree Grammar

Productions (right-hand sides flattened in pre-order): n4 → VP n5 NP n2 n3, n0 → S NP n1 n4, n0 → S n6 VP n2 n4
Derivation:

n0 ⇒G S NP n1 n4 ⇒G S NP n1 VP n5 NP n2 n3

SLIDE 102

Regular Tree Grammar

Regular tree languages: RTL = {L(G) | G is an RTG}
Theorem: TSL ⊊ RTL (tree substitution languages form a proper subclass of the regular tree languages)
Proof. The union counterexample can easily be expressed by an RTG.

SLIDE 107

Regular Tree Grammar

Theorem: Regular tree languages (RTL) have the following closure properties:
closed under | CFL | LTL | TSL | RTL
union | ✓ | ✗ | ✗ | ✓
intersection | ✗ | ✓ | ✗ | ✓
(rank-bounded) complement | ✗ | ✗ | ✗ | ✓
relabeling | ✓ | ✗ | ✗ | ✓

SLIDE 108

Regular Tree Grammar

Properties:
✓ simple
✓ more expressive than tree substitution grammars
✗ ambiguity (several explanations for a generated tree)
✓ closed under all Boolean operations (union/intersection/complement: ✓/✓/✓)
✓ closed under (non-injective) relabelings
✓ …

SLIDE 110

Regular Tree Grammar

RTG G = (N, Σ, S, P)
Definition (BRAINERD, 1969): G is in normal form if t = σ(n1, …, nk) with σ ∈ Σ and n1, …, nk ∈ N for all n → t ∈ P
Example productions that are not in normal form (right-hand sides flattened in pre-order): n4 → VP n5 NP n2 n3, n0 → S NP n1 n4, n0 → S n6 VP n2 n4

SLIDE 111

Regular Tree Grammar

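Theorem (BRAINERD, 1969): Every RTG is equivalent to an RTG in normal form.
Proof. Simply cut large rules by introducing new nonterminals (states); for example, n0 → S(n6, VP(n2, n4)) is replaced by n0 → S(n6, n) and n → VP(n2, n4) for a new nonterminal n. Chain productions n → n′ are eliminated as for CFGs.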
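The cutting step can be sketched as follows. Assumptions (not from the slides): right-hand sides are `(label, children)` trees, a childless node whose label is in N is a nonterminal leaf, normal-form rules are triples `(n, σ, (n1, ..., nk))`, fresh nonterminals are named `_1`, `_2`, … (hypothetical names that must not clash with N), and chain productions have already been eliminated.

```python
from itertools import count


def T(label, *children):
    """Build a tree as a (label, children) pair."""
    return (label, tuple(children))


def normalize(P, N):
    """Cut every RTG production into normal-form rules (n, σ, (n1, ..., nk)).

    Assumes no chain productions n → n′ and that "_1", "_2", ... do not occur in N.
    """
    fresh = (f"_{i}" for i in count(1))
    rules = []

    def as_nonterminal(t):
        """Return a nonterminal that generates exactly the fragment t."""
        label, children = t
        if label in N and not children:
            return label
        m = next(fresh)
        rules.append((m, label, tuple(as_nonterminal(c) for c in children)))
        return m

    for n, (label, children) in P:
        if label in N and not children:
            raise ValueError(f"chain production {n} -> {label} not supported")
        rules.append((n, label, tuple(as_nonterminal(c) for c in children)))
    return rules


N = {"n0", "n2", "n4", "n6"}
P = [("n0", T("S", T("n6"), T("VP", T("n2"), T("n4"))))]
for rule in normalize(P, N):
    print(rule)
# ('_1', 'VP', ('n2', 'n4'))
# ('n0', 'S', ('n6', '_1'))
```

Each inner node of a right-hand side receives its own fresh nonterminal, so the resulting grammar generates the same trees with one terminal symbol per rule.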

SLIDE 112

Standard Representation

Tree Automata

SLIDE 114

Tree Automaton

Definition (THATCHER, 1970; ROUNDS, 1970): A tree automaton (TA) is an RTG in normal form.
Remarks:
• bottom-up: rules written as σ(n1, …, nk) → n
• top-down: rules written as n → σ(n1, …, nk)

SLIDE 115

Tree Automaton

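Definition:
• top-down deterministic if for all n ∈ N, k ∈ ℕ, and σ ∈ Σ there exists at most one sequence n1, …, nk ∈ N with n → σ(n1, …, nk) ∈ P, and |S| = 1
• bottom-up deterministic if for all k ∈ ℕ, σ ∈ Σ, and n1, …, nk ∈ N there exists at most one n ∈ N with σ(n1, …, nk) → n ∈ P

(Top-down, the state n and the symbol σ determine the child states n1, …, nk; bottom-up, σ and n1, …, nk determine n.)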
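Both conditions are simple counting checks. A sketch, assuming rules are stored as a set of triples `(n, σ, (n1, ..., nk))` in top-down notation (an encoding chosen here for illustration); the example automaton recognizes {σ(α, β), σ(β, α)}.

```python
from collections import Counter


def is_top_down_det(P, S):
    """|S| = 1 and at most one rule n → σ(n1, ..., nk) per triple (n, σ, k)."""
    per_key = Counter((n, sigma, len(kids)) for n, sigma, kids in P)
    return len(S) == 1 and all(c == 1 for c in per_key.values())


def is_bottom_up_det(P):
    """At most one target n per left-hand side σ(n1, ..., nk) (bottom-up reading)."""
    per_key = Counter((sigma, kids) for _, sigma, kids in P)
    return all(c == 1 for c in per_key.values())


P = {("q", "σ", ("qa", "qb")), ("q", "σ", ("qb", "qa")),
     ("qa", "α", ()), ("qb", "β", ())}
print(is_top_down_det(P, {"q"}))  # False: q and σ admit two child sequences
print(is_bottom_up_det(P))        # True
```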

SLIDE 118

Tree Automaton

Theorem (THATCHER, WRIGHT, 1968; DONER, 1970): top-down det. RTL ⊊ bottom-up det. RTL = RTL
Proof. Let G = (N, Σ, S, P) be a tree automaton. We construct the bottom-up deterministic TA G′ = (2^N, Σ, S′, P′) with
• S′ = {N′ ⊆ N | N′ ∩ S ≠ ∅} (the subsets that contain a start nonterminal)
• for each σ ∈ Σ, k ∈ rkP(σ), and N1, …, Nk ⊆ N: {n | n → σ(n1, …, nk) ∈ P, ni ∈ Ni} → σ(N1, …, Nk) ∈ P′
• no other productions are in P′
Strictness is witnessed by the tree language L = {σ(α, β), σ(β, α)}: any top-down deterministic TA accepting both trees also accepts σ(α, α).
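A sketch of the subset construction, assuming rules are triples `(n, σ, (n1, ..., nk))`. Unlike the proof, which uses all of 2^N, this version only builds subsets that are reachable bottom-up (a common practical variant); the recognized language is the same. The example is the strictness witness {σ(α, β), σ(β, α)}.

```python
from itertools import product


def T(label, *children):
    """Build a tree as a (label, children) pair."""
    return (label, tuple(children))


def determinize(P, S):
    """Bottom-up subset construction, restricted to reachable subsets.

    P: set of TA rules (n, σ, (n1, ..., nk)). Returns (rules, finals): rules maps
    (σ, (N1, ..., Nk)) to the set {n | n → σ(n1, ..., nk) ∈ P, ni ∈ Ni}.
    """
    ranks = {(sigma, len(kids)) for _, sigma, kids in P}  # the pairs (σ, k ∈ rkP(σ))
    states, rules = set(), {}
    changed = True
    while changed:
        changed = False
        for sigma, k in ranks:
            for args in product(list(states), repeat=k):
                if (sigma, args) in rules:
                    continue
                target = frozenset(
                    n for n, s, kids in P
                    if s == sigma and len(kids) == k
                    and all(ni in Ni for ni, Ni in zip(kids, args)))
                rules[(sigma, args)] = target
                if target not in states:
                    states.add(target)
                    changed = True
    finals = {A for A in states if A & set(S)}  # subsets containing a start nonterminal
    return rules, finals


def accepts(t, rules, finals):
    """Run the deterministic automaton bottom-up on a (label, children) tree."""
    def run(node):
        label, children = node
        return rules.get((label, tuple(run(c) for c in children)), frozenset())
    return run(t) in finals


P = {("q", "σ", ("qa", "qb")), ("q", "σ", ("qb", "qa")),
     ("qa", "α", ()), ("qb", "β", ())}
rules, finals = determinize(P, {"q"})
print(accepts(T("σ", T("α"), T("β")), rules, finals))  # True
print(accepts(T("σ", T("α"), T("α")), rules, finals))  # False
```

Since `rules` is a dictionary, each left-hand side σ(N1, …, Nk) has exactly one target, which is bottom-up determinism by construction.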

SLIDE 121

Tree Automaton

[Figure: inclusion diagram of the classes RTL = bottom-up det. RTL, top-down det. RTL, TSL, LTL, and FIN]
Remark: finite tree languages ⊈ top-down deterministic RTL (for example, {σ(α, β), σ(β, α)})

SLIDE 122

Tree Automaton

Theorem: Regular tree languages are closed under
• all Boolean operations
• substitution (quotients) and iteration
• (non-deterministic) relabelings
• linear homomorphisms
• inverse homomorphisms

SLIDE 123

Tree Automaton

Theorem: Regular tree languages are closed under substitution.
Definition: for tree languages L, L′ ⊆ TΣ and α ∈ Σ, L[α ← L′] contains all trees obtained from a tree of L by replacing each leaf labeled α by a tree of L′; different occurrences may be replaced by different trees.

SLIDE 124

Tree Automaton

Theorem: Regular tree languages are closed under substitution.
[Figure: a tree t ∈ L whose α-labeled leaves are replaced by trees t1, t2, t3 ∈ L′, yielding a tree of L[α ← L′]]
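For finite languages the definition can be computed directly, which illustrates that different α-occurrences are replaced independently. This is a sketch assuming the hypothetical `(label, children)` encoding; it enumerates trees and is not the automaton construction behind the closure theorem.

```python
from itertools import product


def T(label, *children):
    """Build a tree as a (label, children) pair."""
    return (label, tuple(children))


def substitute(L, alpha, L2):
    """L[α ← L2] for finite tree languages given as sets of (label, children) trees.

    Every leaf labeled alpha is replaced independently by some tree of L2.
    """
    def variants(t):
        label, children = t
        if label == alpha and not children:
            return set(L2)
        # product() over zero children yields one empty tuple, so other leaves stay put
        return {(label, combo) for combo in product(*(variants(c) for c in children))}

    result = set()
    for t in L:
        result |= variants(t)
    return result


L = {T("σ", T("α"), T("α"))}
L2 = {T("β"), T("γ")}
print(len(substitute(L, "α", L2)))  # 4: each α-leaf is replaced on its own
```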

SLIDE 125

Tree Automaton

DTA = bottom-up deterministic tree automaton
Definition: A TA is minimal in C if all equivalent TAs of C have at least as many states.
Theorem: Complexity of minimization problems:

output class C \ input model | DTA | TA
DTA | PTime | ExpTime
TA | PSpace-hard, in ExpTime | ExpTime

SLIDE 127

Tree Automaton

Definition: The height of a tree t ∈ TΣ(Q) is given by height(q) = 0, height(σ) = 0, and height(σ(t1, …, tk)) = 1 + max {height(ti) | 1 ≤ i ≤ k} for k ≥ 1.
Intuition: height(t) is the number of edges on a longest path from the root to a leaf.
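The recursive definition in code, as a sketch under the hypothetical `(label, children)` encoding; a single node (a state q or a constant σ) has height 0, matching the intuition of counting edges.

```python
def T(label, *children):
    """Build a tree as a (label, children) pair."""
    return (label, tuple(children))


def height(t):
    """Edges on a longest root-to-leaf path; a single node has height 0."""
    _, children = t
    return 1 + max(height(c) for c in children) if children else 0


t = T("σ", T("σ", T("p"), T("q")), T("σ", T("σ", T("q"), T("σ"))))
print(height(t))  # 3
```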

SLIDE 128

Tree Automaton

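Theorem (Pumping Lemma): For every regular tree language L ⊆ TΣ there exists n ∈ ℕ such that for every t ∈ L with height(t) ≥ n there exist contexts c, c′ ∈ CΣ (trees with exactly one occurrence of the hole □) and a tree t′ ∈ TΣ such that
• t = c[c′[t′]]
• c′ is non-trivial (c′ ≠ □)
• c[c′[c′[⋯c′[t′]⋯]]] ∈ L for any number of copies of c′

[Figure: t decomposed into c, c′, and t′, next to the pumped tree in which c′ is repeated]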

SLIDE 129

Tree Automaton

Problem | Complexity for DTA | Complexity for TA
Emptiness | PTime | PTime
Finiteness | PTime | PTime
Universality | PTime | ExpTime
Inclusion | PTime | ExpTime
Equivalence | PTime | ExpTime
Membership | in LogDCFL | in LogCFL
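The PTime bound for emptiness comes from a simple saturation. The sketch below assumes rules `(n, σ, (n1, ..., nk))` of a TA and returns a witness tree instead of a bare yes/no answer. The technique is the standard productive-nonterminal fixpoint; the example automaton is illustrative only.

```python
def witness(P, S):
    """Return some tree of L(G) for a TA with rules (n, σ, (n1, ..., nk)), or None.

    Saturation: n becomes productive (with a recorded tree) as soon as some rule
    n → σ(n1, ..., nk) has productive n1, ..., nk. L(G) is empty iff no start
    nonterminal becomes productive. Each pass either adds a nonterminal or ends
    the loop, so there are at most |N| + 1 passes over P.
    """
    tree_for = {}
    changed = True
    while changed:
        changed = False
        for n, sigma, kids in P:
            if n not in tree_for and all(k in tree_for for k in kids):
                tree_for[n] = (sigma, tuple(tree_for[k] for k in kids))
                changed = True
    return next((tree_for[s] for s in S if s in tree_for), None)


P = {("s", "σ", ("a", "b")), ("a", "α", ()), ("b", "β", ()),
     ("u", "σ", ("u", "a"))}
print(witness(P, {"s"}))  # ('σ', (('α', ()), ('β', ())))
print(witness(P, {"u"}))  # None: u can never finish a tree
```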

SLIDE 130

Literature

Textbooks:
• COMON, DAUCHET et al.: Tree Automata Techniques and Applications. http://tata.gforge.inria.fr/, 2008
• GÉCSEG, STEINBY: Tree Languages. Handbook of Formal Languages, vol. 3, Springer, 1997
• GÉCSEG, STEINBY: Tree Automata. Akadémiai Kiadó, 1984
