

SLIDE 1

Learning Tree to Word Transducers

LATA 2014, Aurélien Lemay

Joint work with: Grégoire Laurence, Joachim Niehren, Slawek Staworko, Marc Tommasi

March 11, 2014

Aurélien Lemay (INRIA Lille), Learning Tree to Word Transducers, March 11, 2014, 1 / 32

SLIDE 2

Learning Tree Transductions

Transforming structured data

Example of an XSLT transformation: from XML to XHTML

Many applications, many formalisms... all requiring some expertise. One solution: infer the transformation from examples

SLIDE 3

Learning Subsequential Transducers [OncinaGarciaVidal93]

Subsequential transducers are learnable from examples with polynomial time and data (Gold model [Gold78]). Two main ideas:

Onward normal form [Choffrut79]: produce the output as soon as possible

[Diagram: two subsequential transducers for τ(a^(2n)) = b^n, one onward (edges a/b, a/ε) and one not (edges a/ε, a/b)]

State-merging algorithm: OSTIA [OncinaGarciaVidal93]
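The onwardness difference is easy to check concretely. Below is a small sketch (an encoding chosen for illustration, not from the talk) of the two subsequential transducers for τ(a^(2n)) = b^n: they compute the same function, but only the first is onward, emitting each b as early as possible.

```python
def run(trans, final, state, word):
    """Run a subsequential transducer on `word`; return its output, or None
    if the run blocks or ends in a non-final state."""
    out = []
    for c in word:
        if (state, c) not in trans:
            return None
        state, o = trans[(state, c)]
        out.append(o)
    if state not in final:
        return None
    out.append(final[state])  # final output attached to the last state
    return "".join(out)

# Onward version: emits 'b' on the first 'a' of each pair.
onward = {(0, "a"): (1, "b"), (1, "a"): (0, "")}
# Non-onward version: emits 'b' only on the second 'a' of each pair.
late = {(0, "a"): (1, ""), (1, "a"): (0, "b")}
final = {0: ""}  # only even-length inputs are accepted

for n in range(5):
    w = "a" * (2 * n)
    assert run(onward, final, 0, w) == run(late, final, 0, w) == "b" * n
```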

SLIDE 4

Extensions of OSTIA - two learnable classes

Rational Functions [BoiretLemayNiehren12]

◮ Represented by subsequential transducers with deterministic look-ahead
◮ Normal form (inspired by bimachines [ReutenauerSchutzenberger92])
◮ Learning algorithm ≃ learn the look-ahead, then apply OSTIA

Top-Down Tree-to-Tree Transducers [LemayManethNiehren11]

◮ Earliest normal form [EngelfrietManethSeidl09]: earliest production (produce as 'up' as possible)
◮ A Myhill-Nerode-like theorem in [LemayManethNiehren11]
◮ Learning based on a state-merging algorithm

SLIDE 5

Toward learning MSO tree transformations ?

MSO tree transformations [Courcelle92]: an interesting target for learning tree transformations!

The big picture

MSO tree transformations ≃ Macro Tree Transducers with regular look-ahead (MTT^R) [EngelfrietManeth03] ≃ Top-Down + Concatenation + Look-ahead

Top-down tree transducers: learnable
Look-ahead: learnable

◮ not extended to trees yet

Concatenation in the output: ?

SLIDE 6

Outline

1. Tree to Word Transducers
2. Normal Form
3. A Myhill-Nerode Theorem
4. Learning Algorithm

SLIDE 7

Outline

1. Tree to Word Transducers
2. Normal Form
3. A Myhill-Nerode Theorem
4. Learning Algorithm

SLIDE 8

Tree to Word Transducers - An example

XML-like Serialization

Axiom: q(x0)
q(f(x1, x2)) → <f> · q(x1) · q(x2) · </f>
q(g(x1, x2)) → <g> · q(x1) · q(x2) · </g>
q(a) → <a/>
q(b) → <b/>

Input tree: f(g(a, b), b)

SLIDE 9

Tree to Word Transducers - An example

XML-like Serialization

Rules as above. Derivation: q(f(g(a, b), b))

SLIDE 10

Tree to Word Transducers - An example

XML-like Serialization

Rules as above. Derivation: <f> · q(g(a, b)) · q(b) · </f>

SLIDE 11

Tree to Word Transducers - An example

XML-like Serialization

Rules as above. Derivation: <f> · q(g(a, b)) · <b/> · </f>

SLIDE 12

Tree to Word Transducers - An example

XML-like Serialization

Rules as above. Derivation: <f> · <g> · q(a) · q(b) · </g> · <b/> · </f>

SLIDE 13

Tree to Word Transducers - An example

XML-like Serialization

Rules as above. Final output: <f><g><a/><b/></g><b/></f>
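The whole derivation above can be replayed in a few lines. This sketch encodes trees as nested tuples (a representation chosen here for convenience) and applies the serialization rules directly:

```python
def serialize(t):
    """Apply the XML-like serialization rules: leaves 'a'/'b' become
    self-closing tags, inner nodes wrap their serialized children."""
    if t in ("a", "b"):
        return f"<{t}/>"
    label, left, right = t
    return f"<{label}>" + serialize(left) + serialize(right) + f"</{label}>"

# The input tree f(g(a, b), b) from the slides.
tree = ("f", ("g", "a", "b"), "b")
assert serialize(tree) == "<f><g><a/><b/></g><b/></f>"
```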

SLIDE 14

Tree to Word Transducers - Presentation

Axiom: u0 · q(x0) · u1
Rules: q(f(x1, x2)) → u0 · q1(x1) · u1 · q2(x2) · u2

Three Restrictions

Deterministic
Linear (no copy)
Ordered (no swap)
Together these define deterministic Sequential Tree-to-Word transducers (STW)

SLIDE 15

Outline

1. Tree to Word Transducers
2. Normal Form
3. A Myhill-Nerode Theorem
4. Learning Algorithm

SLIDE 16

Normal Form

Earliest STW: produce as soon as possible.
Example transformation: count the number of symbols, e.g. τcount(f(a, f(a, b))) = #####

An STW for τcount

Axiom: q
q(f(x1, x2)) → # · q(x1) · q(x2)
q(a) → #
q(b) → #

Not earliest! At least one '#' could be output from the beginning.

SLIDE 17

Normal Form

Another STW for τcount:
Axiom: # · q
q(f(x1, x2)) → # · q(x1) · # · q(x2)
q(a) → ε
q(b) → ε

Earliest (Rule 1)

Produce as ’up’ as possible

SLIDE 18

Normal Form

We want a unique normal form. Do we want
q(f(x1, x2)) → # · q(x1) · # · q(x2)
or
q(f(x1, x2)) → ## · q(x1) · q(x2)
(or another choice?)

Earliest - Rule 2

Produce as 'left' as possible

SLIDE 19

Normal Form

Earliest STW (eSTW): produce as 'up' and as 'left' as possible

Theorem [LaurenceLemayNiehrenStaworkoTommasi11]

For any STW, there exists a unique equivalent minimal eSTW, possibly of exponential size

The minimal eSTW of τcount

Axiom: # · q
q(f(x1, x2)) → ## · q(x1) · q(x2)
q(a) → ε
q(b) → ε
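As a sanity check, the three STWs for τcount seen on the last slides (the non-earliest one, the 'up'-earliest one, and the minimal eSTW) can be run side by side. The tuple encoding of rules below is an assumption of this sketch, not notation from the talk:

```python
# Trees are ("f", left, right) or the leaf labels "a"/"b".
# A transducer is (axiom prefix, rules); rules map "f" to (u0, u1, u2)
# and each leaf symbol to a constant output word.

def run_stw(axiom, rules, t):
    def q(t):
        if isinstance(t, str):          # leaf rule
            return rules[t]
        u0, u1, u2 = rules["f"]         # q(f(x1,x2)) -> u0 q(x1) u1 q(x2) u2
        return u0 + q(t[1]) + u1 + q(t[2]) + u2
    return axiom + q(t)

not_earliest   = ("",  {"f": ("#", "", ""),  "a": "#", "b": "#"})
rule1_earliest = ("#", {"f": ("#", "#", ""), "a": "",  "b": ""})
minimal_estw   = ("#", {"f": ("##", "", ""), "a": "",  "b": ""})

def size(t):  # number of symbols in a tree
    return 1 if isinstance(t, str) else 1 + size(t[1]) + size(t[2])

t = ("f", "a", ("f", "a", "b"))
for ax, rules in (not_earliest, rule1_earliest, minimal_estw):
    assert run_stw(ax, rules, t) == "#" * size(t)  # all produce '#####'
```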

SLIDE 20

Outline

1. Tree to Word Transducers
2. Normal Form
3. A Myhill-Nerode Theorem
4. Learning Algorithm

SLIDE 21

A Myhill-Nerode Theorem for STW

A constructive algorithm for can(τ), the minimal eSTW for τ: build for each input path p a residual transformation τp, and define p ≃ p′ iff τp = τp′

Myhill-Nerode Theorem for STW

τ is represented by an STW ⇔ ≃ has finite index ⇔ can(τ) is the minimal eSTW of τ

SLIDE 22

Building Axiom

Axiom: lcp(range(τ)) · qε · lcs′(range(τ))
lcp: longest common prefix
lcs′: longest common suffix (minus what is already in the lcp)
For τcount: lcp(range(τcount)) = # and lcs′(range(τcount)) = ε
Axiom: # · qε
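The lcp and lcs′ computations can be sketched directly; here range(τ) is approximated by a finite set of output words, and Python's os.path.commonprefix plays the role of lcp:

```python
import os

def lcp(words):
    """Longest common prefix of a list of words."""
    return os.path.commonprefix(list(words))

def lcs_prime(words):
    """Longest common suffix of the words after removing the lcp."""
    p = lcp(words)
    rest = [w[len(p):] for w in words]
    rev = os.path.commonprefix([w[::-1] for w in rest])
    return rev[::-1]

# A few outputs of tau_count: '#' * |t| for trees of sizes 1, 3, 5.
sample = ["#", "###", "#####"]
assert lcp(sample) == "#"        # the axiom prefix
assert lcs_prime(sample) == ""   # nothing left for a common suffix
```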

SLIDE 23

Building τε

Axiom: # · qε. We define τε by: for any t, τε(t) = #^(-1) · τcount(t), i.e. τcount(t) with the leading # removed

Defining τε

a → ε
b → ε
f(a, a) → #^2
...
f(f(a, b), a) → #^4
...
In general: τε(t) = #^(|t|−1)

SLIDE 24

Building Rules for Leaf Symbols

Rules from state qε. For the leaf symbols: τε(a) = ε and τε(b) = ε

Rules

qε(a) → ε
qε(b) → ε

SLIDE 25

Building Other Rules (1)

Build the rule qε(f(x1, x2)) → u0 · q(f,1)(x1) · u1 · q(f,2)(x2) · u2.
First, u0 = lcp({τε(f(?, ?))})

Compute u0 from τε(f (?, ?))

f(a, a) → #^2
...
f(f(a, b), a) → #^4
...
u0 = #^2
qε(f(x1, x2)) → #^2 · q(f,1)(x1) · u1 · q(f,2)(x2) · u2

SLIDE 26

Building Other Rules (2)

qε(f(x1, x2)) → #^2 · q(f,1)(x1) · u1 · q(f,2)(x2) · u2
To obtain τ(f,1)(t), take the lcp of τε(f(t, ?))

Compute τ(f ,1)(a) from τε(f (a, ?))

τε(f(a, a)) = #^2
τε(f(a, b)) = #^2
τε(f(a, f(a, a))) = #^4
...
τε(f(a, f(a, f(a, a)))) = #^6
lcp(τε(f(a, ?))) = #^2 = u0 · τ(f,1)(a) · u1 (guaranteed by earliestness)
As u0 = #^2, we get τ(f,1)(a) = u1 = ε
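The residual computation on this slide can be reproduced: fix the first child to a, vary the second child, and take the lcp of the resulting outputs of τε. The helper names below (size, tau_eps) are mine:

```python
import os

def size(t):  # number of symbols; trees are tuples or leaf labels
    return 1 if isinstance(t, str) else 1 + size(t[1]) + size(t[2])

def tau_eps(t):
    """Residual of tau_count after the axiom's '#': '#' repeated |t| - 1 times."""
    return "#" * (size(t) - 1)

# Fix the first child to 'a' and vary the second child.
contexts = ["a", "b", ("f", "a", "a"), ("f", "a", ("f", "a", "a"))]
common = os.path.commonprefix([tau_eps(("f", "a", s)) for s in contexts])
assert common == "##"          # lcp = u0 . tau_(f,1)(a) . u1
u0 = "##"                      # computed from tau_eps(f(?, ?)) on the previous slide
assert common[len(u0):] == ""  # hence tau_(f,1)(a) = u1 = epsilon
```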

SLIDE 27

Building Other Rules (3)

Compute τ(f ,1)

τ(f,1)(a) = ε
τ(f,1)(b) = ε
τ(f,1)(f(a, a)) = #^2
...
τ(f,1)(f(a, f(a, a))) = #^4
...
In general: τ(f,1) : t → #^(|t|−1)

SLIDE 28

State Equivalence

τε : t → #^(|t|−1) and τ(f,1) : t → #^(|t|−1), so ε ≃ (f, 1) and thus qε = q(f,1).
Similarly, u1 = u2 = ε and (f, 2) ≃ ε.

eSTW Can(τ)

Axiom: # · qε
qε(a) → ε
qε(b) → ε
qε(f(x1, x2)) → ## · qε(x1) · qε(x2)
This is the minimal eSTW for τcount.

SLIDE 29

Outline

1. Tree to Word Transducers
2. Normal Form
3. A Myhill-Nerode Theorem
4. Learning Algorithm

SLIDE 30

Learning Algorithm

Learning algorithm LearnSTW (S)

Essentially the same as the construction algorithm

◮ Input: a finite sample S ⊆ τ
◮ From each path p, compute Sp (an approximation of τp)
◮ p ≃ p′ if Sp does not contradict Sp′

LearnSTW(S) answers in polynomial time. For any STW τ, there exists a sample CSτ of polynomial cardinality such that LearnSTW(S) = Can(τ) whenever CSτ ⊆ S
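The 'does not contradict' test at the heart of LearnSTW can be sketched as a compatibility check between finite partial functions, here plain dicts from input trees to output words (the function name compatible is an assumption of this sketch):

```python
def compatible(Sp, Sq):
    """Two finite samples (dicts: input tree -> output word) do not
    contradict each other if they agree wherever both are defined."""
    return all(Sq[t] == w for t, w in Sp.items() if t in Sq)

# Two approximations of the same residual of tau_count agree...
Sp = {"a": "", "b": ""}
Sq = {"b": "", ("f", "a", "a"): "##"}
assert compatible(Sp, Sq)
# ...while samples disagreeing on 'a' are incompatible.
assert not compatible({"a": "#"}, {"a": ""})
```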

SLIDE 31

Consistency Problem

Consistency Issues

Checking whether there exists an STW consistent with a given sample is NP-complete.
Idea: encode the 1-in-3 SAT problem.
1-in-3 SAT: a variant of 3-SAT where exactly one literal per clause is satisfied; NP-complete [Schaefer78]

SLIDE 32

Consistency Problem

Example with a formula over 4 variables: (v1 ∨ ¬v2 ∨ v3) ∧ (v2 ∨ ¬v3 ∨ v4)

[Tree diagram: the clause v1 ∨ ¬v2 ∨ v3 is encoded as a tree with root c and children v1, v2, v3, v4, whose image is T]

SLIDE 33

Consistency Problem

The encoding of (v1 ∨ ¬v2 ∨ v3) ∧ (v2 ∨ ¬v3 ∨ v4):
(v1 ∨ ¬v2 ∨ v3): c(v1(•, ◦), v2(◦, •), v3(•, ◦), v4(◦, ◦)) → T
(v2 ∨ ¬v3 ∨ v4): c(v1(◦, ◦), v2(•, ◦), v3(◦, •), v4(•, ◦)) → T
(v1 ∨ ¬v1): c(v1(•, •), v2(◦, ◦), v3(◦, ◦), v4(◦, ◦)) → T
(v2 ∨ ¬v2): c(v1(◦, ◦), v2(•, •), v3(◦, ◦), v4(◦, ◦)) → T
(v3 ∨ ¬v3): c(v1(◦, ◦), v2(◦, ◦), v3(•, •), v4(◦, ◦)) → T
(v4 ∨ ¬v4): c(v1(◦, ◦), v2(◦, ◦), v3(◦, ◦), v4(•, •)) → T
c(v1(◦, ◦), v2(◦, ◦), v3(◦, ◦), v4(◦, ◦)) → ε
Idea: an eSTW that recognizes this sample encodes a solution of one-in-three SAT.

SLIDE 34

Consistency Problem

Unique STW solution

Axiom: q
q(c(x1, . . . , xn)) → q1(x1) · . . . · qn(xn)
qi(vi(x1, x2)) → qL_i(x1) · qR_i(x2)
qL_i(◦) → ε and qR_i(◦) → ε
If vi is true: qL_i(•) → T and qR_i(•) → ε
If vi is false: qL_i(•) → ε and qR_i(•) → T

Idea: one state qi per variable vi; vi(•, ◦) produces T iff vi is true, and vi(◦, •) produces T iff vi is false.

SLIDE 35

Learning Theorem

Learning Theorem

The class of STWs, represented by eSTWs, is learnable from examples with polynomial time and data, with abstain.
With abstain: the algorithm may decline to answer (when the sample is not characteristic)

SLIDE 36

Conclusion - Future Work

Results on Sequential Tree-to-Word Transducers:

◮ A Myhill-Nerode theorem
◮ Learnable in a Gold-like model

Future work:
◮ Extension to non-ordered transducers: good hopes!
◮ Extension to non-linear transducers: little hope...
◮ Toward a learning algorithm for MSO tree transductions?

SLIDE 37

OSTIA [OncinaGarciaVidal93]

Example input: S = {(ε, ε), (aa, b), (aaaa, bb)}

1. Align input/output in an onward way, and build the initial prefix-tree transducer

[Diagram: a chain of states 1, 2, 3, 4 with transitions a/b, a/ε, a/b, a/ε]

2. Perform state merging in an ordered way

[Diagram: merged states {0, 2, 4} and {1, 3} with transitions a/b and a/ε]
If S is characteristic for τ, then OSTIA(S) is the canonical subsequential transducer of τ.
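Step 1, the onward alignment, can be reproduced for this sample: attach to each input prefix the lcp of all sample outputs reachable through it, and read off each edge output as the difference between consecutive prefixes. This sketch recovers the a/b, a/ε, a/b, a/ε chain shown above:

```python
import os

# Sample: pairs (input word, output word) from the slide.
S = [("", ""), ("aa", "b"), ("aaaa", "bb")]

def f(p):
    """Onward output attached to the prefix-state p: the lcp of the outputs
    of all sample pairs whose input starts with p."""
    outs = [v for u, v in S if u.startswith(p)]
    return os.path.commonprefix(outs)

prefixes = ["", "a", "aa", "aaa", "aaaa"]
# Output on the edge p --a--> pa: whatever f(pa) adds beyond f(p).
edges = {p: f(p + "a")[len(f(p)):] for p in prefixes[:-1]}
assert [edges[p] for p in prefixes[:-1]] == ["b", "", "b", ""]
```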
