SLIDE 1

The Basic Problem A Canonical Linear Representation for Rational Tree Series Contributions Conclusion

Relevant Representations for the Inference of Rational Stochastic Tree Languages

François Denis¹, Edouard Gilbert², Amaury Habrard¹, Faïssal Ouardi¹, Marc Tommasi²

¹Laboratoire d’Informatique Fondamentale de Marseille (LIF)

CNRS, Aix-Marseille Université, France

²Laboratoire d’Informatique Fondamentale de Lille (L.I.F.L.), INRIA

and É.N.S. Cachan, France

ICGI 2008

  • F. Denis, E. Gilbert, A. Habrard, F. Ouardi and M. Tommasi

Representations for Rational Stochastic Tree Languages

SLIDE 2

Outline

1 The Basic Problem
2 A Canonical Linear Representation for Rational Tree Series
3 Contributions
  • Normalization of the Model as a Generative Model
  • Strongly Consistent Model
  • Unranked Trees

SLIDE 4

Trees

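\(\mathcal{F} = \mathcal{F}_0 \cup \mathcal{F}_1 \cup \dots \cup \mathcal{F}_p\): a ranked alphabet, where \(\mathcal{F}_m\) is the set of function symbols of arity m.
\(T(\mathcal{F})\): the set of all trees built from \(\mathcal{F}\).
Example: \(\mathcal{F} = \{f(\cdot, \cdot), a\}\); \(f(a, f(a, a)) \in T(\mathcal{F})\).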
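To make these definitions concrete, here is a minimal Python sketch under our own assumptions: a tree is a nested tuple `(symbol, child_1, ..., child_m)` and the ranked alphabet is a dictionary of arities. The names `ARITY`, `is_tree`, `size` and `show` are illustrative and do not come from the slides.

```python
# Assumed encoding (illustrative): a tree is a tuple (symbol, child_1, ..., child_m),
# so the constant a is ("a",) and f(a, a) is ("f", ("a",), ("a",)).
ARITY = {"f": 2, "a": 0}  # the ranked alphabet F = {f(., .), a}


def is_tree(t, arity=ARITY):
    """Return True iff t belongs to T(F): every node carries a known symbol
    and has exactly as many children as the arity of that symbol."""
    if not isinstance(t, tuple) or not t or t[0] not in arity:
        return False
    symbol, children = t[0], t[1:]
    return len(children) == arity[symbol] and all(is_tree(c, arity) for c in children)


def size(t):
    """Number of nodes |t|."""
    return 1 + sum(size(c) for c in t[1:])


def show(t):
    """Render a tree in the term notation used on the slides."""
    if len(t) == 1:
        return t[0]
    return t[0] + "(" + ", ".join(show(c) for c in t[1:]) + ")"


a = ("a",)
example = ("f", a, ("f", a, a))  # f(a, f(a, a))
print(show(example), is_tree(example), size(example))  # f(a, f(a, a)) True 5
```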

SLIDE 5

Stochastic Tree Languages

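Stochastic tree language: a probability distribution over \(T(\mathcal{F})\), i.e., \(p : T(\mathcal{F}) \to \mathbb{R}\) such that \(0 \le p(t) \le 1\) for every \(t \in T(\mathcal{F})\) and \(\sum_{t \in T(\mathcal{F})} p(t) = 1\).

Formal power tree series over \(T(\mathcal{F})\): any \(r : T(\mathcal{F}) \to \mathbb{R}\). Notation: \(\mathbb{R}^{T(\mathcal{F})}\) (a vector space).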

SLIDE 6

A Basic Problem in Probabilistic Grammatical Inference

The problem
Data: \(t_1, \dots, t_n \in T(\mathcal{F})\), drawn independently according to a fixed, unknown stochastic tree language p.
Goal: infer an estimate of p in some class of probabilistic models.

Probabilistic models
  • Probabilistic tree automata
  • Linear representations of rational tree series

SLIDE 7

Probabilistic Tree Automata

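A distribution over \(T(\mathcal{F})\) defined by a PA with one state:
\(A_\alpha\): \(\Delta_\alpha = \{q \xrightarrow{\alpha} a,\ q \xrightarrow{1-\alpha} f(q, q)\}\), \(\tau(q) = 1\), \(0 \le \alpha \le 1\).
\(p_\alpha(f(a, f(a, a))) = \alpha^3 (1 - \alpha)^2\).
Less simple than in the word case:
  • \(p_\alpha\) is a stochastic language iff \(\alpha \ge 1/2\): the total mass Z is the least nonnegative solution of \(Z = \alpha + (1 - \alpha) Z^2\), which equals 1 iff \(\alpha \ge 1/2\).
  • Is it decidable whether a PA defines a stochastic language?
  • The average tree size is \(1/(2\alpha - 1)\) for \(\alpha > 1/2\); it is unbounded if \(\alpha = 1/2\).
  • It is decidable in polynomial time whether a PA defines a stochastic language with bounded average size.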
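A minimal Python sketch of these claims for \(A_\alpha\), assuming the tuple encoding of trees (`("a",)`, `("f", t1, t2)`); `p_alpha`, `total_mass` and `average_size` are hypothetical helper names. The total mass is approximated by iterating \(Z \mapsto \alpha + (1 - \alpha) Z^2\) from 0, which increases to the least fixed point.

```python
from fractions import Fraction


def p_alpha(t, alpha):
    """Weight of tree t under the one-state PA A_alpha: every leaf a
    contributes alpha and every binary node f contributes 1 - alpha."""
    if t[0] == "a":
        return alpha
    return (1 - alpha) * p_alpha(t[1], alpha) * p_alpha(t[2], alpha)


def total_mass(alpha, iterations=2000):
    """Approximate sum_t p_alpha(t): after k steps, z is the mass of the trees
    of height < k, and the iterates increase to the least fixed point of
    z = alpha + (1 - alpha) * z**2."""
    z = 0.0
    for _ in range(iterations):
        z = alpha + (1 - alpha) * z * z
    return z


def average_size(alpha):
    """Expected number of nodes for alpha > 1/2, solving E = 1 + 2(1 - alpha)E."""
    return 1 / (2 * alpha - 1)


t = ("f", ("a",), ("f", ("a",), ("a",)))  # f(a, f(a, a))
print(p_alpha(t, Fraction(2, 3)))  # 8/243 = (2/3)**3 * (1/3)**2
print(total_mass(0.75), total_mass(0.4))  # about 1.0, and about 0.6667 (< 1)
print(average_size(0.75))  # 2.0
```

For \(\alpha = 0.4\) the mass converges to \(\alpha/(1 - \alpha) = 2/3\), illustrating why \(p_\alpha\) is not stochastic when \(\alpha < 1/2\).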

SLIDE 8

Linear Representations of Rational Tree Languages

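A series \(r \in \mathbb{R}^{T(\mathcal{F})}\) is rational iff there exists a triple \((V, \mu, \lambda)\) such that:
  • V is a finite-dimensional vector space over \(\mathbb{R}\);
  • \(\mu\) maps every \(f \in \mathcal{F}_p\) to a p-linear mapping \(\mu(f) \in \mathcal{L}(V^p; V)\);
  • \(\lambda\) is a linear form \(V \to \mathbb{R}\);
  • \(r(t) = \lambda(\mu(t))\), where \(\mu(f(t_1, \dots, t_p)) = \mu(f)(\mu(t_1), \dots, \mu(t_p))\).
Example: \(V = \mathbb{R}\); let \(e_1\) be a basis of \(\mathbb{R}\) (any \(e_1 \neq 0\)), \(\mu(a) = \alpha e_1\), \(\mu(f)(e_1, e_1) = (1 - \alpha) e_1\), \(\lambda(e_1) = 1\). Then \(\lambda\mu(f(a, f(a, a))) = \alpha^3 (1 - \alpha)^2\).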
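The definition translates into a small evaluator. This is a sketch under our own assumptions: \(V = \mathbb{R}^d\), only symbols of arity 0 and 2 (as in the running example), \(\mu(f)\) stored as a tensor `T[i][j][k]` giving the k-th coordinate of \(\mu(f)(e_i, e_j)\) and extended bilinearly, and exact arithmetic with `fractions.Fraction`. The dictionary layout of `rep` is hypothetical.

```python
from fractions import Fraction


def mu(t, rep):
    """mu(t) in V = R^d, computed bottom-up:
    mu(f(t1, t2)) = mu(f)(mu(t1), mu(t2))."""
    if len(t) == 1:  # constant symbol: mu(a) is a vector of V
        return rep["leaf"][t[0]]
    u, v = mu(t[1], rep), mu(t[2], rep)
    T = rep["binary"][t[0]]  # T[i][j][k]: k-th coordinate of mu(f)(e_i, e_j)
    d = len(u)
    # Bilinear extension: mu(f)(u, v) = sum_{i,j} u_i v_j mu(f)(e_i, e_j).
    return [sum(u[i] * v[j] * T[i][j][k] for i in range(d) for j in range(d))
            for k in range(d)]


def evaluate(t, rep):
    """r(t) = lambda(mu(t)), with lambda given by its coordinates."""
    return sum(l * x for l, x in zip(rep["final"], mu(t, rep)))


# The V = R example of the slide: mu(a) = alpha e1, mu(f)(e1, e1) = (1 - alpha) e1,
# lambda(e1) = 1.
alpha = Fraction(2, 3)
one_dim = {
    "leaf": {"a": [alpha]},
    "binary": {"f": [[[1 - alpha]]]},
    "final": [Fraction(1)],
}
t = ("f", ("a",), ("f", ("a",), ("a",)))  # f(a, f(a, a))
print(evaluate(t, one_dim))  # 8/243, i.e. alpha**3 * (1 - alpha)**2
```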

SLIDE 9

Rational Stochastic Tree Languages

Stochastic languages
  • A rational stochastic tree language (RSTL) is a stochastic language that has a linear representation.
  • Every stochastic language computed by a probabilistic automaton is rational.
  • Some RSTLs cannot be computed by any probabilistic automaton.
  • It is undecidable whether a linear representation defines a stochastic language.
  • An RSTL can equivalently be represented by a weighted tree automaton, which can be chosen minimal in its number of states (i.e., in the dimension of the vector space).

SLIDE 11

Word Languages: The Notion of Residual Languages

Languages: for \(L \subseteq \Sigma^*\) and \(u \in \Sigma^*\), \(u^{-1}L = \{v \in \Sigma^* \mid uv \in L\}\).
Series: for \(r \in \mathbb{R}^{\Sigma^*}\) and \(u \in \Sigma^*\), \(\dot{u}r(v) = r(uv)\).
Residual languages are a key notion for inference because:
  • they are intrinsic components of the language;
  • they are observable on samples;
  • they yield canonical representations.
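For words, both notions are one-liners. A minimal Python sketch, assuming a finite language given as a set of strings and a series given as a Python function on strings; the helper names and the toy series are our own choices.

```python
def residual_language(L, u):
    """u^{-1}L = {v | uv in L}, for a finite language L (a set of strings)."""
    return {w[len(u):] for w in L if w.startswith(u)}


def residual_series(r, u):
    """The series dot(u) r, defined by (dot(u) r)(v) = r(uv), for r: str -> number."""
    return lambda v: r(u + v)


L = {"a", "ab", "abb", "ba"}
print(sorted(residual_language(L, "ab")))  # ['', 'b']

r = lambda w: 0.5 ** (len(w) + 1)  # a toy series (our choice)
print(residual_series(r, "ab")("a"))  # r("aba") = 0.0625
```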

SLIDE 12

Contexts

$: a function symbol of arity zero that does not belong to \(\mathcal{F}_0\).
A context is an element of \(T(\mathcal{F} \cup \{\$\})\) in which $ appears exactly once. \(C(\mathcal{F})\): the set of all contexts over \(\mathcal{F}\).
\(c[t]\): the tree obtained by substituting t for $ in c.
Example: \(c = f(a, \$)\), \(c[f(a, a)] = f(a, f(a, a))\).
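A minimal Python sketch of contexts and substitution, reusing the assumed tuple encoding of trees with `("$",)` standing for the hole; `substitute` and `decompositions` are our own names. The second function lists, for a tree t, every pair (c, s) with c[s] = t, one per node of t.

```python
HOLE = ("$",)  # the extra constant $ (assumed encoding)


def substitute(c, t):
    """c[t]: replace the unique occurrence of $ in the context c by the tree t."""
    if c == HOLE:
        return t
    return (c[0],) + tuple(substitute(child, t) for child in c[1:])


def decompositions(t):
    """All pairs (c, s) such that c is a context and c[s] == t (one per node of t)."""
    pairs = [(HOLE, t)]
    for i, child in enumerate(t[1:], start=1):
        for c, s in decompositions(child):
            # Plug the sub-context into position i, keeping the other children.
            pairs.append((t[:i] + (c,) + t[i + 1:], s))
    return pairs


a = ("a",)
c = ("f", a, HOLE)  # f(a, $)
print(substitute(c, ("f", a, a)) == ("f", a, ("f", a, a)))  # True
print(len(decompositions(("f", a, ("f", a, a)))))  # 5
```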

SLIDE 13

An Algebraic Characterization of Rational Series

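Contexts operate on tree series: for \(c \in C(\mathcal{F})\), define \(\dot{c} : \mathbb{R}^{T(\mathcal{F})} \to \mathbb{R}^{T(\mathcal{F})}\) by \(\dot{c}r(t) = r(c[t])\).
Example: \(c = f(a, \$)\), \(t = f(a, a)\): \(\dot{c}r(t) = r(f(a, f(a, a)))\).
For \(r \in \mathbb{R}^{T(\mathcal{F})}\), consider \(W_r = [\{\dot{c}r \mid c \in C(\mathcal{F})\}] \subseteq \mathbb{R}^{T(\mathcal{F})}\), the vector subspace of \(\mathbb{R}^{T(\mathcal{F})}\) spanned by the series \(\dot{c}r\).
Theorem: r is rational iff \(W_r\) has finite dimension.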
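The theorem suggests a finite check: for finitely many contexts c and trees t, the matrix \(H[c][t] = r(c[t])\) (often called a Hankel matrix) has rank at most \(\dim W_r\), since each row is a restriction of \(\dot{c}r\). A minimal Python sketch, assuming the tuple encoding of trees and taking for r the series \(p = 2p_{2/3} - p_{3/4}\) of the worked example later in this deck; all helper names are ours. The rank comes out as 2 here, whereas a single \(p_{2/3}\) gives rank 1.

```python
from fractions import Fraction

A = ("a",)
HOLE = ("$",)


def trees_by_count(max_f):
    """Trees over {f(., .), a}, grouped by their number of f-nodes."""
    by = {0: [A]}
    for n in range(1, max_f + 1):
        by[n] = [("f", l, r) for k in range(n) for l in by[k] for r in by[n - 1 - k]]
    return by


def contexts_by_count(max_f, tby):
    """Contexts over {f(., .), a} (exactly one $), grouped by number of f-nodes."""
    cby = {0: [HOLE]}
    for n in range(1, max_f + 1):
        pairs = [(c, t) for k in range(n) for c in cby[k] for t in tby[n - 1 - k]]
        cby[n] = [("f", c, t) for c, t in pairs] + [("f", t, c) for c, t in pairs]
    return cby


def substitute(c, t):
    """c[t]: replace $ in c by t."""
    if c == HOLE:
        return t
    return (c[0],) + tuple(substitute(child, t) for child in c[1:])


def count_f(t):
    return (t[0] == "f") + sum(count_f(child) for child in t[1:])


def p_alpha(t, alpha):
    n = count_f(t)  # a tree with n f-nodes has n + 1 leaves
    return alpha ** (n + 1) * (1 - alpha) ** n


def p(t):
    return 2 * p_alpha(t, Fraction(2, 3)) - p_alpha(t, Fraction(3, 4))


def rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [list(row) for row in rows]
    rk = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk


def hankel(series, max_f=3):
    """Rows indexed by contexts, columns by trees, both with at most max_f f-nodes."""
    tby = trees_by_count(max_f)
    cby = contexts_by_count(max_f, tby)
    trees = [t for ts in tby.values() for t in ts]
    contexts = [c for cs in cby.values() for c in cs]
    return [[series(substitute(c, t)) for t in trees] for c in contexts]


print(rank(hankel(p)))  # 2
print(rank(hankel(lambda t: p_alpha(t, Fraction(2, 3)))))  # 1
```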

SLIDE 17

The Canonical Linear Representation of Rational Series

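\(W_r = [\{\dot{c}r \mid c \in C(\mathcal{F})\}]\); \(W_r^*\): the dual space of \(W_r\).
  • There is no natural linear representation of r on \(W_r\).
  • \(T(\mathcal{F})\) is naturally embedded in \(W_r^*\): \(t \mapsto \bar{t}\), where \(\bar{t}(\dot{c}r) = r(c[t])\).
  • \(\{\bar{t} \mid t \in T(\mathcal{F})\}\) spans \(W_r^*\).
  • The canonical linear representation of r is \((W_r^*, \mu, \lambda)\), where \(\mu(t) = \bar{t}\) and \(\lambda = r\) (as \(W_r^{**} = W_r\)); indeed \(\lambda\mu(t) = \bar{t}(r) = r(t)\), since \(r = \dot{\$}r \in W_r\).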

SLIDE 19

Building the Canonical Linear Representation

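\(\mathcal{F} = \{f(\cdot, \cdot), a\}\), \(\tau(q) = 1\), \(p_\alpha: q \xrightarrow{\alpha} a,\ q \xrightarrow{1-\alpha} f(q, q)\).
Let \(p = 2p_{2/3} - p_{3/4}\). Then \(\sum_t p(t) = 1\) and \(p(t) \ge 0\) for all t: both \(p_{2/3}\) and \(p_{3/4}\) are stochastic (\(\alpha \ge 1/2\)), so the total mass is \(2 - 1 = 1\); and a tree t with n occurrences of f has \(p(t) = \frac{4}{3}\left(\frac{2}{9}\right)^n - \frac{3}{4}\left(\frac{3}{16}\right)^n \ge 0\).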
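These claims, and the values used on the next slides, can be checked with exact rational arithmetic. A minimal Python sketch, assuming the tuple encoding of trees; since \(p_\alpha(t)\) only depends on the number n of f-nodes, it uses \(p_\alpha(t) = \alpha^{n+1}(1-\alpha)^n\). The mass check sums over tree shapes with Catalan-number multiplicities, truncated at n < 300 (our choice).

```python
from fractions import Fraction
from math import comb


def count_f(t):
    """Number of occurrences of f in t."""
    return (t[0] == "f") + sum(count_f(child) for child in t[1:])


def p_alpha(t, alpha):
    n = count_f(t)  # a tree with n f-nodes has n + 1 leaves
    return alpha ** (n + 1) * (1 - alpha) ** n


def p(t):
    return 2 * p_alpha(t, Fraction(2, 3)) - p_alpha(t, Fraction(3, 4))


a = ("a",)
faa = ("f", a, a)
print(p(a), p(faa), p(("f", a, faa)), p(("f", faa, a)))
# 7/12 269/1728 9823/248832 9823/248832

# Nonnegativity: for n f-nodes, p(t) = 4/3 (2/9)^n - 3/4 (3/16)^n.
assert all(Fraction(4, 3) * Fraction(2, 9) ** n - Fraction(3, 4) * Fraction(3, 16) ** n >= 0
           for n in range(50))

# Total mass: there are Catalan(n) = comb(2n, n) / (n + 1) trees with n f-nodes.
mass = sum(comb(2 * n, n) // (n + 1)
           * (2 * (2 / 3) ** (n + 1) * (1 / 3) ** n - (3 / 4) ** (n + 1) * (1 / 4) ** n)
           for n in range(300))
print(round(mass, 9))  # 1.0
```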

SLIDE 22

Building the Canonical Linear Representation

\(p(a) = \frac{7}{12}\), \(p(f(a, a)) = \frac{269}{1728}\), \(p(f(a, f(a, a))) = p(f(f(a, a), a)) = \frac{9823}{248832}\), …

Oracle: is \(\bar{a} = 0\), i.e., is \(p(c[a]) = 0\) for every context c?
Answer: NO, consider \(c = \$\). Let \(B = \{a\}\).

SLIDE 24

Building the Canonical Linear Representation

Oracle: is \(\overline{f(a, a)}\) colinear to \(\bar{a}\), i.e., is there an \(\alpha\) such that \(p(c[f(a, a)]) = \alpha\, p(c[a])\) for every context c?
Answer: NO, consider \(c_1 = \$\) and \(c_2 = f(a, \$)\): \(c_1\) forces \(\alpha = \frac{269}{1008}\), whereas \(c_2\) forces \(\alpha = \frac{9823}{38736}\). Let \(B = \{a, f(a, a)\}\).

SLIDE 26

Building the Canonical Linear Representation

Oracle: is \(\overline{f(a, f(a, a))}\) a linear combination of \(\bar{a}\) and \(\overline{f(a, a)}\)?
Answer: YES, \(\overline{f(a, f(a, a))} = -\frac{54}{2^4 \times 3^4}\,\bar{a} + \frac{59}{2^4 \times 3^2}\,\overline{f(a, a)}\). Let \(B = \{a, f(a, a)\}\).

SLIDE 28

Building the Canonical Linear Representation

Oracle: is \(\overline{f(f(a, a), a)}\) a linear combination of \(\bar{a}\) and \(\overline{f(a, a)}\)?
Answer: YES, \(\overline{f(f(a, a), a)} = -\frac{54}{2^4 \times 3^4}\,\bar{a} + \frac{59}{2^4 \times 3^2}\,\overline{f(a, a)}\). Let \(B = \{a, f(a, a)\}\).

SLIDE 30

Building the Canonical Linear Representation

Oracle: is \(\overline{f(f(a, a), f(a, a))}\) a linear combination of \(\bar{a}\) and \(\overline{f(a, a)}\)?
Answer: YES, \(\overline{f(f(a, a), f(a, a))} = -\frac{3186}{2^8 \times 3^6}\,\bar{a} + \frac{2617}{2^8 \times 3^4}\,\overline{f(a, a)}\). Let \(B = \{a, f(a, a)\}\).
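The three YES answers can be sanity-checked numerically: each identity \(\bar{s} = x\,\bar{a} + y\,\overline{f(a, a)}\) means \(p(c[s]) = x\,p(c[a]) + y\,p(c[f(a, a)])\) for every context c. The Python sketch below (our own helper names, tuple encoding of trees) tests this exactly on a handful of contexts; a finite check is evidence, not a proof. It also confirms that the \(\alpha\) forced by \(c_1 = \$\) fails for \(c_2 = f(a, \$)\).

```python
from fractions import Fraction as Fr

A = ("a",)
HOLE = ("$",)
FAA = ("f", A, A)


def count_f(t):
    return (t[0] == "f") + sum(count_f(child) for child in t[1:])


def p(t):
    n = count_f(t)  # p_alpha(t) = alpha^(n+1) (1 - alpha)^n
    return 2 * Fr(2, 3) ** (n + 1) * Fr(1, 3) ** n - Fr(3, 4) ** (n + 1) * Fr(1, 4) ** n


def substitute(c, t):
    if c == HOLE:
        return t
    return (c[0],) + tuple(substitute(child, t) for child in c[1:])


def combination_holds(s, coeffs, contexts):
    """True iff p(c[s]) == sum_b coeffs[b] * p(c[b]) for every given context c."""
    return all(p(substitute(c, s)) == sum(x * p(substitute(c, b)) for b, x in coeffs.items())
               for c in contexts)


contexts = [HOLE, ("f", A, HOLE), ("f", HOLE, A), ("f", FAA, HOLE),
            ("f", A, ("f", A, HOLE)), ("f", ("f", HOLE, A), FAA)]
coeffs_1 = {A: Fr(-54, 2**4 * 3**4), FAA: Fr(59, 2**4 * 3**2)}
coeffs_2 = {A: Fr(-3186, 2**8 * 3**6), FAA: Fr(2617, 2**8 * 3**4)}

print(combination_holds(("f", A, FAA), coeffs_1, contexts))    # True
print(combination_holds(("f", FAA, A), coeffs_1, contexts))    # True
print(combination_holds(("f", FAA, FAA), coeffs_2, contexts))  # True
print(combination_holds(FAA, {A: Fr(269, 1008)}, contexts))    # False
```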

SLIDE 31

Building the Canonical Linear Representation

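\(p = 2p_{2/3} - p_{3/4}\), \(B = \{a, f(a, a)\}\).

\(\mu(a) = \bar{a}\)
\(\mu(f)(\bar{a}, \bar{a}) = \overline{f(a, a)}\)
\(\mu(f)(\bar{a}, \overline{f(a, a)}) = -\frac{54}{2^4 \times 3^4}\,\bar{a} + \frac{59}{2^4 \times 3^2}\,\overline{f(a, a)}\)
\(\mu(f)(\overline{f(a, a)}, \bar{a}) = -\frac{54}{2^4 \times 3^4}\,\bar{a} + \frac{59}{2^4 \times 3^2}\,\overline{f(a, a)}\)
\(\mu(f)(\overline{f(a, a)}, \overline{f(a, a)}) = -\frac{3186}{2^8 \times 3^6}\,\bar{a} + \frac{2617}{2^8 \times 3^4}\,\overline{f(a, a)}\)
\(\lambda(\bar{a}) = p(a) = \frac{7}{12}\); \(\lambda(\overline{f(a, a)}) = p(f(a, a)) = \frac{269}{1728}\)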
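As a check of the whole construction, the Python sketch below (helper names ours, tuple encoding of trees) stores \(\mu\) and \(\lambda\) in coordinates with respect to the basis \((\bar{a}, \overline{f(a, a)})\), evaluates \(\lambda\mu(t)\) bottom-up, and compares it with p on every tree with at most four f-nodes, in exact arithmetic.

```python
from fractions import Fraction as Fr

# Coordinates in the basis (a-bar, f(a,a)-bar), taken from the slide.
MU_A = (Fr(1), Fr(0))
MU_F = {  # (i, j) -> coordinates of mu(f)(e_i, e_j)
    (0, 0): (Fr(0), Fr(1)),
    (0, 1): (Fr(-54, 2**4 * 3**4), Fr(59, 2**4 * 3**2)),
    (1, 0): (Fr(-54, 2**4 * 3**4), Fr(59, 2**4 * 3**2)),
    (1, 1): (Fr(-3186, 2**8 * 3**6), Fr(2617, 2**8 * 3**4)),
}
LAMBDA = (Fr(7, 12), Fr(269, 1728))


def mu(t):
    """Coordinates of mu(t), using the bilinearity of mu(f)."""
    if t[0] == "a":
        return MU_A
    u, v = mu(t[1]), mu(t[2])
    out = [Fr(0), Fr(0)]
    for (i, j), w in MU_F.items():
        for k in range(2):
            out[k] += u[i] * v[j] * w[k]
    return tuple(out)


def r(t):
    """The series computed by the representation: lambda(mu(t))."""
    return sum(l * x for l, x in zip(LAMBDA, mu(t)))


def count_f(t):
    return (t[0] == "f") + sum(count_f(child) for child in t[1:])


def p(t):
    """Target p = 2 p_{2/3} - p_{3/4}; depends only on the number n of f-nodes."""
    n = count_f(t)
    return 2 * Fr(2, 3) ** (n + 1) * Fr(1, 3) ** n - Fr(3, 4) ** (n + 1) * Fr(1, 4) ** n


def trees(max_f):
    """All trees over {f(., .), a} with at most max_f f-nodes."""
    by = {0: [("a",)]}
    for n in range(1, max_f + 1):
        by[n] = [("f", left, right) for k in range(n)
                 for left in by[k] for right in by[n - 1 - k]]
    return [t for ts in by.values() for t in ts]


print(all(r(t) == p(t) for t in trees(4)))  # True
```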

SLIDE 32

Algorithm DEES; Independence Test

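S: a finite sample drawn i.i.d. from p (\(p_S\): its empirical distribution); B: the current basis; s: the candidate.
The condition \(\forall \alpha_t \in \mathbb{R},\ \bar{s} \neq \sum_{t \in B} \alpha_t \bar{t}\) (s independent of B) is approximated by: the system
\[ \Big|\, p_S(c[s]) - \sum_{t \in B} \alpha_t\, p_S(c[t]) \Big| \le \epsilon \quad \text{for every context } c \text{ such that } \exists t,\ c[t] \in S \]
has no solution \((\alpha_t)_{t \in B}\).

Take \(\epsilon = |S|^{-\gamma}\) with \(\gamma \in\ ]0, 1/2[\) (VC bounds).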
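A minimal Python sketch of this test, under stated assumptions: the helper `independent` is hypothetical (not the DEES implementation) and only handles a basis with at most one element, for which the system reduces to intersecting one interval per context (the general case is a linear feasibility problem, typically handed to an LP solver). The demo "sample" is synthetic: every tree with at most three f-nodes is repeated proportionally to \(p_{2/3}\). We pick \(\gamma = 0.4\) within \(]0, 1/2[\).

```python
from collections import Counter

HOLE = ("$",)


def substitute(c, t):
    """c[t]: replace $ in the context c by the tree t."""
    if c == HOLE:
        return t
    return (c[0],) + tuple(substitute(child, t) for child in c[1:])


def decompositions(t):
    """All pairs (c, s) with c[s] == t."""
    pairs = [(HOLE, t)]
    for i, child in enumerate(t[1:], start=1):
        for c, s in decompositions(child):
            pairs.append((t[:i] + (c,) + t[i + 1:], s))
    return pairs


def independent(s, basis, sample, gamma=0.4):
    """Hypothetical DEES-style test: True iff the system
    |p_S(c[s]) - alpha * p_S(c[b])| <= eps, for all contexts c with c[t] in S,
    has no solution alpha. Restricted to |basis| <= 1."""
    if len(basis) > 1:
        raise NotImplementedError("|B| > 1 needs a linear-programming solver")
    n = len(sample)
    counts = Counter(sample)
    eps = n ** (-gamma)  # epsilon = |S|^(-gamma)

    def p_S(t):  # empirical distribution of the sample
        return counts[t] / n

    contexts = {c for t in counts for c, _ in decompositions(t)}
    if not basis:  # empty sum: the system is |p_S(c[s])| <= eps for all c
        return any(p_S(substitute(c, s)) > eps for c in contexts)
    b = basis[0]
    lo, hi = float("-inf"), float("inf")
    for c in contexts:
        v, u = p_S(substitute(c, s)), p_S(substitute(c, b))
        if u == 0:
            if v > eps:  # this constraint holds for no alpha
                return True
        else:  # constraint: alpha in [(v - eps) / u, (v + eps) / u]
            lo, hi = max(lo, (v - eps) / u), min(hi, (v + eps) / u)
    return lo > hi  # empty intersection: no solution


def trees_by_count(max_f):
    by = {0: [("a",)]}
    for k in range(1, max_f + 1):
        by[k] = [("f", l, r) for j in range(k) for l in by[j] for r in by[k - 1 - j]]
    return by


# Synthetic sample: each tree with k <= 3 f-nodes is repeated 2^(k+1) 3^(6-2k)
# times, i.e. proportionally to p_{2/3}(t) = 2^(k+1) / 3^(2k+1).
sample = [t for k, ts in trees_by_count(3).items() for t in ts
          for _ in range(2 ** (k + 1) * 3 ** (6 - 2 * k))]
a = ("a",)
faa = ("f", a, a)
print(len(sample), independent(a, [], sample), independent(faa, [a], sample))
# 2006 True False
```

Since this sample follows a one-state PA, \(\overline{f(a, a)}\) is colinear to \(\bar{a}\) and the test correctly finds a solution (\(\alpha = 2/9\) satisfies every constraint).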

SLIDE 33

Properties of DEES

Theorem [F. Denis and A. Habrard, ALT’07]: DEES identifies the correct basis in the limit with probability one, and the parameters converge to the correct ones at rate \(O(|S|^{-1/2})\).
But…
  • In the output model, the states may not define stochastic languages.
  • The parameters are not normalized.
  • Before convergence, the output model may not define a stochastic language.

SLIDE 36

The Normalization of the Model

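\(q \to q_0,\ \tfrac{7}{12} \;+\; q_1,\ \tfrac{269}{1728}\)
\(q_0 \to a,\ 1 \;+\; f(q_0, q_1),\ -\tfrac{54}{2^4 \times 3^4} \;+\; f(q_1, q_0),\ -\tfrac{54}{2^4 \times 3^4} \;+\; f(q_1, q_1),\ -\tfrac{3186}{2^8 \times 3^6}\)
\(q_1 \to f(q_0, q_0),\ 1 \;+\; f(q_0, q_1),\ \tfrac{59}{2^4 \times 3^2} \;+\; f(q_1, q_0),\ \tfrac{59}{2^4 \times 3^2} \;+\; f(q_1, q_1),\ \tfrac{2617}{2^8 \times 3^4}\)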

Theorem: for any rational stochastic language, there exists a normalized representation, with a basis chosen so that:
  • each state defines a stochastic language;
  • the transition weights are normalized.

SLIDE 37

After Renormalization

For every left-hand-side state: the transition weights sum to one.
For every pair (left-hand-side state, symbol): the transition weights sum to a nonnegative value.

\(q \to q_0,\ 1\)
\(q_0 \to a,\ \tfrac{7}{12} + f(q_0, q_0),\ -\tfrac{269}{50} + f(q_0, q_1),\ \tfrac{259}{50} + f(q_1, q_0),\ \tfrac{259}{50} + f(q_1, q_1),\ -\tfrac{1369}{300}\)
\(q_1 \to a,\ \tfrac{269}{444} + f(q_0, q_0),\ -\tfrac{3024}{925} + f(q_0, q_1),\ \tfrac{2664}{925} + f(q_1, q_0),\ \tfrac{2664}{925} + f(q_1, q_1),\ -\tfrac{23273}{11100}\)

Efficient propagative method for computing the normalization.
Negative weights remain → a specific generation algorithm is needed.
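The two normalization conditions, and the fact that the renormalized automaton still computes \(p = 2p_{2/3} - p_{3/4}\), can be checked exactly. A minimal Python sketch with our own encoding of the rules (a dictionary from each state to its right-hand sides and weights, tuple encoding of trees); it reads the weights as a top-down weighted automaton whose root rule is \(q \to q_0\) with weight 1. It does not implement the specific generation algorithm mentioned above.

```python
from fractions import Fraction as Fr
from functools import lru_cache

# state -> {right-hand side: weight}; "a" is the leaf rule, (qi, qj) stands for f(qi, qj).
RULES = {
    "q0": {"a": Fr(7, 12), ("q0", "q0"): Fr(-269, 50), ("q0", "q1"): Fr(259, 50),
           ("q1", "q0"): Fr(259, 50), ("q1", "q1"): Fr(-1369, 300)},
    "q1": {"a": Fr(269, 444), ("q0", "q0"): Fr(-3024, 925), ("q0", "q1"): Fr(2664, 925),
           ("q1", "q0"): Fr(2664, 925), ("q1", "q1"): Fr(-23273, 11100)},
}

for state, rules in RULES.items():
    f_total = sum(w for rhs, w in rules.items() if rhs != "a")
    print(state, sum(rules.values()), f_total)  # q0 1 5/12, then q1 1 175/444


@lru_cache(maxsize=None)
def weight(state, t):
    """Weight of tree t generated top-down from state."""
    if t[0] == "a":
        return RULES[state]["a"]
    total = Fr(0)
    for rhs, w in RULES[state].items():
        if rhs != "a":
            left, right = rhs
            total += w * weight(left, t[1]) * weight(right, t[2])
    return total


def count_f(t):
    return (t[0] == "f") + sum(count_f(child) for child in t[1:])


def p(t):
    n = count_f(t)
    return 2 * Fr(2, 3) ** (n + 1) * Fr(1, 3) ** n - Fr(3, 4) ** (n + 1) * Fr(1, 4) ** n


def trees(max_f):
    by = {0: [("a",)]}
    for n in range(1, max_f + 1):
        by[n] = [("f", l, r) for k in range(n) for l in by[k] for r in by[n - 1 - k]]
    return [t for ts in by.values() for t in ts]


print(all(weight("q0", t) == p(t) for t in trees(4)))  # True
```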

SLIDE 38

Notion of Strong Consistency

Rational stochastic tree language, strongly consistent: bounded average tree size, i.e., \(\sum_t p(t)\,|t| < \infty\).
Theorem: for a strongly consistent RSTL, the spectral radius of the "expectation matrix" A taken from the normalized representation is strictly less than 1 (\(\rho(A) < 1\)).

Errata: some hypotheses are missing in Proposition 1; see http://hal.archives-ouvertes.fr/hal-00293511/en (the series \(\sum_{t \in T(\mathcal{F})} p_i(t)\) and \(\sum_{t \in T(\mathcal{F})} p_i(t)\,|t|\) have to be absolutely convergent).
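The slides do not spell out how A is built. A common choice for branching processes, which we assume here, is: \(A[i][j]\) = sum over the rules of \(q_i\) of (weight × number of occurrences of \(q_j\) in the right-hand side). A minimal Python sketch applying this assumed definition to the renormalized automaton of the previous slide (rule encoding and helper names are ours):

```python
from fractions import Fraction as Fr
from math import sqrt

STATES = ["q0", "q1"]
# Renormalized rules: state -> {rhs: weight}; ("qi", "qj") stands for f(qi, qj).
RULES = {
    "q0": {"a": Fr(7, 12), ("q0", "q0"): Fr(-269, 50), ("q0", "q1"): Fr(259, 50),
           ("q1", "q0"): Fr(259, 50), ("q1", "q1"): Fr(-1369, 300)},
    "q1": {"a": Fr(269, 444), ("q0", "q0"): Fr(-3024, 925), ("q0", "q1"): Fr(2664, 925),
           ("q1", "q0"): Fr(2664, 925), ("q1", "q1"): Fr(-23273, 11100)},
}


def expectation_matrix(rules, states):
    """Assumed definition: A[i][j] = sum over the rules of state i of
    weight * (number of occurrences of state j in the right-hand side)."""
    return [[sum(w * rhs.count(qj) for rhs, w in rules[qi].items() if rhs != "a")
             for qj in states] for qi in states]


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus of a real 2x2 matrix."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = tr * tr - 4 * det
    if disc >= 0:  # two real eigenvalues (tr +/- sqrt(disc)) / 2
        root = sqrt(disc)
        return max(abs((tr + root) / 2), abs((tr - root) / 2))
    return sqrt(det)  # complex conjugate pair: |lambda|^2 = det


A = expectation_matrix(RULES, STATES)
print(spectral_radius_2x2(A))  # about 0.667, i.e. rho(A) = 2/3 < 1
```

Under this assumed definition the eigenvalues are 2/3 and 1/2, which coincide with the mean numbers of children, \(2(1 - \alpha)\), of \(p_{2/3}\) and \(p_{3/4}\).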

SLIDE 39

Adapting the Framework to Unranked Trees

[Figure: an unranked tree over the labels a and f, and the corresponding ranked tree over \(\mathcal{F}_0 = \{a, f\}\), \(\mathcal{F}_2 = \{@\}\).]
Unranked trees ⇔ (bijection) ⇔ ranked trees; unranked tree series ⇔ (equivalence) ⇔ ranked tree series.
All the inference results apply: convert the data and use DEES.
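The slide does not state which bijection is used. A standard one consistent with \(\mathcal{F}_0 = \{a, f\}\), \(\mathcal{F}_2 = \{@\}\) is the "currying" encoding, which we assume here: each label becomes a constant, and \(g(t_1, \dots, t_n)\) becomes \(@(\dots @(@(g, t_1'), t_2') \dots, t_n')\). A minimal Python sketch with hypothetical helpers `encode` and `decode` (unranked trees as `(label, [children])`, ranked trees as tuples):

```python
def encode(t):
    """Unranked tree (label, [children]) -> ranked tree over F0 = labels, F2 = {@}."""
    label, children = t
    ranked = (label,)  # the label becomes a constant
    for child in children:  # apply it to the children one at a time
        ranked = ("@", ranked, encode(child))
    return ranked


def decode(r):
    """Inverse of encode: peel off the @ applications from the right."""
    if r[0] != "@":
        return (r[0], [])
    label, children = decode(r[1])
    return (label, children + [decode(r[2])])


t = ("f", [("a", []), ("f", [("a", [])])])  # the unranked tree f(a, f(a))
print(encode(t))  # ('@', ('@', ('f',), ('a',)), ('@', ('f',), ('a',)))
print(decode(encode(t)) == t)  # True
```

With such an encoding, an unranked sample can be converted tree by tree before running DEES.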

SLIDE 40

Conclusion: Learning RSTL from i.i.d. samples

DEES may output irrelevant representations.
Our contributions:
  • Existence and construction of a normalized representation.
  • An algorithm for generating trees from the distribution.
  • Strong consistency.
  • Application to unranked trees.

⇒ When the models do not define stochastic languages, a distribution can be extracted and controlled if \(\rho(A) < 1\).
⇒ A prototype software package (Piccata) is being developed.
