Grammatical Inference 2006

c d l h

Learning k-reversible languages

Acknowledgements

  • Laurent Miclet, Jose Oncina and Tim Oates for previous versions of these slides.
  • Rafael Carrasco, Paco Casacuberta, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Thierry Murgue, Franck Thollard, Enrique Vidal, Frédéric Tantini, ...
  • The list is necessarily incomplete. Apologies to those who have been forgotten. http://eurise.univ-st-etienne.fr/~cdlh/slides

Outline

  • 1. Introduction
  • 2. Definitions
  • 3. The algorithm
  • 4. Properties

1. The problem

  • Gold: identification from positive data only is impossible for super-finite classes.
  • The problem is thus over-generalisation.

Avoid

  • Accumulation points;
  • Languages that do not have tell-tale sets.

2. The k-reversible languages

  • The class was proposed by Angluin (1982).
  • The class is identifiable in the limit from text.
  • The class is composed of the regular languages that can be accepted by a DFA whose reverse is deterministic with a look-ahead of k.

Let A=(Σ, Q, I, F, δ) be an NFA. We denote by A^T=(Σ, Q, F, I, δ^T) the reversal automaton, with: δ^T(q,a)={q'∈Q : q∈δ(q',a)}
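The reversal construction can be sketched in Python as follows. This is only an illustration: the function name `reverse_nfa`, the dictionary encoding of δ (mapping `(state, symbol)` to a set of states) and the small two-state automaton are assumptions made for the example, not part of the slides.

```python
from collections import defaultdict

def reverse_nfa(initial, final, delta):
    """Return (initial, final, delta) of the reversal automaton A^T.

    delta maps (state, symbol) to the set of successor states.
    """
    delta_t = defaultdict(set)
    for (source, symbol), targets in delta.items():
        for target in targets:
            # A transition q' --a--> q in A becomes q --a--> q' in A^T.
            delta_t[(target, symbol)].add(source)
    # Initial and final states swap roles.
    return set(final), set(initial), dict(delta_t)

# Hypothetical two-state NFA (not the automaton drawn on the slides).
delta = {(0, "a"): {0, 1}, (1, "b"): {1}}
init_t, final_t, delta_t = reverse_nfa({0}, {1}, delta)
print(init_t, final_t, delta_t)
```

Applying the function twice gives back the original automaton, which is a quick sanity check on any encoding of this kind.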

[Figure: an example automaton A over {a, b} (left) and its reversal A^T (right).]

Some definitions

  • u is a k-successor of q if │u│=k and δ(q,u)≠∅.
  • u is a k-predecessor of q if │u│=k and δ^T(q,u^T)≠∅.
  • λ is a 0-successor and a 0-predecessor of any state.
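These definitions can be made concrete with a short Python sketch. The names `k_successors` and `k_predecessors`, the dictionary encoding of δ and the small chain automaton are assumptions chosen for illustration. A k-predecessor is computed here as a string of length k labelling a path that ends in q, which is equivalent to reading u^T in the reversal automaton.

```python
def k_successors(delta, q, k):
    """Strings u with |u| = k such that delta(q, u) is non-empty."""
    frontier = {(q, "")}
    for _ in range(k):
        frontier = {
            (target, u + symbol)
            for (state, u) in frontier
            for (source, symbol), targets in delta.items()
            if source == state
            for target in targets
        }
    return {u for (_, u) in frontier}

def k_predecessors(delta, q, k):
    """Strings u with |u| = k that label a path ending in state q."""
    frontier = {(q, "")}
    for _ in range(k):
        frontier = {
            (source, symbol + u)
            for (state, u) in frontier
            for (source, symbol), targets in delta.items()
            if state in targets
        }
    return {u for (_, u) in frontier}

# Hypothetical chain automaton: 0 --a--> 1 --a--> 2 --b--> 3
delta = {(0, "a"): {1}, (1, "a"): {2}, (2, "b"): {3}}
print(k_successors(delta, 0, 2))    # 2-successors of state 0
print(k_predecessors(delta, 3, 2))  # 2-predecessors of state 3
```

With k=0 both functions return the set containing only the empty string, matching the remark that λ is a 0-successor and a 0-predecessor of any state.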

[Figure: the example automaton A.]

  • aa is a 2-successor of 0 and 1 but not of 3.
  • a is a 1-successor of 3.
  • aa is a 2-predecessor of 3 but not of 1.

An NFA is deterministic with look-ahead k iff ∀q,q'∈Q with q≠q':

[(q,q'∈I) ∨ (q,q'∈δ(q'',a))] ∧ (u is a k-successor of q) ∧ (v is a k-successor of q')

⇒ u≠v
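A minimal sketch of this test in Python, under the assumption that δ is stored as a dictionary from `(state, symbol)` to a set of states. The function names and the six-state automaton are hypothetical and only serve to illustrate the definition: two distinct states that are both initial, or both reachable from the same state by the same symbol, must have disjoint sets of k-successors.

```python
from itertools import combinations

def k_successors(delta, q, k):
    """Strings u with |u| = k such that delta(q, u) is non-empty."""
    frontier = {(q, "")}
    for _ in range(k):
        frontier = {
            (target, u + symbol)
            for (state, u) in frontier
            for (source, symbol), targets in delta.items()
            if source == state
            for target in targets
        }
    return {u for (_, u) in frontier}

def is_deterministic_with_lookahead(initial, delta, k):
    # Groups of competing states: the initial states and every set delta(q'', a).
    groups = [set(initial)] + [set(targets) for targets in delta.values()]
    for group in groups:
        for q1, q2 in combinations(group, 2):
            # A shared k-successor means k symbols of look-ahead cannot tell them apart.
            if k_successors(delta, q1, k) & k_successors(delta, q2, k):
                return False
    return True

# Hypothetical NFA: state 0 has two a-transitions, to states 1 and 2.
delta = {
    (0, "a"): {1, 2},
    (1, "b"): {3}, (3, "a"): {5},
    (2, "b"): {4}, (4, "b"): {5},
}
print(is_deterministic_with_lookahead({0}, delta, 1))
print(is_deterministic_with_lookahead({0}, delta, 2))
```

In this example states 1 and 2 share the 1-successor b but have different 2-successors (ba and bb), so one symbol of look-ahead is not enough while two are.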

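Prohibited:

[Figure: two distinct states 1 and 2, each entered by an a-transition and each followed by the same string u, with │u│=k.]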

Note

  • Successor sets need to be intersection-free only for those states that have a non-determinism issue!

Example

This automaton is not deterministic with look-ahead 1 but is deterministic with look-ahead 2. The fact that states 1 and 2 have common successors is not a problem.

[Figure: the example automaton.]

k-reversible automata

  • A is k-reversible if A is deterministic and A^T is deterministic with look-ahead k.
  • Example

[Figure: an automaton A that is deterministic (left) and its reversal A^T, which is deterministic with look-ahead 1 (right).]

Violation of k-reversibility

  • Two states q, q' violate the k-reversibility condition if

– they violate the determinism condition: q,q'∈δ(q'',a);

or

– they violate the look-ahead condition:

  • q,q'∈F, ∃u∈Σ^k: u is a k-predecessor of both;
  • ∃u∈Σ^k, δ(q,a)=δ(q',a) and u is a k-predecessor of both q and q'.

3. Learning k-reversible automata

  • Key idea: the order in which the merges are performed does not matter!
  • Just merge states that do not comply with the conditions for k-reversibility.

k-RL algorithm (ak-RL)

Data: k∈N, X text sample
A = PTA(X)
While ∃q,q' k-reversibility violators do
    A = merge(A,q,q')

k-RL algorithm (ak-RL) (with partitions)

Data: k∈N, X text sample
A0 = PTA(X)
π = {{q} : q∈Q}
While ∃B,B'∈π k-reversibility violators do
    π = (π − {B,B'}) ∪ {B∪B'}
A = A0/π
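The merging loop can be sketched in Python as below. This is a naive illustration rather than an efficient implementation: it rescans all pairs of states after every merge, and the names `pta`, `violate`, `k_rl` and `accepts`, as well as the encoding of states as prefixes and of transitions as `(source, symbol, target)` triples, are assumptions made for the example. In intermediate (possibly non-deterministic) automata, the condition δ(q,a)=δ(q',a) is read as "q and q' share a target by the same symbol".

```python
from itertools import combinations

def pta(sample):
    """Prefix tree acceptor: one state per prefix, initial state ""."""
    states = {""} | {w[:i] for w in sample for i in range(len(w) + 1)}
    edges = {(p[:-1], p[-1], p) for p in states if p}
    return states, edges, set(sample)

def k_predecessors(edges, q, k):
    """Strings u with |u| = k that label a path ending in state q."""
    frontier = {(q, "")}
    for _ in range(k):
        frontier = {(p, a + u) for (r, u) in frontier
                    for (p, a, t) in edges if t == r}
    return {u for (_, u) in frontier}

def violate(edges, finals, q1, q2, k):
    """True if q1 and q2 violate the k-reversibility conditions."""
    into1 = {(p, a) for (p, a, t) in edges if t == q1}
    into2 = {(p, a) for (p, a, t) in edges if t == q2}
    if into1 & into2:  # determinism: same source state, same symbol
        return True
    out1 = {(a, t) for (p, a, t) in edges if p == q1}
    out2 = {(a, t) for (p, a, t) in edges if p == q2}
    both_final = q1 in finals and q2 in finals
    if both_final or (out1 & out2):  # look-ahead condition
        return bool(k_predecessors(edges, q1, k) & k_predecessors(edges, q2, k))
    return False

def k_rl(sample, k):
    states, edges, finals = pta(sample)
    while True:
        pair = next(((q1, q2) for q1, q2 in combinations(sorted(states), 2)
                     if violate(edges, finals, q1, q2, k)), None)
        if pair is None:
            return states, edges, finals
        keep, drop = pair  # keep < drop, so the initial state "" is never dropped
        ren = lambda s: keep if s == drop else s
        states = {ren(s) for s in states}
        edges = {(ren(p), a, ren(t)) for (p, a, t) in edges}
        finals = {ren(s) for s in finals}

def accepts(edges, finals, word):
    """Run the automaton from the initial state "" on word."""
    current = {""}
    for symbol in word:
        current = {t for (p, a, t) in edges if p in current and a == symbol}
    return bool(current & finals)

X = {"a", "aa", "abba", "abbbba"}
states, edges, finals = k_rl(X, 2)
print(sorted(states))
```

On the sample X={a, aa, abba, abbbba} with k=2, which is the example used in these slides, the sketch stops after two merges. The resulting automaton accepts strings such as abbbbbba but not abbba.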

Let X={a, aa, abba, abbbba}, k=2

[Figure: PTA(X), with states λ, a, aa, ab, abb, abba, abbb, abbbb, abbbba.]

Violators, for u=ba: abba and abbbba (both are final).

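Let X={a, aa, abba, abbbba}, k=2

[Figure: the automaton after merging abba and abbbba, with states λ, a, aa, ab, abb, abba, abbb, abbbb.]

Violators, for u=bb: abb and abbbb (both lead to abba by a).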

Let X={a, aa, abba, abbbba}, k=2

[Figure: the automaton after merging abb and abbbb, with states λ, a, aa, ab, abb, abba, abbb.]

This automaton is 2-reversible. Note that with k=1 more merges are possible.

Properties (1)

  • ∀k≥0, ∀X, L(ak-RL(X)) is a k-reversible language.
  • L(ak-RL(X)) is the smallest k-reversible language that contains X.
  • The class k-RL is identifiable in the limit from text.

Properties (2)

  • A regular language L is k-reversible iff ∀u1,u2∈Σ*, ∀v∈Σ^k: (∃w∈Σ*: u1vw∈L ∧ u2vw∈L) ⇒ (u1v)⁻¹L=(u2v)⁻¹L (if two strings that share a suffix of length k can be completed by a common string into words of L, then they are Nerode-equivalent).

Properties (3)

  • L(ak-RL(X)) ⊂ L(a[k-1]-RL(X))
  • Remember we also had: L(ak-TSS(X)) ⊂ L(a[k-1]-TSS(X))

Properties (4)

The time complexity is in O(k‖X‖³). The space complexity is in O(‖X‖). The algorithm can be made incremental.

Properties (4) Polynomial aspects

  • Polynomial update time;
  • Polynomial characteristic samples?
  • Polynomial number of mind changes?
  • Polynomial number of implicit prediction errors?

Extensions

  • Sakakibara built an extension for context-free grammars whose tree language is k-reversible.
  • Marion & Besombes propose an extension to tree languages.
  • Different authors propose to learn these automata and then estimate the probabilities, as an alternative to learning stochastic automata.

Exercises

  • Construct a language L that is not k-reversible, ∀k≥0.
  • Prove that the class of all k-reversible languages (the union over all k≥0) is not identifiable from text.
  • Run ak-RL on X={aa, aba, abb, abaaba, baaba} for k=0,1,2,3.

Solution (idea)

  • Lk={a^i : i≤k}
  • Then for each k: Lk is k-reversible but not (k-1)-reversible.