Polynomial learning from positive and negative examples

Grammatical Inference 2005 (cdlh)

Acknowledgements

  • Laurent Miclet, Jose Oncina and Tim Oates, for previous versions of these slides.

  • Rafael Carrasco, Paco Casacuberta, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Thierry Murgue, Franck Thollard, Enrique Vidal, Frédéric Tantini,...

  • The list is necessarily incomplete; apologies to those who have been forgotten.

http://eurise.univ-st-etienne.fr/~cdlh/slides


Outline

  • 1. The problem
  • 2. Notations
  • 3. Models
  • 4. Proof techniques
  • 5. Conclusion


1 The problem:

  • In a general way: to learn a language (belonging to some class L) from examples, and perhaps also from:

– counter-examples
– queries to an oracle
– specific knowledge

  • Once the program is written we would like:

– to say that it is correct
– or to prove that no correct program can be written


What does ‘correct’ mean?

  • We need a goal:

– L is a target (unknown). The harder L is, the harder it is going to be to learn.

  • Learn what?

– find a representation of L
– find some reasonable approximation of L (what is a reasonable approximation?)


Representation of L

  • We are going to have to fix some representation of L;

  • we denote by r(L) this ideal representation of L;

  • and ‖r(L)‖ is the size of this representation,

  • or at least some polynomial measure of the number of bits needed to encode r(L). (One possible such measure is sketched below.)
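As an illustration (mine, not from the slides), here is a minimal sketch of one such size measure when r(L) is a complete DFA, assuming the DFA is given by its number of states and its alphabet size:

```python
import math

def dfa_size_in_bits(n_states: int, alphabet_size: int) -> float:
    """One possible measure of ‖r(L)‖ for a complete DFA: each of the
    n_states * alphabet_size transitions needs log2(n_states) bits to
    name its target state, plus one accepting/rejecting bit per state."""
    return n_states * alphabet_size * math.log2(n_states) + n_states

# For instance, a 10-state DFA over a 2-letter alphabet needs about
# 10 * 2 * log2(10) + 10 ≈ 76 bits under this encoding.
```

Any reasonable encoding that differs from this one by at most a polynomial factor would do, which is exactly why the slide only asks for "some polynomial measure".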


How long can it take?

  • Ideally: with p(‖r(L)‖) examples, we are sure to learn/find...

  • Interesting: with p(‖r(L)‖) examples drawn according to some distribution D, we will be nearly sure of finding a grammar/classifier that will be nearly always right… according to D (the PAC model: Valiant 84).


What about efficiency?

  • We can try to bound:

– global time
– update time
– errors before converging
– queries
– good examples needed


2 General Notations

C (the class to learn), H (the hypothesis space), r(L) (the target representation), h (a hypothesis), r(L)≡h (exact equivalence), r(L)≈h (approximation)


The examples

(Diagram: the space Σ*, partitioned by the target r(L); a string x inside r(L) is a positive example (labelled 1), a string x outside r(L) is a negative example.)


The classes C and H

  • sets of examples,
  • representations of these sets;
  • the computation of r(L)(x) (and of h(x)) must take place in time polynomial in |x|.
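For instance (a sketch of mine, not from the slides): when r(L) is a DFA with transition table delta, computing r(L)(x) is one table lookup per symbol of x, hence linear in |x|:

```python
def dfa_accepts(delta: dict, q0, finals: set, x: str) -> bool:
    """Membership r(L)(x) for a DFA: one transition per symbol of x,
    so the whole computation runs in time O(|x|)."""
    q = q0
    for a in x:
        q = delta[(q, a)]
    return q in finals

# Example: strings over {a, b} with an even number of a's.
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1}
assert dfa_accepts(delta, 0, {0}, "abba")      # two a's: accepted
assert not dfa_accepts(delta, 0, {0}, "ab")    # one a: rejected
```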


How do we consider a finite set?

Σ* D Σn D≤n Pr<ε


3 Some models/paradigms

  • Identification in the limit
  • PAC learnability
  • PAC predictability
  • Learning with a teacher


  • and...

– identification in the limit with probability 1
– PAC identification
– Simple PAC
– PAC Simple
– different teaching models
– …


Protocol

We have a presentation of some language L:

  • ∀x∈L, x appears in the presentation (learning from text: positive presentation);

  • ∀x∈Σ*, <x, L(x)> appears in the presentation (learning from an informant).

3.1 Identification in the limit (Gold 67,78)


From a presentation x1, x2, …, xn, … the learner produces hypotheses h1, h2, …, hn, …

C is identifiable in the limit iff ∀L∈C and for every presentation of L, there is some n such that ∀i≥n, hi ≡ hn ≡ r(L).
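A minimal sketch of this protocol (my illustration, not from the slides); the learner is any function from samples to hypotheses, and identification holds if the stream of hypotheses eventually stabilises on a correct one:

```python
from typing import Callable, Iterable, Iterator

def run_presentation(presentation: Iterable,
                     learner: Callable[[list], object]) -> Iterator:
    """Feed the presentation to the learner one element at a time,
    yielding the hypothesis h_n produced after seeing x_1, ..., x_n."""
    sample = []
    for x in presentation:
        sample.append(x)
        yield learner(sample)   # h_n
```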


Main results (Gold 67)

  • No super-finite class (one containing every finite language and at least one infinite language) is identifiable in the limit from text;

  • any recursively enumerable class is identifiable in the limit from an informant.


Main results in GI

  • All well-known classes of languages can be identified from complete presentations;

  • no usual class of languages can be identified from positive presentations only.


3.2 PAC learning

(Valiant 84, Pitt 89)

  • C, a set of languages
  • H, a set of hypotheses
  • ε>0 and δ>0
  • L∈C
  • h∈H


h is AC (approximately correct)* iff PrD[h(x)≠L(x)]< ε

* For some specific ε


(Diagram: the target f and a hypothesis h as two overlapping regions; the errors form their symmetric difference.)

Errors: we want PrD(1(L)⊕1(h)) < ε, i.e. the region where L and h disagree must have weight less than ε under D.


h is PAC* (probably approximately correct) iff PrD[h(x)≠L(x)]<ε with probability at least 1-δ

* For some specific ε and δ
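There are two levels of randomness here: the inner PrD[h(x)≠L(x)] measures the error of one fixed hypothesis, while the "probability at least 1-δ" is over the random draw of the sample that produced h. As an illustration (mine, not from the slides), the inner quantity can be estimated by sampling:

```python
def estimate_error(h, target, draw, n_samples: int = 10_000) -> float:
    """Monte Carlo estimate of Pr_D[h(x) != L(x)], where draw()
    returns one string x sampled according to D."""
    errors = 0
    for _ in range(n_samples):
        x = draw()
        errors += h(x) != target(x)   # bool counts as 0 or 1
    return errors / n_samples
```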


The oracle EX

  • The examples may cost, but…
  • (X, D): the set of examples, with a distribution D over it.
  • Denote by n the size of an example.
  • EX(L, D) returns, in time at most O(n), a pair <x, L(x)>.
  • Simplifying the notation: EX. (A sketch follows.)
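A minimal sketch of such an oracle (mine; the names here are hypothetical), assuming a sampler draw for D and a membership predicate target for L:

```python
import random

def make_EX(target, draw):
    """Build an EX(L, D) oracle: each call draws x ~ D and returns
    the labelled pair <x, L(x)>; the cost is one sample plus one
    membership test, i.e. O(n) for an example of size n."""
    def EX():
        x = draw()
        return (x, target(x))
    return EX

# Hypothetical use: D uniform over {a,b}^5, L = "contains 'aa'".
draw = lambda: ''.join(random.choice('ab') for _ in range(5))
EX = make_EX(lambda x: 'aa' in x, draw)
x, label = EX()
```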


The class C is PAC-learnable by H iff there exists a (possibly probabilistic) algorithm a that uses EX to obtain, ∀ε>0 and ∀δ>0, ∀L∈C and for any distribution D over Σ*, a PAC hypothesis h∈H.

The class C is polynomially PAC-learnable by H if C is PAC-learnable by H and if, for any L∈C, a returns a PAC solution in time polynomial in 1/ε, 1/δ, ‖r(L)‖ and n.
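For intuition (a standard fact, not on the slide): when H is finite and the algorithm returns any h∈H consistent with the sample, it is PAC as soon as the number m of calls to EX satisfies

```latex
m \;\ge\; \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```

since the probability that some hypothesis of error at least ε survives m examples is at most |H|(1-ε)^m ≤ δ.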


3.3 PAC Prediction

The class C is polynomially PAC-predictable if there is a class H such that C is polynomially PAC-learnable by H.


Some observations

  • Different variants:

– PAC-identifiable: ε=0
– EX-pos, EX-neg

  • The case where C is PAC-learnable by C itself: this is the usual setting for positive results, but it is not that useful for negative results.


PAC and GI

  • PAC-learning DFA is still an open problem, but it is believed to be impossible because of:

– the intractability of the minimum consistent DFA problem (Gold 78);
– the hardness of prediction, due to cryptographic limitations (Kearns & Valiant 89);
– the hardness of learning with equivalence queries (Pitt 89, Angluin 87).


3.4 Active Learning

  • Idea: the learner can interrogate a master (an oracle):

– the oracle must answer correctly;
– the oracle may choose the worst of the correct answers.


Active learning and GI

– see the 2 lectures on the subject;
– with poor queries, nothing can be learnt;
– with strong queries, DFA can be learnt (see the sketch below).
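The "strong queries" in question are membership and equivalence queries, the two query types of Angluin's L* algorithm for DFA (Angluin 87). A minimal sketch of the oracle interface (my illustration):

```python
from typing import Optional, Protocol

class MinimallyAdequateTeacher(Protocol):
    """The two query types under which DFA become learnable."""

    def membership(self, x: str) -> bool:
        """Answer whether x belongs to the target language L."""
        ...

    def equivalence(self, h) -> Optional[str]:
        """Return None if h is equivalent to L; otherwise return some
        counterexample on which h and L disagree (the oracle is free
        to pick the worst such counterexample)."""
        ...
```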


3.5 Learning from a Teacher

  • Idea:

– the teacher can choose some good examples;
– all examples are given at the beginning;
– to avoid cheating (collusion), these examples are mixed with other, less useful ones.


Intermediate Model

Identification from a characteristic sample

  • The algorithm must run in polynomial time, and ...

… every concept must admit a characteristic sample of polynomial size.

  • Related to learning from a teacher: a set of models for the harder classes (Goldman, Mathias, ...).


Identification in the limit from polynomial time and data

1) Given a sample <X+, X-> of size m, ϕ returns an h∈H consistent with <X+, X-> in time O(p(m)).

2) For any r(L) of size n, there exists a characteristic sample <CS+, CS-> of size at most q(n) such that, given any <X+, X-> with CS+⊆X+ and CS-⊆X-, ϕ returns an h equivalent to r(L).
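Condition 1) is just polynomial-time consistency; as a trivial sketch (mine) of what "consistent" means here:

```python
def is_consistent(h, X_pos, X_neg) -> bool:
    """h must accept every string of X+ and reject every string of X-."""
    return all(h(x) for x in X_pos) and not any(h(x) for x in X_neg)
```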


Identification in the limit from polynomial time and data.

(Diagram: the algorithm a maps <X+, X-> to h in time p(‖X+‖+‖X-‖); whenever <X+, X-> contains a characteristic sample <CS+, CS-> of size q(‖r(L)‖) for the target L, the output satisfies h ≡ L.)


A theorem by Gold (1978)

  • DFA are identifiable in the limit from polynomial time and data;

  • alternative results and algorithms:

– Trakhtenbrot & Barzdin 73
– Oncina & García 92
– Lang 92

(A sketch of the common starting point of these algorithms follows.)
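These algorithms (e.g. RPNI, Oncina & García 92) start by building the prefix tree acceptor (PTA) of X+ and then merge its states while remaining consistent with X-. A minimal sketch of the PTA construction only (my illustration; the merging loop is omitted):

```python
def build_pta(X_pos):
    """Prefix tree acceptor: one state per distinct prefix of X+.
    Returns (delta, finals); state 0 is the root (empty prefix)."""
    delta, finals = {}, set()
    n_states = 1
    for w in X_pos:
        q = 0
        for a in w:
            if (q, a) not in delta:
                delta[(q, a)] = n_states   # create a fresh state
                n_states += 1
            q = delta[(q, a)]
        finals.add(q)   # the end of a positive example is accepting
    return delta, finals

# e.g. build_pta(["a", "ab", "b"]) yields a 4-state tree automaton.
```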


By morphisms the result may extend to:

  • even linear grammars (Takada 88 & 94; Sempere & García 94; Mäkinen 96)

  • total subsequential functions (Oncina, García & Vidal 93)

  • context-free grammars from skeletons (Sakakibara 90)

  • tree automata (Knuutila 94)

4 Hardness proofs

– algorithmic proofs
– 'information theoretic' proofs


4.1 Algorithmic proofs

  • We prove that learning some grammar would solve some hard problem.

  • Usually:

– RSA;
– an RP-complete problem.


4.2 ‘Information theoretic’ proofs

  • We prove that there cannot be enough information in the data to learn.

  • Examples:

– approximate fingerprints (Angluin);
– polynomial characteristic sets (cdlh).


Negative Results

  • For any alphabet Σ of size at least 2, the following classes are not identifiable in the limit from polynomial time and data (cdlh 97):

– CFG(Σ), context-free grammars;
– LIN(Σ), linear grammars;
– NFA(Σ), non-deterministic finite automata.


Proof

  • Let G be a class for which the equivalence problem (g1≡g2?) is undecidable.

  • Then for any polynomial p() there exist inequivalent g1 and g2 that no string of length < p(‖g1‖+‖g2‖) separates (otherwise, checking all such strings would decide equivalence).


  • Suppose g1 and g2 were learnable with polynomial characteristic samples CS1 and CS2.

  • What grammar (function) will be inferred from CS1∪CS2? Since g1 and g2 agree on every string short enough to appear in these samples, CS1∪CS2 is a valid sample for both targets, so the single hypothesis returned would have to be equivalent to both g1 and g2: a contradiction.


Conclusion

  • For identification in the limit from a complete presentation:

– nearly everything is inferable.

  • For identification in the limit from a positive presentation:

– nearly nothing is inferable.

  • For PAC-prediction:

– nearly nothing is inferable.


State of the art

                               DFA    Context-free
Identification in the limit    yes    yes
PAC                            no     no
Poly. identification           yes    no
Simple PAC                     yes    ?


Open problems

  • Minimum Description Length provides an alternative convergence principle; relate it to identification in the limit.

  • Relate identification in the limit as defined by Gold in 78 and in 67.

  • Improve the results of Honavar & Parekh 98.