Learning probabilistic finite automata
Colin de la Higuera
University of Nantes
Nantes, November 2013
Acknowledgements: Laurent Miclet, Jose Oncina, Tim Oates, Rafael Carrasco, Paco Casacuberta, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Thierry Murgue, Franck Thollard, Enrique Vidal, Frédéric Tantini, …
The list is necessarily incomplete; apologies to those who have been forgotten.
Slides: http://pagesperso.lina.univ-nantes.fr/~cdlh/slides/ (Chapters 5 and 16)
1. PFA
2. Distances between distributions
3. FFA
4. Basic elements for learning PFA
5. ALERGIA
6. MDI and DSAI
7. Open questions
Probabilistic finite (state) automata
(Computational biology, speech recognition, web services, automatic translation, image processing, …)
A lot of positive data; not necessarily any negative data; no ideal target; noise.
The data consists of positive strings, «generated» following an unknown distribution.
The goal is now to find (learn) the distribution used to generate the strings.
n-grams, Hidden Markov Models, probabilistic grammars.
[Figure: a DPFA with transition probabilities 1/4, 1/3, 1/2, 1/2, 1/2, 2/3, 3/4]
DPFA: Deterministic Probabilistic Finite Automaton
[Same DPFA as above]
PrA(abab) = 1/2 × 1/2 × 1/3 × 2/3 × 3/4 = 1/24
[Figure: a DPFA with transition probabilities 0.7, 0.9, 0.65, 0.35, 0.3, 0.1 and final probabilities 0.3, 0.7]
[Figure: a PFA with fractional transition probabilities]
PFA: Probabilistic Finite (state) Automaton
[Figure: an ε-PFA with fractional transition probabilities]
ε-PFA: Probabilistic Finite (state) Automaton with ε-transitions
They can define a distribution over Σ*.
They do not tell us if a string belongs to a language.
They are good candidates for grammar induction.
There is (was?) not that much written theory.
The HMM literature.
Azaria Paz 1973: Introduction to Probabilistic Automata.
Chapter 5 of my book.
Probabilistic Finite-State Machines, Vidal, Thollard, cdlh, Casacuberta & Carrasco.
Grammatical inference papers.
Let D be a distribution over Σ*
A Probabilistic Finite (state) Automaton is a tuple <Q, Σ, IP, FP, δP>:
Q a set of states; IP : Q→[0;1] the initial probabilities; FP : Q→[0;1] the final probabilities; δP : Q×Σ×Q→[0;1] the transition probabilities.
It defines the probability of each string w as the sum (over all paths reading w) of the products of the probabilities:

PrA(w) = ∑ πi∈paths(w) Pr(πi), where πi = qi0 ai1 qi1 ai2 … ain qin and
Pr(πi) = IP(qi0) · δP(qi0, ai1, qi1) · … · δP(qin−1, ain, qin) · FP(qin)

Note that if λ-transitions are allowed the sum may be infinite.
Pr(aba) = 0.7·0.4·0.1·1 + 0.7·0.4·0.45·0.2 = 0.028 + 0.0252 = 0.0532
[Figure: a non-deterministic PFA with transition probabilities 0.7, 0.45, 0.4, 0.35, 0.3, 0.1 and final probabilities 1, 0.2]
Non-deterministic PFA: possibly many initial states;
λ-PFA: a PFA with λ-transitions and perhaps many initial states;
DPFA: a deterministic PFA (only one initial state).
PrA(Σ*) = 1 and ∀x∈Σ*, 0 ≤ PrA(x) ≤ 1
Equivalence between PFA and HMM… But the HMMs usually define distributions over Σn (one distribution per string length).
[Figure: an HMM and an equivalent PFA, with fractional probabilities 1/4, 1/2, 3/4, …]
Equivalence between PFA with λ-transitions and PFA without λ-transitions (cdlh 2003, Hanneforth & cdlh 2009).
Many initial states can be transformed into one initial state with λ-transitions;
λ-transitions can be removed in polynomial time.
Strategy: number the states; eliminate first the λ-loops, then the transitions with highest-ranking arrival state.
Folk theorem: you can't even tell in advance if you are in a good case or not (see Denis & Esposito 2004).
Example: [Figure: a PFA with fractional transition probabilities]
This distribution cannot be modelled by a DPFA.
And with this architecture you cannot generate the previous one
Computation of the probability of a string or of a set of strings.
Deterministic case: simple, apply the definitions.
Technically, rather sum up logs: this is easier, safer and cheaper.
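The log-summing idea above can be sketched as follows. The dict-based DPFA encoding (and the example automaton) is a hypothetical one chosen for illustration, not the deck's:

```python
# Sketch: computing Pr(w) in a DPFA by summing log-probabilities
# instead of multiplying raw probabilities (avoids underflow).
import math

# Hypothetical DPFA: delta[state][symbol] = (next_state, probability)
delta = {0: {'a': (1, 0.7), 'b': (0, 0.3)},
         1: {'a': (0, 0.9), 'b': (1, 0.05)}}
final = {0: 0.0, 1: 0.05}   # FP: probability of halting in each state
initial_state = 0

def log_prob(w):
    """Return log Pr(w), or -inf if w leaves the automaton."""
    q, logp = initial_state, 0.0
    for a in w:
        if a not in delta.get(q, {}):
            return float('-inf')
        q, p = delta[q][a]
        if p == 0.0:
            return float('-inf')
        logp += math.log(p)
    return logp + (math.log(final[q]) if final[q] > 0 else float('-inf'))
```

Exponentiating only at the very end (if at all) keeps the computation numerically safe for long strings.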
Pr(aba) = 0.7·0.9·0.35·0 = 0
Pr(abb) = 0.7·0.9·0.65·0.3 = 0.12285
[Figure: the DPFA with transition probabilities 0.7, 0.9, 0.65, 0.35, 0.3, 0.1]
Non-deterministic case:
Pr(aba) = 0.7·0.4·0.1·1 + 0.7·0.4·0.45·0.2 = 0.028 + 0.0252 = 0.0532
[Figure: the non-deterministic PFA from before]
The computation of the probability of a string is by dynamic programming: O(n²m) (n states, string of length m).
2 algorithms: Backward and Forward.
If we want the most probable derivation to define the probability of a string, then we can use the Viterbi algorithm.
A[i,j] = Pr(qi | a1..aj)
(The probability of being in state qi after having read a1..aj)
A[i,0] = IP(qi)
A[i,j+1] = ∑k≤|Q| A[k,j] · δP(qk, aj+1, qi)
Pr(a1..an) = ∑k≤|Q| A[k,n] · FP(qk)
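The Forward recurrence can be transcribed directly into code. This is a sketch; the PFA encoding (lists for IP and FP, a dict for δP) is an assumption, not the deck's notation:

```python
# Sketch of the Forward algorithm: A[i] holds the probability of being
# in state i after reading the prefix consumed so far.
def forward_prob(w, n_states, IP, FP, dP):
    """Pr(w) for a (possibly non-deterministic) PFA.
    IP, FP: lists of initial/final probabilities, indexed by state.
    dP: dict mapping (q, a, q2) -> transition probability."""
    A = list(IP)                      # A[i,0] = IP(qi)
    for a in w:
        A = [sum(A[k] * dP.get((k, a, i), 0.0) for k in range(n_states))
             for i in range(n_states)]
    return sum(A[k] * FP[k] for k in range(n_states))
```

Each symbol costs O(n²), giving the O(n²m) bound mentioned above; replacing `sum` by `max` (over paths) would give the Viterbi variant.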
What for?
Estimate the quality of a language model.
Have an indicator of the convergence of learning algorithms.
Construct kernels.
How many bits do we need to correct our model?
Two distributions over Σ*: D and D'.
Kullback-Leibler divergence (or relative entropy) between D and D':
∑w∈Σ* PrD(w) × (log PrD(w) − log PrD'(w))
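For distributions given explicitly on a finite support, the divergence above is a one-line sum. This is only a sketch (the general Σ* case needs the automata themselves, as in Carrasco & cdlh 2002):

```python
# Sketch: KL divergence sum_w D(w) * (log D(w) - log D'(w)),
# for two distributions given as dicts string -> probability.
import math

def kl_divergence(D, Dp):
    total = 0.0
    for w, p in D.items():
        if p > 0.0:
            # raises ValueError if Dp[w] == 0 while D(w) > 0
            # (the divergence is then infinite)
            total += p * (math.log2(p) - math.log2(Dp[w]))
    return total
```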
The idea is to allow the computation of the divergence, but relative to a test set S.
An approximation (sic) is perplexity: the inverse of the geometric mean of the probabilities of the elements of the test set.
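The inverse geometric mean can be sketched as follows (computed via logs to avoid underflow; the interface is an assumption):

```python
# Sketch: perplexity as the inverse geometric mean of the model
# probabilities of the strings of the test set S.
import math

def perplexity(probs):
    """probs: list of model probabilities Pr(x) for each x in S."""
    log_sum = sum(math.log2(p) for p in probs)   # log of the product
    return 2 ** (-log_sum / len(probs))          # inverse geometric mean
```

A model assigning probability 1/4 to every test string has perplexity 4: it is "as confused" as a uniform choice among 4 options.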
PP(S) = ( ∏x∈S Pr(x) )^(−1/|S|)
We are trying to compute the probability of
independently drawing the different strings in set S
Suppose we have two predictors for a coin toss.
Predictor 1: heads 60%, tails 40%. Predictor 2: heads 100%.
The test results are H: 6, T: 4.
Arithmetic mean: P1: 0.6·0.6 + 0.4·0.4 = 0.36 + 0.16 = 0.52; P2: 0.6·1 + 0.4·0 = 0.6.
Predictor 2 would be the better predictor ;-)
(The geometric mean would instead give Predictor 2 a score of 0, since it assigns probability 0 to tails.)
d2(D, D') = √( ∑w∈Σ* (PrD(w) − PrD'(w))² )
It can be computed in polynomial time if D and D' are given by PFA (Carrasco & cdlh 2002). This also means that equivalence of PFA is in P.
Frequency Finite (state) Automata
The sample is a multiset: strings appear with a frequency (or multiplicity).
S = {λ (3), aaa (4), aaba (2), ababa (1), bb (3), bbaaa (1)}
A deterministic frequency finite automaton (DFFA) is a DFA with a frequency function returning a positive integer for every state and every transition, and for entering the initial state, such that at each state the sum of what enters is equal to the sum of what exits, and the sum of what halts is equal to what starts.
[Figure: a DFFA with initial frequency 6; halting frequencies 3, 1, 2; transitions a:1, a:2, a:5, b:3, b:4, b:5]
[Figure: the same automaton with relative frequencies 3/13, 1/6, 2/7; b:5/13, b:3/6, a:1/7, a:2/6, a:5/13, b:4/7; initial 6/6]
Frequencies become relative frequencies by dividing by the sum of the exiting frequencies.
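The normalization step can be sketched as follows. The dict-based representation (and the transition targets in the test) are assumptions for illustration:

```python
# Sketch: turning a DFFA's frequencies into relative frequencies by
# dividing each state's halting and transition counts by the total
# count exiting that state.
def normalize(state_final_freq, trans_freq):
    """state_final_freq: state -> halting count;
    trans_freq: (q, a) -> (q2, count).
    Returns (FP, dP): final and transition probabilities."""
    totals = dict(state_final_freq)              # exiting mass per state
    for (q, a), (q2, c) in trans_freq.items():
        totals[q] = totals.get(q, 0) + c
    FP = {q: f / totals[q] for q, f in state_final_freq.items()}
    dP = {(q, a): (q2, c / totals[q]) for (q, a), (q2, c) in trans_freq.items()}
    return FP, dP
```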
[Figure: the DFFA above]
S = {λ, aaaa, ab, babb, bbbb, bbbbaa}
Another sample may lead to the same DFFA.
Doing the same with an NFA is a much harder problem: typically what the Baum-Welch (EM) algorithm has been invented for…
The data is a multiset.
The FTA is the smallest tree-like FFA consistent with the data.
It can be transformed into a PFA if needed.
FTA(S) for S = {λ (3), aaa (4), aaba (2), ababa (1), bb (3), bbaaa (1)}:
[Figure: the frequency prefix tree acceptor, a tree-shaped FFA with root frequency 14]
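Building the FTA amounts to counting, for every prefix, the strings passing through it and halting at it. A sketch, with a representation (dicts keyed by prefixes) chosen for simplicity:

```python
# Sketch: the FTA as a prefix tree whose nodes carry the frequency of
# strings passing through them, plus a halting frequency.
from collections import defaultdict

def build_fta(sample):
    """sample: dict string -> multiplicity.
    Returns (through, halt, edges):
      through[prefix]: number of strings passing through that node,
      halt[prefix]:    number of strings halting there,
      edges:           set of (prefix, symbol) tree transitions."""
    through, halt = defaultdict(int), defaultdict(int)
    edges = set()
    for w, count in sample.items():
        for i in range(len(w) + 1):
            through[w[:i]] += count
            if i < len(w):
                edges.add((w[:i], w[i]))
        halt[w] += count
    return dict(through), dict(halt), edges

S = {'': 3, 'aaa': 4, 'aaba': 2, 'ababa': 1, 'bb': 3, 'bbaaa': 1}
through, halt, edges = build_fta(S)
```

On the sample S above the root carries 14 (the 14 strings), the a-child 7 and the b-child 4, matching the figure.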
[Figure: a prefix tree over {a, b}; Red states and their successors]
Blue states: the successors of the Red states.
Same as with DFA and what RPNI does.
[Figure: a red-blue DFFA (total 100); state frequencies 60, 9, 10, 11, 6, 4; red states λ, a, b]
Suppose we decide to merge a blue state with one of the red states.
[Same figure]
First disconnect the blue state and reconnect it to the red state.
[Same figure]
Then fold.
After folding:
[Figure: the automaton after folding (total 100); frequencies 60, 10, 9, 10, 11, 4; a:26, a:4, a:10, a:10, b:24, b:9, b:30]
A = FTA(S); Red = {qI}; Blue = {δ(qI, a) : a ∈ Σ}
while Blue ≠ ∅ do
    choose q from Blue such that Freq(q) ≥ t0
    if ∃p ∈ Red : d(Ap, Aq) is small
        then A = merge_and_fold(A, p, q)
        else Red = Red ∪ {q}
    Blue = {δ(q, a) : q ∈ Red, a ∈ Σ} − Red
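The red-blue loop above can be sketched in code. Everything below the control flow — `delta`, `freq`, `compatible`, `merge_and_fold` — is a placeholder passed in by the caller; this is not a full ALERGIA implementation:

```python
# Sketch of the ALERGIA red-blue control flow; all helpers are assumed given.
def alergia_skeleton(A, alphabet, freq, delta, compatible, merge_and_fold, t0, q_I):
    """A: automaton being built; delta(A, q, a) -> successor state or None;
    freq(A, q): frequency of state q; compatible: the merge test."""
    red = [q_I]
    blue = [s for a in alphabet if (s := delta(A, q_I, a)) is not None]
    while blue:
        q = blue.pop(0)
        if freq(A, q) < t0:
            continue                  # low-frequency states: handled separately
        for p in red:
            if compatible(A, p, q):   # e.g. a Hoeffding-style compatibility test
                A = merge_and_fold(A, p, q)
                break
        else:
            red.append(q)             # promotion
        blue = [s for r in red for a in alphabet
                if (s := delta(A, r, a)) is not None and s not in red]
    return A, red

# Toy instantiation: a 3-state tree and a test that never accepts a merge,
# so every sufficiently frequent blue state gets promoted.
states = {'': 10, 'a': 6, 'b': 4}
toy_delta = lambda A, q, a: q + a if (q + a) in A else None
toy_freq = lambda A, q: A[q]
never_merge = lambda A, p, q: False
```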
How do we decide if d(Ap, Aq) is small?
Use a distance… Be able to compute this distance. If possible, update the computation easily. Have properties related to this distance.
If the two distributions are known, equality can be tested.
The distance (L2 norm) between distributions can be exactly computed.
But what if the two distributions are unknown?
[Figure: the red-blue DFFA again (total 100)]
Suppose we want to merge a blue state with a red state.
Yes, if the two distributions induced are similar.
[Figure: the two sub-automata rooted at the candidate states; frequencies 9, 11, 4; a:4, a:4, b:24, b:9]
D1 ≈ D2 if ∀x, PrD1(x) ≈ PrD2(x)
Easier to test: PrD1(λ) = PrD2(λ) and ∀a∈Σ, PrD1(aΣ*) = PrD2(aΣ*)
And do this recursively! Of course, do it on frequencies.
γ = | f1/n1 − f2/n2 | indicates if the relative frequencies f1/n1 and f2/n2 are sufficiently close. The test (a Hoeffding bound) accepts when

γ < √( ½ · ln(2/α) ) · ( 1/√n1 + 1/√n2 )
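The compatibility test above translates directly into code; a sketch:

```python
# Sketch: ALERGIA's Hoeffding-bound test. Two relative frequencies
# f1/n1 and f2/n2 are deemed (significantly) different when their
# gap exceeds sqrt(0.5 * ln(2/alpha)) * (1/sqrt(n1) + 1/sqrt(n2)).
import math

def hoeffding_different(f1, n1, f2, n2, alpha=0.05):
    gamma = abs(f1 / n1 - f2 / n2)
    bound = (math.sqrt(0.5 * math.log(2 / alpha))
             * (1 / math.sqrt(n1) + 1 / math.sqrt(n2)))
    return gamma > bound
```

With the deck's later numbers (660/1341 vs 225/340, α = 0.05) the bound is ≈ 0.111 and the gap exceeds it, so the merge is rejected.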
S={λ(490), a(128), b(170), aa(31), ab(42), ba(38), bb(14), aaa(8), aab(10), aba(10), abb(4), baa(9), bab(4), bba(3), bbb(6), aaaa(2), aaab(2), aaba(3), aabb(2), abaa(2), abab(2), abba(2), abbb(1), baaa(2), baab(2), baba(1), babb(1), bbaa(1), bbab(1), bbba(1), aaaaa(1), aaaab(1), aaaba(1), aabaa(1), aabab(1), aabba(1), abbaa(1), abbab(1)}
Parameter α is arbitrarily set to 0.05. We choose 30 as the value for the threshold t0.
Note that for the blue states that have a frequency less than the threshold, a special merging operation takes place.
[Figure: the FTA built from the 1000-string sample (total 1000); first-level halting frequencies λ:490, a:128, b:170]
Compare λ and a, aΣ* and aaΣ*, bΣ* and abΣ*:
490/1000 with 128/257, 257/1000 with 64/257, 253/1000 with 65/257, …
All tests return true.
Merge…
[Figure: the FTA with the chosen states being merged (total 1000)]
And fold.
[Figure: the resulting automaton (total 1000); frequencies 660, 52, 225; a:341, a:77, b:340, b:38, …]
Next merge? λ with b?
[Figure: the current automaton (total 1000)]
Compare λ and b, aΣ* and baΣ*, bΣ* and bbΣ*:
660/1341 and 225/340 are different (giving γ = 0.162).
On the other hand, √(½ · ln(2/α)) · (1/√n1 + 1/√n2) = 0.111.
Promotion: the blue state becomes red.
[Figure: the automaton (total 1000)]
Merge.
[Figure: the automaton with the next pair of states to merge]
And fold.
[Figure: the resulting automaton (total 1000); frequencies 660, 291; a:341, a:95, b:340, b:49, a:11, …]
Merge.
[Figure: the automaton (total 1000); frequencies 660, 225; a:341, a:95, b:340, b:49, a:11, …]
And fold.
[Figure: the final DFFA (total 1000); frequencies 698 and 302; transitions a:354, a:96, b:351, b:49]
As a PFA: probabilities .698 and .302; transitions a:.354, a:.096, b:.351, b:.049.
Alergia builds a DFFA in polynomial time.
Alergia can identify DPFA in the limit with probability 1.
No good definition of Alergia's properties.
Why not change the criterion?
Use a distinguishing string; use the norm L∞.
Two distributions are different if there is a string with a very different probability.
Such a string is called µ-distinguishable.
The question becomes: is there a string x such that |PrA,q(x) − PrA,q'(x)| > µ?
Ron, Singer & Tishby: On the learnability and usage of acyclic probabilistic finite automata, COLT 1995.
PAC learnability results, in the case where targets are acyclic graphs.
MDL-inspired heuristic. The criterion is: does the reduction of the size of the automaton compensate for the increase in perplexity?
Thollard, Dupont & cdlh: Probabilistic DFA inference using Kullback-Leibler divergence and minimality. In Proceedings of the 17th International Conference on Machine Learning, pages 975–982. Morgan Kaufmann, San Francisco, CA, 2000.
Organisation committee:
♦ Hasan Ibne Akram, Technische Universität München, Germany
♦ Rémi Eyraud, Aix-Marseille Université, France
♦ Jeffrey Heinz, University of Delaware, USA
♦ Colin de la Higuera, University of Nantes, France
♦ James Scicluna, University of Nantes, France
♦ Sicco Verwer, Radboud University Nijmegen, The Netherlands
ICGI'12 - Workshop 81
Pieter Adriaans, University of Amsterdam, The Netherlands
Dana Angluin, Yale University, USA
Alexander Clark, Royal Holloway University of London, United Kingdom
Pierre Dupont, Université catholique de Louvain, Belgium
Ricard Gavaldà, Universitat Politècnica de Catalunya, Spain
Colin de la Higuera, University of Nantes, France
Jean-Christophe Janodet, University of Evry, France
Tim Oates, University of Maryland in Baltimore County, USA
Jose Oncina, University of Alicante, Spain
Menno van Zaanen, Tilburg University, The Netherlands
December 2011: first ideas
February 2012: website, first baselines and the first data set on-line
March 2012: first phase (training phase)
May 20: second phase (competition)
June 5: first real-world problem available
July 3: end of the competition
September 7: special session at ICGI'12
Targets were generated completely at random. 4 kinds of targets: HMM, PDFA, PFA, Markov chains (used only during the training phase).
5 to 75 states; 4- to 24-letter alphabets.
All initial, symbol and transition probabilities drawn from a Dirichlet distribution.
Symbol sparsity: percentage of possible state-symbol pairs selected for the target (between 20% and 80%). A state is randomly selected, then a not-already-taken symbol for this state; one transition is generated by selecting a target state.
Transition sparsity: percentage of additional transitions (between 0% and 20%), selected without replacement from the set of possible transitions, modified to remain uniform over the source states and transition labels.
A perplexity measure: 2^(−∑x PrT(x) · log₂ PrC(x)), where PrT is the probability in the target and PrC is the submitted probability (these probabilities have to be normalized on the test set).
Equivalent to the Kullback–Leibler divergence; independent of a specific model.
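The measure can be sketched as follows. The exact scoring details are in the PAutomaC documentation; the function below is an illustration under the normalization assumption stated above:

```python
# Sketch of the competition measure: 2 ** (- sum_x PrT(x) * log2 PrC(x)),
# with both distributions normalized over the test set.
import math

def pautomac_score(pr_target, pr_candidate):
    """pr_target, pr_candidate: dicts test-string -> probability."""
    zt = sum(pr_target.values())
    zc = sum(pr_candidate.values())
    s = sum((pt / zt) * math.log2(pr_candidate[x] / zc)
            for x, pt in pr_target.items())
    return 2 ** (-s)
```

A candidate identical to the target achieves the minimum (the target's own perplexity); any mismatch only increases the score.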
Natural language problem: 10 000 POS sequences (+1 000 unique ones for test) selected from the 100 000+ obtained with the Frog Dutch tagger (11 symbols) on a corpus of Dutch translations of Jules Verne books.
Discretized sensor signals: 20 000 strings (+1 000 for test) corresponding to windows of length 20 over the fuel usage of trucks, selected from almost 500 000 available windows.
Evaluation: submissions were compared with the probabilities obtained with a 3-gram trained on the whole data set.
For each problem:
5 points were given to the leader (the participant with the smallest perplexity score), 3 points to the second, 2 points to the third, 1 point to the fourth.
The sum of the points gave the overall ranking.
Access only to registered participants.
51 problems for the training phase; 48 problems for the competition phase (+2 real-world problems).
1 000 strings in each test set; 20 000 or 100 000 strings in the train sets.
2 simple baselines in Python: the frequency of the strings in the sets (train + test), and a usual 3-gram on the strings of the sets (train + test).
An implementation of the Baum-Welch algorithm in Python.
An implementation of ALERGIA in OpenFST and Visual Studio.
Good page rank of this page (no registration needed).
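A 3-gram baseline of the kind mentioned above can be sketched in a few lines. The padding convention (`^^` at the start, `$` at the end) and the unsmoothed counts are assumptions, not the competition's actual baseline code:

```python
# Sketch: an unsmoothed 3-gram model over strings, with start/end padding.
from collections import defaultdict

def train_3gram(strings):
    ctx, tri = defaultdict(int), defaultdict(int)
    for s in strings:
        padded = '^^' + s + '$'
        for i in range(2, len(padded)):
            tri[padded[i-2:i+1]] += 1   # count each trigram
            ctx[padded[i-2:i]] += 1     # and its 2-symbol context
    return ctx, tri

def prob_3gram(s, ctx, tri):
    p, padded = 1.0, '^^' + s + '$'
    for i in range(2, len(padded)):
        c = ctx.get(padded[i-2:i], 0)
        p *= tri.get(padded[i-2:i+1], 0) / c if c else 0.0
    return p
```

A real baseline would add smoothing so unseen trigrams do not force the probability to 0.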
724 visits (max: 54 in one day); 196 unique visitors; IPs from 37 countries, 14 countries with 5 or more IPs.
38 registered participants; 16 submitted at least one solution; 2 787 submissions; 5 participants scored points; 4 participants ranked first on at least one day.
Rank  Team name          Overall score
1     Shibata-Yoshinaka  212
2     Mans Hulden        124
3     David Llorens      122
4     Raphael Bailly     75
5     Fabio Kepler       14
Stern-Brocot trees: identification of probabilities.
If we were able to discover the structure, how do we identify the probabilities?
By estimation: the edge is used 1501 times out of 3000 passages through the state, giving the estimate 1501/3000.
A fraction in the tree can be constructed from two simple adjacent fractions by the «mean» (mediant) operation:
m(a/b, c/d) = (a+c)/(b+d)
[Figure: the first levels of the Stern-Brocot tree: 1/1; 1/2, 2/1; 1/3, 2/3, 3/2, 3/1; 1/4, 2/5, 3/5, 3/4, 4/3, 5/3, 5/2, 4/1]
Instead of returning c(x)/n, search the Stern-Brocot tree to find a good simple approximation.
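The search can be sketched as a walk down the tree by repeated mediants (the stopping tolerance `eps` is a parameter I introduce for illustration):

```python
# Sketch: walk the Stern-Brocot tree, taking the mediant of the current
# bounds at each step, and stop at the first fraction a/b close enough
# to the observed relative frequency c/n.
def stern_brocot_approx(c, n, eps):
    """Smallest-depth fraction a/b with |c/n - a/b| < eps."""
    (la, lb), (ra, rb) = (0, 1), (1, 0)         # bounds 0/1 and 1/0
    while True:
        a, b = la + ra, lb + rb                 # mediant of the two bounds
        if abs(c / n - a / b) < eps:
            return a, b
        if c / n < a / b:
            ra, rb = a, b                       # target is smaller: go left
        else:
            la, lb = a, b                       # target is larger: go right
```

On the deck's example, 1501/3000 with a loose tolerance is immediately simplified to 1/2, which is the point: the estimate snaps to the simple underlying probability.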
| c(x)/n − a/b | < λ · (log log n)/n, ∀λ > 1
With probability 1, for a co-finite number of values of n.