  1. Learning Automata with Hankel Matrices. Borja Balle. [Disclaimer: work done before joining Amazon]

  2. Brief History of Automata Learning
  • [1967] Gold: Regular languages are learnable in the limit
  • [1987] Angluin: Regular languages are learnable from queries
  • [1993] Pitt & Warmuth: PAC-learning DFA is NP-hard
  • [1994] Kearns & Valiant: Cryptographic hardness
  • [90s, 00s] Clark, Denis, de la Higuera, Oncina, and others: combinatorial methods meet statistics and linear algebra
  • [2009] Hsu-Kakade-Zhang & Bailly-Denis-Ralaivola: Spectral learning

  3. Talk Outline
  • Exact Learning
    – Hankel Trick for Deterministic Automata
    – Angluin's L* Algorithm
  • PAC Learning
    – Hankel Trick for Weighted Automata
    – Spectral Learning Algorithm
  • Statistical Learning
    – Hankel Matrix Completion

  4. The Hankel Matrix
  For a function $f : \Sigma^* \to \mathbb{R}$, the Hankel matrix $H_f \in \mathbb{R}^{\Sigma^* \times \Sigma^*}$ has rows indexed by prefixes $p$ and columns indexed by suffixes $s$ (both ordered $\epsilon, a, b, aa, ab, ba, bb, \ldots$), with entries $H_f(p, s) = f(p \cdot s)$. The defining Hankel property is that every entry depends only on the concatenation: $p \cdot s = p' \cdot s' \Rightarrow H(p, s) = H(p', s')$.
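  To make the definition concrete, here is a small Python sketch (not from the talk) that materializes a finite block of $H_f$; the example language, strings ending in b, is a hypothetical stand-in for the one on the slides.

```python
# A minimal sketch (not from the talk): materialize a finite block of the
# Hankel matrix H_f(p, s) = f(p . s) for chosen prefix/suffix sets.
import numpy as np

def hankel_block(f, prefixes, suffixes):
    """Return the |prefixes| x |suffixes| block of the Hankel matrix of f."""
    return np.array([[f(p + s) for s in suffixes] for p in prefixes])

# Hypothetical example language: strings over {a, b} that end in b.
f = lambda w: float(w.endswith("b"))
P = ["", "a", "b", "aa", "ab", "ba", "bb"]
print(hankel_block(f, P, P))
```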

  5. Hankel Matrices and DFA
  [Figure: a three-state DFA over $\Sigma = \{a, b\}$ with states $q_0, q_1, q_2$, shown alongside the binary Hankel block of its language over prefixes and suffixes $\epsilon, a, b, aa, ab, ba, bb$.]
  Theorem (Myhill-Nerode '58). The number of distinct rows of a binary Hankel matrix $H$ equals the minimal number of states of a DFA recognizing the language of $H$.

  6. From Hankel Matrices to DFA
  [Figure: the same Hankel block with its distinct rows identified. Each distinct row becomes a state; the initial state is the row of $\epsilon$; the transition on a symbol $a$ sends the row of $p$ to the row of $p \cdot a$; a state is accepting iff its entry in the $\epsilon$ column is 1.]
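  A possible rendering of this synthesis step in Python, under the assumption that $\epsilon \in P$, $\epsilon \in S$, and the block is closed and consistent (next slide):

```python
# A minimal sketch (assumed conventions: "" is in P and in S, and the
# Hankel block of f over P x S is closed and consistent).
def dfa_from_table(P, S, f, sigma):
    """Synthesize a DFA from the Hankel block, returned as h: str -> bool."""
    row = lambda p: tuple(f(p + s) for s in S)
    state_of = {}                        # distinct row -> canonical prefix
    for p in P:
        state_of.setdefault(row(p), p)
    def h(w):
        q = state_of[row("")]            # initial state: the row of epsilon
        for a in w:
            q = state_of[row(q + a)]     # closedness: this row occurs in P
        return bool(f(q))                # accept iff the epsilon column is 1
    return h
```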

  7. Closed and Consistent Finite Hankel Matrices
  The DFA synthesis algorithm requires:
  • Sets of prefixes P and suffixes S
  • The Hankel block over P' = P ∪ PΣ and S
  • Closed: rows(PΣ) ⊆ rows(P)
  • Consistent: row(p) = row(p') ⇒ row(p·a) = row(p'·a) for every a ∈ Σ
  (A sketch checking both conditions follows below.)
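```python
# A minimal sketch of the two conditions, same table conventions as above.
def is_closed(P, S, f, sigma):
    row = lambda p: tuple(f(p + s) for s in S)
    rows_P = {row(p) for p in P}
    return all(row(p + a) in rows_P for p in P for a in sigma)

def is_consistent(P, S, f, sigma):
    row = lambda p: tuple(f(p + s) for s in S)
    return all(row(p1 + a) == row(p2 + a)
               for p1 in P for p2 in P if row(p1) == row(p2)
               for a in sigma)
```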

  8. Learning from Membership and Equivalence Queries
  • Setup:
    – Two players, Teacher and Learner
    – A concept class C of functions from X to Y (known to both Teacher and Learner)
  • Protocol:
    – Teacher secretly chooses a concept c from C
    – Learner's goal is to discover the secret concept c
    – Teacher answers two types of queries asked by the Learner:
      • Membership queries: what is the value of c(x) for some x picked by the Learner?
      • Equivalence queries: is c equal to a hypothesis h from C picked by the Learner? If not, the Teacher returns a counter-example x where h(x) and c(x) differ
  Angluin, D. (1988). Queries and concept learning.
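  One way to picture the protocol as code (a hypothetical interface, not from the talk), with equivalence checked exhaustively over a finite set of test strings:

```python
# A minimal sketch of the teacher for a secret concept c: X -> Y.
class Teacher:
    def __init__(self, c, test_strings):
        self.c = c                        # the secret concept
        self.tests = test_strings         # finite proxy domain for EQs

    def membership(self, x):
        """Membership query: the value c(x)."""
        return self.c(x)

    def equivalence(self, h):
        """Equivalence query: None if h agrees with c on all test
        strings, otherwise a counter-example string."""
        for x in self.tests:
            if h(x) != self.c(x):
                return x
        return None
```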

  9. Angluin's L* Algorithm
  1) Initialize P = {ε} and S = {ε}
  2) Maintain the Hankel block H for P' = P ∪ PΣ and S using membership queries
  3) Repeat:
     • While H is not closed and consistent:
       – If H is not consistent, add a distinguishing suffix to S
       – If H is not closed, add a new prefix from PΣ to P
     • Construct a DFA A from H and ask an equivalence query
       – If the answer is yes, terminate
       – Otherwise, add all prefixes of the counter-example x to P
  Complexity: O(n) equivalence queries and O(|Σ| n² L) membership queries, where n is the number of states of the target DFA and L bounds the length of counter-examples. (A compact sketch follows below.)
  Angluin, D. (1987). Learning regular sets from queries and counterexamples.
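  Putting the pieces together, a compact sketch of L* reusing the Teacher and dfa_from_table helpers above (an illustration, not Angluin's original pseudocode):

```python
def lstar(teacher, sigma):
    P, S = [""], [""]
    row = lambda p: tuple(teacher.membership(p + s) for s in S)
    while True:
        while True:
            rows_P = {row(p) for p in P}
            # Closedness: every row of P.Sigma must already occur in P.
            missing = next((p + a for p in P for a in sigma
                            if row(p + a) not in rows_P), None)
            if missing is not None:
                P.append(missing)
                continue
            # Consistency: prefixes with equal rows must extend alike.
            clash = next(((a, s) for p1 in P for p2 in P
                          if row(p1) == row(p2)
                          for a in sigma for s in S
                          if teacher.membership(p1 + a + s)
                          != teacher.membership(p2 + a + s)), None)
            if clash is not None:
                S.append(clash[0] + clash[1])   # distinguishing suffix a.s
                continue
            break
        h = dfa_from_table(P, S, teacher.membership, sigma)
        x = teacher.equivalence(h)
        if x is None:
            return h
        for i in range(len(x) + 1):             # add all prefixes of x
            if x[:i] not in P:
                P.append(x[:i])
```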

  10. Weighted Finite Automata (WFA)
  Algebraic representation: $A = \langle \alpha, \beta, \{A_a\}_{a \in \Sigma} \rangle$, with initial weights $\alpha \in \mathbb{R}^n$, final weights $\beta \in \mathbb{R}^n$, and one transition matrix $A_a \in \mathbb{R}^{n \times n}$ per symbol.
  [Figure: graphical representation as a two-state weighted graph over $\{a, b\}$ with states $q_1, q_2$ and edge weights such as $(a, 1.2)$, $(a, 3.2)$, $(b, 2)$, $(b, 5)$, $(a, -2)$, $(b, 0)$, $(a, -1)$, $(b, -2)$.]
  Functional representation: $A(x_1 \cdots x_t) = \alpha^\top A_{x_1} \cdots A_{x_t} \beta$.
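  The functional representation is just a chain of matrix products; a numpy sketch with hypothetical weights (the slide's exact numbers are not fully recoverable):

```python
import numpy as np

# Hypothetical 2-state WFA over {a, b}; all numbers are illustrative.
alpha = np.array([1.0, 0.0])                        # initial weights
beta  = np.array([0.5, 1.0])                        # final weights
A = {"a": np.array([[1.2, -1.0], [-2.0, 3.2]]),
     "b": np.array([[0.0, 2.0], [5.0, -2.0]])}

def wfa_value(word):
    """Compute A(x_1 ... x_t) = alpha^T A_{x1} ... A_{xt} beta."""
    v = alpha
    for x in word:
        v = v @ A[x]
    return float(v @ beta)

print(wfa_value("ab"))
```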

  11. Hankel Matrices and WFA
  Theorem (Fliess '74). The rank of a real Hankel matrix $H$ equals the minimal number of states of a WFA recognizing the weighted language of $H$.
  The reason is that every entry splits along the prefix/suffix boundary: $A(p_1 \cdots p_t \, s_1 \cdots s_{t'}) = \alpha^\top A_{p_1} \cdots A_{p_t} \cdot A_{s_1} \cdots A_{s_{t'}} \beta$, so $H$ factorizes through $\mathbb{R}^n$.

  12. From Hankel Matrices to WFA
  Define the shifted block $H_a(p, s) = A(p \cdot a \cdot s)$, i.e. $A(p_1 \cdots p_t \, a \, s_1 \cdots s_{t'}) = \alpha^\top A_{p_1} \cdots A_{p_t} A_a A_{s_1} \cdots A_{s_{t'}} \beta$. If $H = PS$ is a rank factorization, then $H_a = P A_a S$, and therefore $A_a = P^+ H_a S^+$.
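  In code, the identity $A_a = P^+ H_a S^+$ is a one-liner with Moore-Penrose pseudo-inverses (a sketch assuming the factors are given):

```python
import numpy as np

def transition_from_hankel(P, S, H_a):
    """Recover A_a from a rank factorization H = P @ S and the block H_a."""
    return np.linalg.pinv(P) @ H_a @ np.linalg.pinv(S)
```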

  13. WFA Reconstruction via Singular Value Decomposition
  Input: Hankel block H' over P' = P ∪ PΣ and S, and the number of states n
  1) Extract from H' the matrix H over P and S
  2) Compute the rank-n SVD $H = U D V^\top$
  3) For each symbol a:
     • Extract from H' the matrix $H_a$ over P and S
     • Compute $A_a = D^{-1} U^\top H_a V$
  Robustness property: $\|H' - \hat{H}'\| \le \varepsilon \Rightarrow \|A_a - \hat{A}_a\| \le O(\varepsilon)$. (A numpy sketch follows below.)
  Balle, B., Carreras, X., Luque, F. M., & Quattoni, A. (2014). Spectral learning of weighted automata.
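  A numpy sketch of the three steps, also recovering $\alpha$ and $\beta$ from the row and column of the empty string (a standard choice, assumed here rather than stated on the slide):

```python
import numpy as np

def spectral_wfa(H_blocks, prefixes, suffixes, sigma, n):
    """H_blocks[""] is H over P x S; H_blocks[a] is H_a over P x S."""
    H = H_blocks[""]
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    U, d, Vt = U[:, :n], d[:n], Vt[:n, :]            # rank-n truncation
    A = {a: np.diag(1.0 / d) @ U.T @ H_blocks[a] @ Vt.T for a in sigma}
    alpha = (U * d)[prefixes.index("")]              # row of epsilon in P = U D
    beta = Vt[:, suffixes.index("")]                 # column of epsilon in S = V^T
    return alpha, A, beta
```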

  14. Probably Approximately Correct (PAC) Learning
  • Fix a class D of distributions over X
  • Collect m i.i.d. samples Z = (x_1, ..., x_m) from some unknown distribution d in D
  • An algorithm that receives Z and outputs a hypothesis h is a PAC-learner for the class D if: whenever m > poly(|d|, 1/ε, log(1/δ)), with probability at least 1 − δ the hypothesis satisfies distance(d, h) < ε
  • The algorithm is an efficient PAC-learner if it runs in polynomial time
  Kearns, M., Mansour, Y., Ron, D., Rubinfeld, R., Schapire, R. E., & Sellie, L. (1994). On the learnability of discrete distributions.
  Valiant, L. G. (1984). A theory of the learnable.

  15. Estimating Hankel Matrices from Samples
  Sample (m = 16 strings): aa, b, bab, a, bbab, abb, babba, abbb, ab, a, aabba, baa, abbab, baba, bb, a.
  Empirical Hankel matrix: $\hat{H}(p, s)$ = (number of occurrences of $p \cdot s$ in the sample) / m. For example, over prefixes and suffixes $\epsilon, a, b, aa, ab$ the first row is $(0, \frac{3}{16}, \frac{1}{16}, \frac{1}{16}, \frac{1}{16})$, since a occurs three times and b, aa, ab once each.
  Concentration bound: $\|H - \hat{H}\| \le O(1/\sqrt{m})$.
  Denis, F., Gybels, M., & Habrard, A. (2014). Dimension-free concentration bounds on Hankel matrices for spectral learning.
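  A sketch computing the empirical block for the sample above:

```python
from collections import Counter
import numpy as np

sample = ["aa", "b", "bab", "a", "bbab", "abb", "babba", "abbb",
          "ab", "a", "aabba", "baa", "abbab", "baba", "bb", "a"]
counts, m = Counter(sample), len(sample)

def empirical_hankel(prefixes, suffixes, infix=""):
    """Entries count exact occurrences of p.infix.s in the sample, over m."""
    return np.array([[counts[p + infix + s] / m for s in suffixes]
                     for p in prefixes])

P = ["", "a", "b", "aa", "ab"]
print(empirical_hankel(P, P))        # e.g. H_hat(eps, a) = 3/16
```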

  16. Spectral PAC Learning of Stochastic WFA
  • Algorithm:
    1. Estimate the empirical Hankel matrix
    2. Apply spectral WFA reconstruction
  • Efficient PAC-learning:
    – Running time: linear in m, polynomial in n and the size of the Hankel matrix
    – Accuracy measure: $L_1$ distance on all strings of length at most L
    – Sample complexity: $O(L^2 |\Sigma| n^{1/2} / (\sigma^2 \varepsilon^2))$, where $\sigma$ is the n-th singular value of H
    – Proof: robustness + concentration + a telescoping $L_1$ bound
  (An end-to-end sketch follows below.)
  Bailly, R., Denis, F., & Ralaivola, L. (2009). Grammatical inference as a principal component analysis problem.
  Hsu, D., Kakade, S. M., & Zhang, T. (2009). A spectral algorithm for learning hidden Markov models.
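  An end-to-end sketch chaining the two steps, reusing empirical_hankel and spectral_wfa from the sketches above (toy sizes; n = 2 is an assumption):

```python
# Build the shifted blocks H_a and reconstruct a WFA from the sample.
sigma = ["a", "b"]
H_blocks = {"": empirical_hankel(P, P)}
for a in sigma:
    H_blocks[a] = empirical_hankel(P, P, infix=a)   # H_a(p, s) ~ f(p.a.s)

alpha, A, beta = spectral_wfa(H_blocks, P, P, sigma, n=2)
```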

  17. Statistical Learning in the Non-realizable Setting
  • Fix an unknown distribution d over X × Y (inputs, outputs)
  • Collect m i.i.d. samples Z = ((x_1, y_1), ..., (x_m, y_m)) from d
  • Fix a hypothesis class F of functions from X to Y
  • Find a hypothesis h from F with good accuracy on Z via Empirical Risk Minimization: $\min_{h \in F} \frac{1}{m} \sum_{i=1}^{m} \ell(h(x_i), y_i)$
  • ...in such a way that h also has good accuracy on future pairs (x, y) from d: $\mathbb{E}_{(x,y) \sim d}[\ell(h(x), y)] \le \frac{1}{m} \sum_{i=1}^{m} \ell(h(x_i), y_i) + \mathrm{complexity}(Z, F)$

  18. Learning WFA via Hankel Matrix Completion
  Training data: string/value pairs {(bab, 1), (bbb, 0), (aaa, 3), (a, 1), (ab, 1), (aa, 2), (aba, 2), (bb, 0)}.
  [Figure: a Hankel block over prefixes ε, a, b, aa, ab, ba, bb and suffixes ε, a, b in which entries corresponding to observed strings are filled in and the remaining entries are marked "?".]
  Learning reduces to completing the partially observed block with a low-rank matrix that respects the Hankel constraints, after which a WFA can be reconstructed spectrally.
  Balle, B., & Mohri, M. (2012). Spectral learning of general weighted automata via constrained matrix completion.
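  One common convex surrogate for this completion problem minimizes the nuclear norm subject to the observations and the Hankel equality constraints; a cvxpy sketch (an illustration, not the paper's exact constrained formulation):

```python
import cvxpy as cp

prefixes = ["", "a", "b", "aa", "ab", "ba", "bb"]
suffixes = ["", "a", "b"]
observed = {"bab": 1, "bbb": 0, "aaa": 3, "a": 1,
            "ab": 1, "aa": 2, "aba": 2, "bb": 0}

H = cp.Variable((len(prefixes), len(suffixes)))
constraints, cells = [], {}
for i, p in enumerate(prefixes):
    for j, s in enumerate(suffixes):
        w = p + s
        if w in observed:                       # match observed values
            constraints.append(H[i, j] == observed[w])
        if w in cells:                          # Hankel: same string, same value
            constraints.append(H[i, j] == cells[w])
        else:
            cells[w] = H[i, j]

cp.Problem(cp.Minimize(cp.normNuc(H)), constraints).solve()
print(H.value)
```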
