Learning Automata with Hankel Matrices, Borja Balle (PowerPoint presentation)


SLIDE 1

Learning Automata with Hankel Matrices

Borja Balle

[Disclaimer: Work done before joining Amazon]

SLIDE 2

Brief History of Automata Learning

  • [1967] Gold: Regular languages are learnable in the limit
  • [1987] Angluin: Regular languages are learnable from queries
  • [1993] Pitt & Warmuth: PAC-learning DFA is NP-hard
  • [1994] Kearns & Valiant: Cryptographic hardness
  • [90’s, 00’s] Clark, Denis, de la Higuera, Oncina, and others: combinatorial methods meet statistics and linear algebra

  • [2009] Hsu-Kakade-Zhang & Bailly-Denis-Ralaivola: Spectral learning
SLIDE 3

Talk Outline

  • Exact Learning
    – Hankel Trick for Deterministic Automata
    – Angluin’s L* Algorithm
  • PAC Learning
    – Hankel Trick for Weighted Automata
    – Spectral Learning Algorithm
  • Statistical Learning
    – Hankel Matrix Completion

SLIDE 4

The Hankel Matrix

The Hankel matrix of a function f : Σ* → ℝ is the bi-infinite matrix

H_f ∈ ℝ^(Σ* × Σ*),  H_f(p, s) = f(p · s),

with rows indexed by prefixes p ∈ Σ* and columns by suffixes s ∈ Σ*. Every way of splitting the same string gives the same entry:

p · s = p' · s'  ⇒  H(p, s) = H(p', s')

[Figure: the Hankel matrix with rows and columns indexed by ε, a, b, aa, ab, ba, bb, ...; the entry in row p, column s is H(p, s).]
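As a concrete illustration of the definition, here is a minimal sketch that materializes a finite block of H_f. The example function f (the indicator of strings with an even number of a’s) is my own running example, not from the slides:

```python
def hankel_block(f, prefixes, suffixes):
    """Finite block of the Hankel matrix of f: entry (i, j) is f(p_i . s_j)."""
    return [[f(p + s) for s in suffixes] for p in prefixes]

# Running example (an assumption for illustration): indicator of the
# language "even number of a's" over the alphabet {a, b}.
def f(w):
    return 1 if w.count("a") % 2 == 0 else 0

H = hankel_block(f, ["", "a", "b", "aa"], ["", "a", "b"])
# The rows for "", "b", and "aa" coincide, while the row for "a" differs:
# two distinct rows in total, matching the 2-state minimal DFA.
```

Note that any split of the same string indexes the same value, e.g. the entry for (p, s) = ("a", "b") and for ("ab", "") are both f("ab").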

SLIDE 5

Hankel Matrices and DFA

[Figure: a 3-state DFA (q0, q1, q2) over Σ = {a, b}, and the binary Hankel matrix of its language, with rows and columns indexed by ε, a, b, aa, ab, ba, bb, ...; entry H(p, s) = 1 iff p · s is accepted.]

Theorem (Myhill-Nerode ’58). The number of distinct rows of a binary Hankel matrix H equals the minimal number of states of a DFA recognizing the language of H.

SLIDE 6

From Hankel Matrices to DFA

[Figure: a finite block of the binary Hankel matrix (rows ε, a, b, aa, ab, ba, bb, aba, abb; columns ε, a, b, aa, ab, ba, bb, ...) next to the 3-state DFA obtained by merging prefixes with identical rows; the highlighted sub-block uses suffixes ε, a, ab.]

SLIDE 7

Closed and Consistent Finite Hankel Matrices

[Figure: a finite Hankel block with rows P' = {ε, a, b, aa, ab, aba, abb} and columns S = {ε, a}; the upper rows correspond to the prefix set P, the remaining rows to PΣ.]

The DFA synthesis algorithm requires:

  • Sets of prefixes P and suffixes S
  • A Hankel block over P' = P ∪ PΣ and S
  • Closed: rows(PΣ) ⊆ rows(P)
  • Consistent: row(p) = row(p') ⇒ row(p·a) = row(p'·a) for all a ∈ Σ
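Both conditions can be checked mechanically. A small sketch (names are mine; the observation table `table` maps each string in (P ∪ PΣ)·S to its membership value):

```python
def row(table, p, suffixes):
    """The row of the Hankel block indexed by prefix p."""
    return tuple(table[p + s] for s in suffixes)

def is_closed(table, P, alphabet, suffixes):
    """Closed: every row indexed by P-Sigma already occurs among the rows of P."""
    p_rows = {row(table, p, suffixes) for p in P}
    return all(row(table, p + a, suffixes) in p_rows
               for p in P for a in alphabet)

def is_consistent(table, P, alphabet, suffixes):
    """Consistent: prefixes with equal rows keep equal rows after any symbol."""
    return all(row(table, p1 + a, suffixes) == row(table, p2 + a, suffixes)
               for p1 in P for p2 in P
               if row(table, p1, suffixes) == row(table, p2, suffixes)
               for a in alphabet)

# Tiny example (an assumption): the even-number-of-a's language with
# P = {"", "a"} and S = {"", "a"}.
f = lambda w: 1 if w.count("a") % 2 == 0 else 0
P, S, Sigma = ["", "a"], ["", "a"], "ab"
table = {p + s: f(p + s)
         for p in P + [p + a for p in P for a in Sigma] for s in S}
```

For this choice of P and S the block is both closed and consistent, so a DFA can be synthesized from it.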
SLIDE 8

Learning from Membership and Equivalence Queries

  • Setup:

– Two players, a Teacher and a Learner
– A concept class C of functions from X to Y (known to both Teacher and Learner)

  • Protocol:

– The Teacher secretly chooses a concept c from C
– The Learner’s goal is to discover the secret concept c
– The Teacher answers two types of queries asked by the Learner

  • Membership queries: what is the value of c(x) for some x picked by the Learner?
  • Equivalence queries: is c equal to a hypothesis h from C picked by the Learner?
    – If not, the Teacher returns a counter-example x where h(x) and c(x) differ

Angluin, D. (1988). Queries and concept learning.

SLIDE 9

Angluin's L* Algorithm

1) Initialize P = {ε} and S = {ε}
2) Maintain the Hankel block H for P' = P ∪ PΣ and S using membership queries
3) Repeat:
   § While H is not closed and consistent:
     • If H is not consistent, add a distinguishing suffix to S
     • If H is not closed, add a new prefix from PΣ to P
   § Construct a DFA A from H and ask an equivalence query
     • If the answer is yes, terminate
     • Otherwise, add all prefixes of the counter-example x to P

Angluin, D. (1987). Learning regular sets from queries and counterexamples.

Complexity: O(n) equivalence queries and O(|Σ| n² L) membership queries, where n is the number of states of the target DFA and L is the length of the longest counter-example.
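The loop above can be sketched end to end. This is a compact illustration, not a reference implementation: `member` answers membership queries, `equiv` returns None or a counter-example, and the toy Teacher below (which brute-forces equivalence on short strings) is my own stand-in:

```python
from itertools import product

def lstar(member, equiv, alphabet):
    """Sketch of Angluin's L* with an observation table T over (P u PSigma).S."""
    P, S = [""], [""]
    T = {}  # T[w] = member(w)

    def fill():
        for p in P + [p + a for p in P for a in alphabet]:
            for s in S:
                if p + s not in T:
                    T[p + s] = member(p + s)

    def row(p):
        return tuple(T[p + s] for s in S)

    while True:
        fill()
        # Not closed: promote a prefix from P-Sigma whose row is new.
        bad = next((p + a for p in P for a in alphabet
                    if row(p + a) not in {row(q) for q in P}), None)
        if bad is not None:
            P.append(bad)
            continue
        # Not consistent: add a distinguishing suffix a.s to S.
        bad = next((a + s for p1 in P for p2 in P if row(p1) == row(p2)
                    for a in alphabet for s in S
                    if T[p1 + a + s] != T[p2 + a + s]), None)
        if bad is not None:
            S.append(bad)
            continue
        # Closed and consistent: synthesize a DFA whose states are distinct rows.
        reps = {}
        for p in P:
            reps.setdefault(row(p), p)
        trans = {(r, a): row(p + a) for r, p in reps.items() for a in alphabet}
        accept = {r for r, p in reps.items() if T[p]}

        def hyp(w, start=row("")):
            q = start
            for a in w:
                q = trans[(q, a)]
            return q in accept

        cex = equiv(hyp)
        if cex is None:
            return hyp
        for i in range(1, len(cex) + 1):  # add all prefixes of the counter-example
            if cex[:i] not in P:
                P.append(cex[:i])

# Toy Teacher (an assumption): target is "even number of a's"; equivalence is
# checked by brute force on all strings of length < 6.
target = lambda w: w.count("a") % 2 == 0

def equiv(h):
    for n in range(6):
        for t in product("ab", repeat=n):
            w = "".join(t)
            if h(w) != target(w):
                return w
    return None

dfa = lstar(target, equiv, "ab")
```

On this target the algorithm terminates with a 2-state hypothesis that agrees with the secret language.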

SLIDE 10

Weighted Finite Automata (WFA)

Graphical representation:

[Figure: a 2-state WFA (q1, q2) over {a, b}; edge labels carry the entries of A_a and A_b below, with initial weights -1 and 0.5 and a final weight of 1.2 on q1.]

Algebraic representation:

A = ⟨α, β, {A_a}_{a∈Σ}⟩
α^T = [-1  0.5],  β^T = [1.2  …]
A_a = [ 1.2  -1 ; -2  3.2 ],  A_b = [ 2  -2 ; 5  0 ]

Functional representation:

A(x_1 ⋯ x_t) = α^T A_{x_1} ⋯ A_{x_t} β
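The functional representation makes evaluation a chain of matrix products. A short numpy sketch with the weights above (the second final weight is not recoverable from the slide, so the 0.0 below is a placeholder assumption of mine):

```python
import numpy as np

alpha = np.array([-1.0, 0.5])                     # initial weights
beta = np.array([1.2, 0.0])                       # final weights; 0.0 is an assumed placeholder
A = {"a": np.array([[1.2, -1.0], [-2.0, 3.2]]),   # transition weight matrices
     "b": np.array([[2.0, -2.0], [5.0, 0.0]])}

def wfa_value(word):
    """A(x1 ... xt) = alpha^T A_x1 ... A_xt beta."""
    v = alpha
    for x in word:
        v = v @ A[x]      # carries the row vector alpha^T A_x1 ... A_xi
    return float(v @ beta)
```

For instance, the empty string evaluates to the inner product of α and β.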

SLIDE 11

Hankel Matrices and WFA

Theorem (Fliess ’74). The rank of a real Hankel matrix H equals the minimal number of states of a WFA recognizing the weighted language of H.

A(p_1 ⋯ p_t s_1 ⋯ s_{t'}) = α^T A_{p_1} ⋯ A_{p_t} A_{s_1} ⋯ A_{s_{t'}} β

[Figure: the induced factorization of the Hankel matrix, where row p of the left factor is α^T A_{p_1} ⋯ A_{p_t} and column s of the right factor is A_{s_1} ⋯ A_{s_{t'}} β, so that the product has entry H(p, s) = A(ps).]

SLIDE 12

From Hankel Matrices to WFA

H = P S

A(p_1 ⋯ p_t a s_1 ⋯ s_{t'}) = α^T A_{p_1} ⋯ A_{p_t} A_a A_{s_1} ⋯ A_{s_{t'}} β

Define the shifted Hankel matrix H_a by H_a(p, s) = A(p a s). Then

H_a = P A_a S  ⇒  A_a = P⁺ H_a S⁺

where P⁺ and S⁺ denote pseudo-inverses.

[Figure: the factorization of H_a into the three factors P, A_a, S.]

SLIDE 13

WFA Reconstruction via Singular Value Decomposition

Input: Hankel block H' over P' = P ∪ PΣ and S, and a number of states n
1) Extract from H' the matrix H over P and S
2) Compute the rank-n SVD H = U D V^T
3) For each symbol a:
   § Extract from H' the matrix H_a over P and S
   § Compute A_a = D⁻¹ U^T H_a V

Balle, B., Carreras, X., Luque, F. M., & Quattoni, A. (2014). Spectral learning of weighted automata.

Robustness property:

‖H' − Ĥ'‖ ≤ ε  ⇒  ‖A_a − Â_a‖ ≤ O(ε)
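The reconstruction step can be sketched in numpy. The transition step follows the recipe above; the way α and β are read off from the values of f on prefixes and suffixes (`h_P`, `h_S`, names mine) is standard in the spectral learning literature, though not spelled out on this slide:

```python
import numpy as np

def spectral_wfa(H, H_sigma, h_P, h_S, n):
    """H: Hankel block over (P, S); H_sigma[a]: shifted block with entries f(p a s);
    h_P[i] = f(p_i), h_S[j] = f(s_j); n: target number of states."""
    U, d, Vt = np.linalg.svd(H)
    U, Dinv, V = U[:, :n], np.diag(1.0 / d[:n]), Vt[:n].T  # rank-n truncation
    As = {a: Dinv @ U.T @ Ha @ V for a, Ha in H_sigma.items()}
    alpha = V.T @ h_S      # from f(s) = alpha^T S[:, s] with S = V^T
    beta = Dinv @ U.T @ h_P  # from f(p) = P[p, :] beta with P = U D
    return alpha, As, beta

def evaluate(alpha, As, beta, word):
    v = alpha
    for x in word:
        v = As[x].T @ v
    return float(v @ beta)

# Exact rank-2 example (an assumption): f = indicator of "even number
# of a's", with P = S = {"", "a"}.
f = lambda w: float(w.count("a") % 2 == 0)
P = S = ["", "a"]
H = np.array([[f(p + s) for s in S] for p in P])
Hs = {a: np.array([[f(p + a + s) for s in S] for p in P]) for a in "ab"}
h_P = np.array([f(p) for p in P])
h_S = np.array([f(s) for s in S])
alpha, As, beta = spectral_wfa(H, Hs, h_P, h_S, 2)
```

Because the block H already has full rank 2 here, the reconstructed WFA computes f exactly on all strings, not just those appearing in the block.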

SLIDE 14

Probably Approximately Correct (PAC) Learning

  • Fix a class D of distributions over X
  • Collect m i.i.d. samples Z = (x_1, ..., x_m) from some unknown distribution d from D
  • An algorithm that receives Z and outputs a hypothesis h is a PAC-learner for the class D if:
    – Whenever m > poly(|d|, 1/ε, log 1/δ), with probability at least 1 − δ the hypothesis satisfies distance(d, h) < ε
  • The algorithm is an efficient PAC-learner if it runs in polynomial time

Valiant, L. G. (1984). A theory of the learnable.
Kearns, M., Mansour, Y., Ron, D., Rubinfeld, R., Schapire, R. E., & Sellie, L. (1994). On the learnability of discrete distributions.

SLIDE 15

Estimating Hankel Matrices from Samples

Sample (m = 16 strings):

aa, b, bab, a, bbab, abb, babba, abbb, ab, a, aabba, baa, abbab, baba, bb, a

Empirical Hankel matrix: Ĥ(p, s) is the empirical frequency of the string p · s in the sample.

[Figure: the empirical Hankel block over prefixes ε, a, b, aa, ab and suffixes ε, a, b, aa, ab, with entries of the form k/16.]

Concentration bound:

‖H − Ĥ‖ ≤ O(1/√m)

Denis, F., Gybels, M., & Habrard, A. (2014). Dimension-free concentration bounds on Hankel matrices for spectral learning.
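The estimator is simple enough to write down directly. A sketch using the 16-string sample from the slide:

```python
from collections import Counter
from fractions import Fraction

def empirical_hankel(sample, prefixes, suffixes):
    """Empirical Hankel block: entry (p, s) is the fraction of sample
    strings equal to p . s."""
    counts, m = Counter(sample), len(sample)
    return [[Fraction(counts[p + s], m) for s in suffixes] for p in prefixes]

# The 16-string sample from the slide.
sample = ["aa", "b", "bab", "a", "bbab", "abb", "babba", "abbb",
          "ab", "a", "aabba", "baa", "abbab", "baba", "bb", "a"]
H = empirical_hankel(sample, ["", "a", "b", "aa", "ab"], ["", "a", "b"])
```

For example, the string "a" occurs three times in the sample, so both the entry for prefix "a", suffix ε and the entry for prefix ε, suffix "a" equal 3/16.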

SLIDE 16

Spectral PAC Learning of Stochastic WFA

  • Algorithm:
    1. Estimate the empirical Hankel matrix
    2. Use spectral WFA reconstruction
  • Efficient PAC-learning:
    – Running time: linear in m, polynomial in n and the size of the Hankel matrix
    – Accuracy measure: L1 distance on all strings of length at most L
    – Sample complexity: L² |Σ| n^{1/2} / (σ² ε²)
    – Proof: robustness + concentration + telescoping L1 bound

Hsu, D., Kakade, S. M., & Zhang, T. (2009). A spectral algorithm for learning hidden Markov models.
Bailly, R., Denis, F., & Ralaivola, L. (2009). Grammatical inference as a principal component analysis problem.

SLIDE 17

Statistical Learning in the Non-realizable Setting

  • Fix an unknown distribution d over X × Y (inputs, outputs)
  • Collect m i.i.d. samples Z = ((x_1, y_1), ..., (x_m, y_m)) from d
  • Fix a hypothesis class F of functions from X to Y
  • Find a hypothesis h from F that has good accuracy on Z,
  • in such a way that it also has good accuracy on future pairs (x, y) from d

Empirical Risk Minimization:

min_{h ∈ F} (1/m) Σ_{i=1}^{m} ℓ(h(x_i), y_i)

with the generalization guarantee:

E_{(x,y)∼d}[ℓ(h(x), y)] ≤ (1/m) Σ_{i=1}^{m} ℓ(h(x_i), y_i) + complexity(Z, F)

SLIDE 18

Learning WFA via Hankel Matrix Completion

Sample (input string, output value):

(bab, 1), (bbb, 0), (aaa, 3), (a, 1), (ab, 1), (aa, 2), (aba, 2), (bb, 0)

[Figure: a partially observed Hankel block over prefixes ε, a, b, aa, ab, ba, bb and suffixes ε, a, b; observed entries come from the sample (e.g. H(a, ε) = 1, H(aa, ε) = 2, H(aa, a) = 3, H(ba, b) = 1), and the remaining entries are unknown ("?"), to be filled by matrix completion.]

Balle, B., & Mohri, M. (2012). Spectral learning of general weighted automata via constrained matrix completion.
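Balle & Mohri (2012) solve a constrained matrix-completion problem; as a rough, unconstrained stand-in, here is a plain low-rank completion of a masked block by regularized alternating least squares (the function name, the ALS scheme, and the toy data are my own illustration, not the paper’s algorithm):

```python
import numpy as np

def complete_low_rank(M, observed, n, iters=100, reg=1e-3, seed=0):
    """Fill the unobserved entries of M by fitting a rank-n factorization
    P @ S to the observed entries (boolean mask `observed`) with
    ridge-regularized alternating least squares."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((M.shape[0], n))
    S = rng.standard_normal((n, M.shape[1]))
    for _ in range(iters):
        for i in range(M.shape[0]):      # refit row i of P
            c = observed[i]
            P[i] = np.linalg.solve(S[:, c] @ S[:, c].T + reg * np.eye(n),
                                   S[:, c] @ M[i, c])
        for j in range(M.shape[1]):      # refit column j of S
            r = observed[:, j]
            S[:, j] = np.linalg.solve(P[r].T @ P[r] + reg * np.eye(n),
                                      P[r].T @ M[r, j])
    return P @ S

# Toy check: a rank-1 block with two hidden entries is recovered.
true = np.outer([1.0, 2.0, 3.0], [1.0, 0.5, 2.0])
observed = np.ones_like(true, dtype=bool)
observed[0, 2] = observed[2, 0] = False
filled = complete_low_rank(np.where(observed, true, 0.0), observed, n=1)
```

Once the block is completed, the spectral reconstruction of the previous slides can be applied to it to recover a WFA.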

SLIDE 19

Generalization Bounds for Learning WFA

  • The generalization power of WFA can be controlled by:
    – Bounding the norm of the weights
    – Bounding the norm of the language (in a Banach space)
    – Bounding the norm of the Hankel matrix

Balle, B., & Mohri, M. (2017). Generalization Bounds for Learning Weighted Automata

E_{(x,y)∼d}[ℓ(A(x), y)] ≤ (1/m) Σ_{i=1}^{m} ℓ(A(x_i), y_i) + Õ(‖H_A‖_* / m + 1/√m)

where ‖H_A‖_* is the nuclear norm of the Hankel matrix of A.

SLIDE 20

Some Practical Applications

  • L* algorithm: learn DFAs of network protocol implementations and compare them against the specification to find bugs
  • Spectral algorithm: use as the initial point for gradient-based methods, which increases speed and accuracy
  • Hankel completion: sample-efficient sequence-to-sequence models that outperform CRFs on small alphabets

Jiang, N., Kulesza, A., & Singh, S. P. (2016). Improving Predictive State Representations via Gradient Descent.
Quattoni, A., Balle, B., Carreras Pérez, X., & Globerson, A. (2014). Spectral regularization for max-margin sequence tagging.
De Ruiter, J., & Poll, E. (2015). Protocol State Fuzzing of TLS Implementations.

SLIDE 21

Want to Learn More?

  • EMNLP’14 tutorial (slides, video, code)
    – Variations on the spectral algorithm
    – Extensions to weighted tree automata
    – https://borjaballe.github.io/emnlp14-tutorial/

  • Survey papers
    – B. Balle and M. Mohri (2015). Learning Weighted Automata
    – M. R. Thon and H. Jaeger (2015). Links between multiplicity automata, observable operator models and predictive state representations
    – F. Vaandrager (2017). Model Learning

  • Implementations: Sp2Learn, LibLearn, libalf
SLIDE 22

Thanks!

Xavier Carreras, Mehryar Mohri, Prakash Panangaden, Joelle Pineau, Doina Precup, Ariadna Quattoni, Guillaume Rabusseau, Franco M. Luque, Pierre-Luc Bacon, Pascale Gourdeau, Odalric-Ambrym Maillard, Will Hamilton, Lucas Langer, Shay Cohen, Amir Globerson

SLIDE 23

Learning Automata with Hankel Matrices

Borja Balle

[Disclaimer: Work done before joining Amazon]