SLIDE 1 Automata Learning
Borja Balle
Amazon Research Cambridge¹
Foundations of Programming Summer School (Oxford) — July 2018
¹ Based on work completed before joining Amazon
SLIDE 2
Brief History of Automata Learning
§ 1967 Gold: Regular languages are learnable in the limit
§ 1987 Angluin: Regular languages are learnable from queries
§ 1993 Pitt & Warmuth: PAC-learning DFA is NP-hard
§ 1994 Kearns & Valiant: Cryptographic hardness
§ ... Clark, Denis, de la Higuera, Oncina, and others: combinatorial methods meet statistics and linear algebra
§ 2009 Hsu–Kakade–Zhang & Bailly–Denis–Ralaivola: Spectral learning
SLIDE 3
Goals of This Tutorial
Goals
§ Motivate spectral learning techniques for weighted automata and related models on sequential and tree-structured data
§ Provide the key intuitions and fundamental results to effectively navigate the literature
§ Survey some formal learning results and give an overview of some applications
§ Discuss the role of linear algebra, concentration bounds, and learning theory in this area
Non-Goals
§ Dive deep into applications: instead, pointers will be provided
§ Provide an exhaustive treatment of automata learning: beyond the scope of an introductory lecture
§ Give complete proofs of the presented results: illuminating proofs will be discussed, technical proofs omitted
SLIDE 4 Outline
- 1. Sequential Data and Weighted Automata
- 2. WFA Reconstruction and Approximation
- 3. PAC Learning for Stochastic WFA
- 4. Statistical Learning for WFA
- 5. Beyond Sequences: Transductions and Trees
- 6. Conclusion
SLIDE 5 Outline
- 1. Sequential Data and Weighted Automata
- 2. WFA Reconstruction and Approximation
- 3. PAC Learning for Stochastic WFA
- 4. Statistical Learning for WFA
- 5. Beyond Sequences: Transductions and Trees
- 6. Conclusion
SLIDE 6 Learning Sequential Data
§ Sequential data arises in numerous applications of Machine Learning:
§ Natural language processing
§ Computational biology
§ Time series analysis
§ Sequential decision-making
§ Robotics
§ Learning from sequential data requires specialized algorithms
§ The most common ML algorithms assume the data can be represented as vectors of a fixed dimension
§ Sequences can have arbitrary length, and are compositional in nature
§ Similar things occur with trees, graphs, and other forms of structured data
§ Sequential data can be diverse in nature
§ Continuous vs. discrete time vs. only order information
§ Continuous vs. discrete observations
SLIDE 7 Functions on Strings
§ In this lecture we focus on sequences represented by strings over a finite alphabet: $\Sigma^*$
§ The goal will be to learn a function $f : \Sigma^* \to \mathbb{R}$ from data
§ The function being learned can represent many things, for example:
§ A language model: $f(\text{sentence})$ = likelihood of observing a sentence in a specific natural language
§ A protein scoring model: $f(\text{amino acid sequence})$ = predicted activity of a protein in a biological reaction
§ A reward model: $f(\text{action sequence})$ = expected reward an agent will obtain after executing a sequence of actions
§ A network model: $f(\text{packet sequence})$ = probability that a sequence of packets will successfully transmit a message through a network
§ These functions can be identified with a weighted language $f \in \mathbb{R}^{\Sigma^*}$, an infinite-dimensional object
§ In order to learn such functions we need a finite representation: weighted automata
SLIDE 8
Weighted Finite Automata
Graphical Representation: a two-state WFA over $\Sigma = \{a, b\}$ with initial/final weights on states $q_1, q_2$ and weighted transitions labelled by $a$ and $b$ (diagram omitted)
Algebraic Representation
$\alpha = \begin{pmatrix} -1 \\ 0.5 \end{pmatrix}$, $\beta = \begin{pmatrix} 1.2 \\ 0 \end{pmatrix}$, $A_a = \begin{pmatrix} 1.2 & -1 \\ -2 & 3.2 \end{pmatrix}$, $A_b = \begin{pmatrix} 2 & -2 \\ 0 & 5 \end{pmatrix}$
Weighted Finite Automaton
A WFA $A$ with $n = |A|$ states is a tuple $A = \langle \alpha, \beta, \{A_\sigma\}_{\sigma \in \Sigma} \rangle$ where $\alpha, \beta \in \mathbb{R}^n$ and $A_\sigma \in \mathbb{R}^{n \times n}$
SLIDE 9
Language of a WFA
With every WFA $A = \langle \alpha, \beta, \{A_\sigma\} \rangle$ with $n$ states we associate a weighted language $f_A : \Sigma^* \to \mathbb{R}$ given by
$f_A(x_1 \cdots x_T) = \sum_{q_0, q_1, \dots, q_T \in [n]} \alpha(q_0) \Big( \prod_{t=1}^{T} A_{x_t}(q_{t-1}, q_t) \Big) \beta(q_T) = \alpha^\top A_{x_1} \cdots A_{x_T} \beta = \alpha^\top A_x \beta$
Recognizable/Rational Languages
A weighted language $f : \Sigma^* \to \mathbb{R}$ is recognizable/rational if there exists a WFA $A$ such that $f = f_A$. The smallest number of states of such a WFA is $\mathrm{rank}(f)$. A WFA $A$ is minimal if $|A| = \mathrm{rank}(f_A)$.
Observation: The minimal $A$ is not unique. Take any invertible matrix $Q \in \mathbb{R}^{n \times n}$, then
$\alpha^\top A_{x_1} \cdots A_{x_T} \beta = (\alpha^\top Q)(Q^{-1} A_{x_1} Q) \cdots (Q^{-1} A_{x_T} Q)(Q^{-1} \beta)$
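As a quick illustration, here is a minimal sketch (not part of the original slides, assuming numpy) of how $f_A(x)$ is evaluated as a left-to-right product of transition matrices; the numeric entries follow the two-state example from the previous slide as far as they can be read off it.

```python
import numpy as np

# A WFA stored as (alpha, beta, {A_sigma}); f_A(x) = alpha^T A_{x_1} ... A_{x_T} beta.
alpha = np.array([-1.0, 0.5])
beta = np.array([1.2, 0.0])
A = {
    "a": np.array([[1.2, -1.0], [-2.0, 3.2]]),
    "b": np.array([[2.0, -2.0], [0.0, 5.0]]),
}

def wfa_eval(alpha, beta, A, x):
    """Multiply the transition matrices of x left to right, then close with beta."""
    v = alpha
    for sigma in x:
        v = v @ A[sigma]          # v^T <- v^T A_sigma
    return float(v @ beta)

print(wfa_eval(alpha, beta, A, "ab"))   # alpha^T A_a A_b beta
print(wfa_eval(alpha, beta, A, ""))     # empty string: alpha^T beta
```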
SLIDE 10
Examples: DFA, HMM
Deterministic Finite Automata
§ Weights in $\{0, 1\}$
§ Initial: $\alpha$ indicator for the initial state
§ Final: $\beta$ indicates accept/reject states
§ Transition: $A_\sigma(i, j) = \mathbb{I}[i \xrightarrow{\sigma} j]$
§ $f_A : \Sigma^* \to \{0, 1\}$ defines a regular language
Hidden Markov Model
§ Weights in $[0, 1]$
§ Initial: $\alpha$ distribution over initial states
§ Final: $\beta$ vector of ones
§ Transition: $A_\sigma(i, j) = \Pr[i \xrightarrow{\sigma} j] = \Pr[i \to j]\,\Pr[i \xrightarrow{\sigma}]$
§ $f_A : \Sigma^* \to [0, 1]$ defines a dynamical system
SLIDE 11
Hankel Matrices
Given a weighted language $f : \Sigma^* \to \mathbb{R}$ define its Hankel matrix $H_f \in \mathbb{R}^{\Sigma^* \times \Sigma^*}$, with rows indexed by prefixes $p$, columns indexed by suffixes $s$, and entries $H_f(p, s) = f(p \cdot s)$:
$H_f = \begin{pmatrix} f(\epsilon) & f(a) & f(b) & \cdots \\ f(a) & f(aa) & f(ab) & \cdots \\ f(b) & f(ba) & f(bb) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$
Fliess–Kronecker Theorem [Fli74]
The rank of $H_f$ is finite if and only if $f$ is rational, in which case $\mathrm{rank}(H_f) = \mathrm{rank}(f)$
SLIDE 12
Intuition for the Fliess–Kronecker Theorem
A WFA $A$ with $n$ states induces a factorization of $H_{f_A} \in \mathbb{R}^{\Sigma^* \times \Sigma^*}$ into $P_A \in \mathbb{R}^{\Sigma^* \times n}$ and $S_A \in \mathbb{R}^{n \times \Sigma^*}$ (figure omitted): each entry decomposes as
$f_A(p_1 \cdots p_T \cdot s_1 \cdots s_{T'}) = \underbrace{\alpha^\top A_{p_1} \cdots A_{p_T}}_{\alpha_A(p)^\top} \; \underbrace{A_{s_1} \cdots A_{s_{T'}} \beta}_{\beta_A(s)}$
Note: We call $H_f = P_A S_A$ the forward-backward factorization induced by $A$
SLIDE 13 Outline
- 1. Sequential Data and Weighted Automata
- 2. WFA Reconstruction and Approximation
- 3. PAC Learning for Stochastic WFA
- 4. Statistical Learning for WFA
- 5. Beyond Sequences: Transductions and Trees
- 6. Conclusion
SLIDE 14 From Hankel to WFA
$f(p_1 \cdots p_T \, s_1 \cdots s_{T'}) = \alpha^\top A_{p_1} \cdots A_{p_T} \, A_{s_1} \cdots A_{s_{T'}} \beta$, so the block $H$ with entries $H(p, s) = f(p \cdot s)$ factorizes as $H = P\,S$ (figure omitted)
$f(p_1 \cdots p_T \, \sigma \, s_1 \cdots s_{T'}) = \alpha^\top A_{p_1} \cdots A_{p_T} \, A_\sigma \, A_{s_1} \cdots A_{s_{T'}} \beta$, so the block $H_\sigma$ with entries $H_\sigma(p, s) = f(p \cdot \sigma \cdot s)$ factorizes as $H_\sigma = P \, A_\sigma \, S$ (figure omitted)
Algebraically: Factorizing $H$ lets us solve for $A_\sigma$
$H = P S \;\Longrightarrow\; H_\sigma = P A_\sigma S \;\Longrightarrow\; A_\sigma = P^+ H_\sigma S^+$
SLIDE 15
Aside: Moore–Penrose Pseudo-inverse
For any $M \in \mathbb{R}^{n \times m}$ there exists a unique pseudo-inverse $M^+ \in \mathbb{R}^{m \times n}$ satisfying:
§ $M M^+ M = M$, $M^+ M M^+ = M^+$, and $M^+ M$ and $M M^+$ are symmetric
§ If $\mathrm{rank}(M) = n$ then $M M^+ = I$, and if $\mathrm{rank}(M) = m$ then $M^+ M = I$
§ If $M$ is square and invertible then $M^+ = M^{-1}$
Given a system of linear equations $Mu = v$, the following is satisfied: $M^+ v = \mathrm{argmin}_{u \in \mathrm{argmin} \|Mu - v\|_2} \|u\|_2$. In particular:
§ If the system is completely determined, $M^+ v$ solves the system
§ If the system is underdetermined, $M^+ v$ is the solution with smallest norm
§ If the system is overdetermined, $M^+ v$ is the minimum norm solution to the least-squares problem $\min \|Mu - v\|_2$
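A small sketch of these properties using numpy's `pinv` on a hypothetical random matrix:

```python
import numpy as np

# Sketch: defining properties of the Moore-Penrose pseudo-inverse and its
# least-squares characterisation, checked on a random (hypothetical) example.
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 3))   # tall matrix: the system M u = v is overdetermined
v = rng.normal(size=5)

M_pinv = np.linalg.pinv(M)

# Defining properties: M M+ M = M and M+ M M+ = M+ (symmetry checks omitted)
assert np.allclose(M @ M_pinv @ M, M)
assert np.allclose(M_pinv @ M @ M_pinv, M_pinv)

# Overdetermined system: M+ v coincides with the least-squares solution
u_pinv = M_pinv @ v
u_lstsq, *_ = np.linalg.lstsq(M, v, rcond=None)
assert np.allclose(u_pinv, u_lstsq)
```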
SLIDE 16
Finite Hankel Sub-Blocks
Given finite sets of prefixes and suffixes $P, S \subset \Sigma^*$ and the infinite Hankel matrix $H_f \in \mathbb{R}^{\Sigma^* \times \Sigma^*}$ we define the sub-block $H \in \mathbb{R}^{P \times S}$ and, for $\sigma \in \Sigma$, the sub-block $H_\sigma \in \mathbb{R}^{P\sigma \times S}$
(Illustration omitted: the finite block of $H_f$ with rows $\epsilon, a, b, aa, ab, ba, bb, \dots$ and columns $\epsilon, a, b, aa, ab, ba, bb, \dots$)
SLIDE 17 WFA Reconstruction from Finite Hankel Sub-Blocks
Suppose $f : \Sigma^* \to \mathbb{R}$ has rank $n$ and $P, S \subset \Sigma^*$ with $\epsilon \in P \cap S$ are such that the sub-block $H \in \mathbb{R}^{P \times S}$ of $H_f$ satisfies $\mathrm{rank}(H) = n$. Let $A = \langle \alpha, \beta, \{A_\sigma\} \rangle$ be obtained as follows:
- 1. Compute a rank factorization $H = PS$; i.e. $\mathrm{rank}(P) = \mathrm{rank}(S) = \mathrm{rank}(H)$
- 2. Let $\alpha^\top$ (resp. $\beta$) be the $\epsilon$-row of $P$ (resp. $\epsilon$-column of $S$)
- 3. Let $A_\sigma = P^+ H_\sigma S^+$, where $H_\sigma \in \mathbb{R}^{P \cdot \sigma \times S}$ is a sub-block of $H_f$
Claim: The resulting WFA computes $f$ and is minimal
Proof
§ Suppose $\tilde{A} = \langle \tilde{\alpha}, \tilde{\beta}, \{\tilde{A}_\sigma\} \rangle$ is a minimal WFA for $f$.
§ It suffices to show there exists an invertible $Q \in \mathbb{R}^{n \times n}$ such that $\alpha^\top = \tilde{\alpha}^\top Q$, $A_\sigma = Q^{-1} \tilde{A}_\sigma Q$ and $\beta = Q^{-1} \tilde{\beta}$.
§ By minimality $\tilde{A}$ induces a rank factorization $H = \tilde{P}\tilde{S}$ and also $H_\sigma = \tilde{P} \tilde{A}_\sigma \tilde{S}$.
§ Since $A_\sigma = P^+ H_\sigma S^+ = P^+ \tilde{P} \tilde{A}_\sigma \tilde{S} S^+$, take $Q = \tilde{S} S^+$.
§ Check $Q^{-1} = P^+ \tilde{P}$ since $P^+ \tilde{P} \tilde{S} S^+ = P^+ H S^+ = P^+ P S S^+ = I$.
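The reconstruction above translates almost line by line into code. Below is a sketch (using numpy, with the SVD as one convenient rank factorization, anticipating a later slide); the lists `prefixes` and `suffixes` are assumed to contain the empty string.

```python
import numpy as np

def spectral_reconstruction(H, H_sigmas, prefixes, suffixes, n):
    """Sketch of the WFA reconstruction on this slide.

    H        : |P| x |S| array with H[i, j] = f(prefixes[i] + suffixes[j])
    H_sigmas : dict sigma -> |P| x |S| array with entries f(p + sigma + s)
    n        : rank of H (number of states of the target WFA)
    """
    # Step 1: rank factorization H = P S (here via a truncated SVD)
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    P = U[:, :n] * np.sqrt(d[:n])               # P = U_n D_n^{1/2}
    S = np.sqrt(d[:n])[:, None] * Vt[:n, :]     # S = D_n^{1/2} V_n^T, so H ~= P S
    P_pinv, S_pinv = np.linalg.pinv(P), np.linalg.pinv(S)
    # Step 2: alpha is the epsilon-row of P, beta the epsilon-column of S
    alpha = P[prefixes.index("")]
    beta = S[:, suffixes.index("")]
    # Step 3: A_sigma = P^+ H_sigma S^+
    A = {sigma: P_pinv @ Hs @ S_pinv for sigma, Hs in H_sigmas.items()}
    return alpha, beta, A
```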
SLIDE 18 WFA Learning Algorithms via the Hankel Trick
Pipeline: Data → Hankel Matrix → WFA
- 1. Estimate a Hankel matrix from data
§ For stochastic automata: counting empirical frequencies
§ In general: empirical risk minimization
§ Inductive bias: enforcing a low-rank Hankel matrix will yield a WFA with fewer states
§ Parameters: rows and columns of the Hankel sub-block
- 2. Recover a WFA from the Hankel matrix
§ Direct application of the WFA reconstruction algorithm
Question: How robust to noise are these steps? Can we guarantee that the learned WFA is a good representation of the data?
SLIDE 19 Norms on WFA
Weighted Finite Automaton
A WFA with $n$ states is a tuple $A = \langle \alpha, \beta, \{A_\sigma\}_{\sigma \in \Sigma} \rangle$ where $\alpha, \beta \in \mathbb{R}^n$ and $A_\sigma \in \mathbb{R}^{n \times n}$
Let $p, q \in [1, \infty]$ be Hölder conjugate: $\frac{1}{p} + \frac{1}{q} = 1$. The $(p, q)$-norm of a WFA $A$ is given by
$\|A\|_{p,q} = \max \big\{ \|\alpha\|_p, \; \|\beta\|_q, \; \max_{\sigma \in \Sigma} \|A_\sigma\|_q \big\}$,
where $\|A_\sigma\|_q = \sup_{\|v\|_q \le 1} \|A_\sigma v\|_q$ is the $q$-induced norm.
Example: For probabilistic automata $A = \langle \alpha, \beta, \{A_\sigma\} \rangle$ with $\alpha$ a probability distribution, $\beta$ acceptance probabilities, and $A_\sigma$ row (sub-)stochastic matrices we have $\|A\|_{1,\infty} = 1$
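A short sketch of the $(p, q)$-norm (numpy's matrix norms give the induced norm for $q \in \{1, 2, \infty\}$, which covers the cases used here; the numeric entries are illustrative assumptions):

```python
import numpy as np

def wfa_norm(alpha, beta, A, p, q):
    """Sketch: the (p, q)-norm of a WFA as defined on this slide.  For matrices,
    np.linalg.norm with ord in {1, 2, np.inf} is the corresponding induced norm."""
    return max(
        float(np.linalg.norm(alpha, ord=p)),
        float(np.linalg.norm(beta, ord=q)),
        max(float(np.linalg.norm(As, ord=q)) for As in A.values()),
    )

# Probabilistic-automaton style example (illustrative numbers): ||A||_{1, inf} = 1
alpha = np.array([0.3, 0.7])                       # initial distribution
beta = np.array([1.0, 0.5])                        # acceptance probabilities
A = {"a": np.array([[0.2, 0.3], [0.1, 0.4]]),      # row sub-stochastic matrices
     "b": np.array([[0.4, 0.1], [0.3, 0.2]])}
print(wfa_norm(alpha, beta, A, p=1, q=np.inf))     # -> 1.0
```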
SLIDE 20
Perturbation Bounds: Automaton → Language [Bal13]
Suppose $A = \langle \alpha, \beta, \{A_\sigma\} \rangle$ and $A' = \langle \alpha', \beta', \{A'_\sigma\} \rangle$ are WFA with $n$ states satisfying
$\|A\|_{p,q} \le \rho$, $\quad \|A'\|_{p,q} \le \rho$, $\quad \max \{ \|\alpha - \alpha'\|_p, \|\beta - \beta'\|_q, \max_{\sigma \in \Sigma} \|A_\sigma - A'_\sigma\|_q \} \le \Delta$.
Claim
The following holds for any $x \in \Sigma^*$: $|f_A(x) - f_{A'}(x)| \le (|x| + 2)\,\rho^{|x|+1}\,\Delta$.
Proof
By induction on $|x|$ we first prove $\|A_x - A'_x\|_q \le |x|\,\rho^{|x|-1}\,\Delta$:
$\|A_{x\sigma} - A'_{x\sigma}\|_q \le \|A_x - A'_x\|_q \|A_\sigma\|_q + \|A'_x\|_q \|A_\sigma - A'_\sigma\|_q \le |x|\rho^{|x|}\Delta + \rho^{|x|}\Delta = (|x| + 1)\rho^{|x|}\Delta$.
Then:
$|f_A(x) - f_{A'}(x)| = |\alpha^\top A_x \beta - \alpha'^\top A'_x \beta'| \le |\alpha^\top (A_x \beta - A'_x \beta')| + |(\alpha - \alpha')^\top A'_x \beta'|$
$\le \|\alpha\|_p \|A_x \beta - A'_x \beta'\|_q + \|\alpha - \alpha'\|_p \|A'_x \beta'\|_q$
$\le \|\alpha\|_p \|A_x\|_q \|\beta - \beta'\|_q + \|\alpha\|_p \|A_x - A'_x\|_q \|\beta'\|_q + \|\alpha - \alpha'\|_p \|A'_x\|_q \|\beta'\|_q$
$\le \rho^{|x|+1} \|\beta - \beta'\|_q + \rho^2 \|A_x - A'_x\|_q + \rho^{|x|+1} \|\alpha - \alpha'\|_p$
$\le \rho^{|x|+1}\Delta + \rho^2 \rho^{|x|-1} |x| \Delta + \rho^{|x|+1}\Delta = (|x| + 2)\rho^{|x|+1}\Delta$.
SLIDE 21
Aside: Singular Value Decomposition (SVD)
For any $M \in \mathbb{R}^{n \times m}$ with $\mathrm{rank}(M) = k$ there exists a singular value decomposition
$M = U D V^\top = \sum_{i=1}^{k} s_i u_i v_i^\top$
§ $D \in \mathbb{R}^{k \times k}$ is diagonal and contains the $k$ sorted singular values $s_1 \ge s_2 \ge \cdots \ge s_k > 0$
§ $U \in \mathbb{R}^{n \times k}$ contains the $k$ left singular vectors, i.e. orthonormal columns $U^\top U = I$
§ $V \in \mathbb{R}^{m \times k}$ contains the $k$ right singular vectors, i.e. orthonormal columns $V^\top V = I$
Properties of SVD
§ $M = (U D^{1/2})(D^{1/2} V^\top)$ is a rank factorization
§ Can be used to compute the pseudo-inverse as $M^+ = V D^{-1} U^\top$
§ Provides optimal low-rank approximations: for $k' < k$, $M_{k'} = U_{k'} D_{k'} V_{k'}^\top = \sum_{i=1}^{k'} s_i u_i v_i^\top$ satisfies $M_{k'} \in \mathrm{argmin}_{\mathrm{rank}(\hat{M}) \le k'} \|M - \hat{M}\|_2$
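A quick numpy check of the three properties listed above, on a hypothetical random matrix:

```python
import numpy as np

# Sketch: the three SVD properties above, checked on a random (hypothetical) matrix.
rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))                       # full column rank, k = 4
U, d, Vt = np.linalg.svd(M, full_matrices=False)  # M = U diag(d) Vt

# 1. Rank factorization M = (U D^{1/2}) (D^{1/2} V^T)
P = U * np.sqrt(d)
S = np.sqrt(d)[:, None] * Vt
assert np.allclose(P @ S, M)

# 2. Pseudo-inverse M+ = V D^{-1} U^T
M_pinv = Vt.T @ np.diag(1.0 / d) @ U.T
assert np.allclose(M_pinv, np.linalg.pinv(M))

# 3. Best low-rank approximation: truncating the SVD at k' = 2; the spectral-norm
#    error of the truncation equals the (k'+1)-th singular value
k1 = 2
M_k1 = U[:, :k1] @ np.diag(d[:k1]) @ Vt[:k1]
assert np.isclose(np.linalg.norm(M - M_k1, 2), d[k1])
```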
SLIDE 22 Perturbation Bounds: Hankel → Automaton [Bal13]
§ Suppose $f : \Sigma^* \to \mathbb{R}$ has rank $n$ and $P, S \subset \Sigma^*$ with $\epsilon \in P \cap S$ are such that the sub-block $H \in \mathbb{R}^{P \times S}$ of $H_f$ satisfies $\mathrm{rank}(H) = n$
§ Let $A = \langle \alpha, \beta, \{A_\sigma\} \rangle$ be obtained as follows:
- 1. Compute the SVD factorization $H = PS$; i.e. $P = U D^{1/2}$ and $S = D^{1/2} V^\top$
- 2. Let $\alpha^\top$ (resp. $\beta$) be the $\epsilon$-row of $P$ (resp. $\epsilon$-column of $S$)
- 3. Let $A_\sigma = P^+ H_\sigma S^+$, where $H_\sigma \in \mathbb{R}^{P \cdot \sigma \times S}$ is a sub-block of $H_f$
§ Suppose $\hat{H} \in \mathbb{R}^{P \times S}$ and $\hat{H}_\sigma \in \mathbb{R}^{P \cdot \sigma \times S}$ satisfy $\max \{ \|H - \hat{H}\|_2, \max_\sigma \|H_\sigma - \hat{H}_\sigma\|_2 \} \le \Delta$
§ Let $\hat{A} = \langle \hat{\alpha}, \hat{\beta}, \{\hat{A}_\sigma\} \rangle$ be obtained as follows:
- 1. Compute the SVD rank-$n$ approximation $\hat{H} \approx \hat{P}\hat{S}$; i.e. $\hat{P} = \hat{U}_n \hat{D}_n^{1/2}$ and $\hat{S} = \hat{D}_n^{1/2} \hat{V}_n^\top$
- 2. Let $\hat{\alpha}^\top$ (resp. $\hat{\beta}$) be the $\epsilon$-row of $\hat{P}$ (resp. $\epsilon$-column of $\hat{S}$)
- 3. Let $\hat{A}_\sigma = \hat{P}^+ \hat{H}_\sigma \hat{S}^+$
Claim
For any pair of Hölder conjugates $(p, q)$ we have
$\max \{ \|\alpha - \hat{\alpha}\|_p, \|\beta - \hat{\beta}\|_q, \max_\sigma \|A_\sigma - \hat{A}_\sigma\|_q \} \le O(\Delta)$
SLIDE 23 Outline
- 1. Sequential Data and Weighted Automata
- 2. WFA Reconstruction and Approximation
- 3. PAC Learning for Stochastic WFA
- 4. Statistical Learning for WFA
- 5. Beyond Sequences: Transductions and Trees
- 6. Conclusion
SLIDE 24
Probabilities on Strings
Suppose the function $f : \Sigma^* \to \mathbb{R}$ to be learned computes "probabilities": $f(x) \in [0, 1]$
Stochastic Languages
§ Probability distribution over all strings: $\sum_{x \in \Sigma^*} f(x) = 1$
§ Can sample finite strings and try to learn the distribution
Dynamical Systems
§ Probability distribution over strings of fixed length: for all $t \ge 0$, $\sum_{x \in \Sigma^t} f(x) = 1$
§ Can sample (potentially infinite) prefixes and try to learn the dynamics
SLIDE 25 Hankel Estimation from Strings [HKZ09, BDR09]
Data: $S = \{x_1, \dots, x_m\}$ containing $m$ i.i.d. strings from some distribution $f$ over $\Sigma^*$
Empirical Hankel matrix:
$\hat{f}_S(x) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}[x_i = x]$, $\qquad \hat{H}(p, s) = \hat{f}_S(p \cdot s)$
Properties:
§ Unbiased and consistent: $\lim_{m \to \infty} \hat{H} = \mathbb{E}[\hat{H}] = H$
§ Data inefficient:
$S = \{aa, b, bab, a, bbab, abb, babba, abbb, ab, a, aabba, baa, abbab, baba, bb, a\}$
$\to \quad \hat{H} = \begin{pmatrix} & a & b \\ \epsilon & .19 & .06 \\ a & .06 & .06 \\ b & .00 & .06 \\ ba & .06 & .06 \end{pmatrix}$
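In code, this estimator is only a couple of lines (a sketch; the sample is the one shown on this slide):

```python
import numpy as np
from collections import Counter

def empirical_hankel(sample, prefixes, suffixes):
    """Sketch: the empirical Hankel sub-block hat{H}(p, s) = hat{f}_S(p + s)
    obtained by counting string frequencies, as on this slide."""
    counts, m = Counter(sample), len(sample)
    return np.array([[counts[p + s] / m for s in suffixes] for p in prefixes])

sample = ["aa", "b", "bab", "a", "bbab", "abb", "babba", "abbb",
          "ab", "a", "aabba", "baa", "abbab", "baba", "bb", "a"]
H_hat = empirical_hankel(sample, ["", "a", "b", "ba"], ["a", "b"])
print(np.round(H_hat, 2))    # reproduces (up to rounding) the sub-block on the slide
```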
SLIDE 26
Hankel Estimation from Prefixes [BCLQ14]
Data: $S = \{x_1, \dots, x_m\}$ containing $m$ i.i.d. strings from some distribution $f$ over $\Sigma^*$
Empirical Prefix Hankel matrix:
$\bar{f}_S(x) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}[x_i \in x\Sigma^*]$
Properties:
§ $\mathbb{E}[\bar{f}_S(x)] = \sum_{y \in \Sigma^*} f(xy) = \mathbb{P}_f[x\Sigma^*]$
§ If $f$ is computed by a WFA $A$, then (writing $\mathbf{A} = A_{\sigma_1} + \cdots + A_{\sigma_k}$ for $\Sigma = \{\sigma_1, \dots, \sigma_k\}$)
$\mathbb{P}_f[x\Sigma^*] = \sum_{y \in \Sigma^*} f(xy) = \sum_{y \in \Sigma^*} \alpha^\top A_x A_y \beta = \alpha^\top A_x \Big( \sum_{y \in \Sigma^*} A_y \beta \Big) = \alpha^\top A_x \Big( \sum_{t \ge 0} (A_{\sigma_1} + \cdots + A_{\sigma_k})^t \beta \Big) = \alpha^\top A_x \Big( \sum_{t \ge 0} \mathbf{A}^t \beta \Big) = \alpha^\top A_x (I - \mathbf{A})^{-1} \beta = \alpha^\top A_x \bar{\beta}$
SLIDE 27
Hankel Estimation from Substrings [BCLQ14]
Data: $S = \{x_1, \dots, x_m\}$ containing $m$ i.i.d. strings from some distribution $f$ over $\Sigma^*$
Empirical Substring Hankel matrix:
$\tilde{f}_S(x) = \frac{1}{m} \sum_{i=1}^{m} |x_i|_x$, $\qquad$ where $|x_i|_x = \sum_{u, v \in \Sigma^*} \mathbb{I}[x_i = uxv]$
Properties:
§ $\mathbb{E}[\tilde{f}_S(x)] = \sum_{u,v \in \Sigma^*} f(uxv) = \sum_{y \in \Sigma^*} |y|_x f(y) = \mathbb{E}_{y \sim f}[|y|_x]$
§ If $f$ is computed by a WFA $A$, then
$\mathbb{E}_{y \sim f}[|y|_x] = \sum_{y \in \Sigma^*} |y|_x f(y) = \sum_{u,v \in \Sigma^*} \alpha^\top A_u A_x A_v \beta = \alpha^\top (I - \mathbf{A})^{-1} A_x (I - \mathbf{A})^{-1} \beta = \bar{\alpha}^\top A_x \bar{\beta}$
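For completeness, here is a sketch of the prefix and substring statistics from this slide and the previous one (plain Python, illustrative sample):

```python
def empirical_prefix_value(sample, x):
    """bar{f}_S(x): fraction of sampled strings that have x as a prefix."""
    return sum(w.startswith(x) for w in sample) / len(sample)

def empirical_substring_value(sample, x):
    """tilde{f}_S(x): average number of occurrences of x as a substring, i.e.
    the average number of ways to write a sampled string as u + x + v."""
    def occurrences(w):
        if x == "":
            return len(w) + 1
        return sum(w[i:i + len(x)] == x for i in range(len(w) - len(x) + 1))
    return sum(occurrences(w) for w in sample) / len(sample)

sample = ["aa", "b", "bab", "a", "ab"]
print(empirical_prefix_value(sample, "a"))     # 3/5: aa, a, ab start with "a"
print(empirical_substring_value(sample, "a"))  # (2 + 0 + 1 + 1 + 1) / 5 = 1.0
```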
SLIDE 28
Hankel Estimation from a Single String [BM17]
Data: $x = x_1 \cdots x_m \cdots$ sampled from some dynamical system $f$ over $\Sigma$
Empirical One-string Hankel matrix:
$\mathring{f}_m(x) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}[x_i x_{i+1} \cdots \in x\Sigma^*]$
Properties:
§ $\mathbb{E}[\mathring{f}_m(x)] = \frac{1}{m} \sum_{u \in \Sigma^{<m}} f(ux) = \frac{1}{m} \sum_{i=0}^{m-1} \mathbb{P}_f[\Sigma^i x]$
§ If $f$ is computed by a WFA $A$, then
$\frac{1}{m} \sum_{i=0}^{m-1} \mathbb{P}_f[\Sigma^i x] = \frac{1}{m} \sum_{u \in \Sigma^{<m}} f(ux) = \frac{1}{m} \sum_{u \in \Sigma^{<m}} \alpha^\top A_u A_x \beta = \Big( \frac{1}{m} \sum_{i=0}^{m-1} \alpha^\top \mathbf{A}^i \Big) A_x \beta = \bar{\alpha}_m^\top A_x \beta$
SLIDE 29 Concentration Bounds for Hankel Estimation
§ Consider a sub-block $H$ over $(P, S)$ fixed and let the sample size $m \to \infty$
§ In general one can show: with high probability over a sample $S$ of size $m$,
$\|\hat{H}_S - H\| = O\Big(\frac{1}{\sqrt{m}}\Big)$, where
§ The hidden constants depend on the dimension of the sub-block $P \times S$ and properties of the strings in $P \cdot S$
§ The norm $\|\cdot\|$ can be either the operator or the Frobenius norm
§ Under the assumptions in the previous slides we can replace $\hat{H}_S$ by $\bar{H}_S$ (on prefixes), $\tilde{H}_S$ (on substrings) or $\mathring{H}_m$ (single trajectory)
§ Proofs rely on a diversity of concentration inequalities; they can be found in [DGH16, BM17]
SLIDE 30 Aside: McDiarmid’s Inequality
Let $\Phi : \Omega^m \to \mathbb{R}$ be such that for all $i \in [m]$,
$\sup_{x_1, \dots, x_m, x'_i \in \Omega} |\Phi(x_1, \dots, x_i, \dots, x_m) - \Phi(x_1, \dots, x'_i, \dots, x_m)| \le c$
If $X = (X_1, \dots, X_m)$ are i.i.d. from some distribution over $\Omega$:
$\mathbb{P}\big[\Phi(X) \ge \mathbb{E}\Phi(X) + t\big] \le \exp\Big(-\frac{2t^2}{mc^2}\Big)$
Equivalently, the following holds with probability at least $1 - \delta$ over $X$:
$\Phi(X) < \mathbb{E}\Phi(X) + c\sqrt{\frac{m}{2}\log(1/\delta)}$
SLIDE 31
A Simple Proof via McDiarmid’s Inequality [Bal13]
§ Let $\Phi(x_1, \dots, x_m) = \Phi(S) = \|H - \hat{H}_S\|_F$ with $x_i$ i.i.d. from a distribution on $\Sigma^*$
§ Note $\hat{H}_S = \frac{1}{m} \sum_{i=1}^{m} \hat{H}_{x_i}$, where $\hat{H}_x(p, s) = \mathbb{I}[p \cdot s = x]$
§ Defining $c_{P,S} = \max_x |\{(p, s) \in P \times S : p \cdot s = x\}| = \max_x \|\hat{H}_x\|_F^2$ we get (for $S'$ differing from $S$ only in the $i$-th string)
$|\Phi(S) - \Phi(S')| \le \|\hat{H}_S - \hat{H}_{S'}\|_F = \frac{1}{m} \|\hat{H}_{x_i} - \hat{H}_{x'_i}\|_F \le \frac{2}{m} \max\{\|\hat{H}_{x_i}\|_F, \|\hat{H}_{x'_i}\|_F\} \le \frac{2\sqrt{c_{P,S}}}{m}$
§ Using Jensen's inequality we can bound the expectation $\mathbb{E}\Phi(S) = \mathbb{E}\|H - \hat{H}_S\|_F$ as
$\big( \mathbb{E}\|H - \hat{H}_S\|_F \big)^2 \le \mathbb{E}\|H - \hat{H}_S\|_F^2 = \sum_{p,s} \mathbb{E}\big(H(p,s) - \hat{H}_S(p,s)\big)^2 = \sum_{p,s} \mathbb{V}[\hat{H}_S(p,s)] = \frac{1}{m} \sum_{p,s} H(p,s)\big(1 - H(p,s)\big) \le \frac{1}{m}\big(c_{P,S} - \|H\|_F^2\big) \le \frac{c_{P,S}}{m}$
§ By McDiarmid, w.p. $\ge 1 - \delta$: $\|H - \hat{H}_S\|_F \le \sqrt{\frac{c_{P,S}}{m}} + \sqrt{\frac{2 c_{P,S}}{m} \log(1/\delta)} = O(1/\sqrt{m})$
SLIDE 32 PAC Learning Stochastic WFA [BCLQ14]
Setup:
§ Unknown $f : \Sigma^* \to \mathbb{R}$ with $\mathrm{rank}(f) = n$ defining a probability distribution on $\Sigma^*$
§ Data: $x^{(1)}, \dots, x^{(m)}$ i.i.d. strings sampled from $f$
§ Parameters: $n$ and $P, S$ such that $\epsilon \in P \cap S$ and the sub-block $H \in \mathbb{R}^{P \times S}$ satisfies $\mathrm{rank}(H) = n$
Algorithm:
- 1. Estimate Hankel matrices $\hat{H}$ and $\hat{H}_\sigma$ for all $\sigma \in \Sigma$ using empirical probabilities $\hat{f}(x) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}[x^{(i)} = x]$
- 2. Let $\hat{A} = \mathrm{Spectral}(\hat{H}, \{\hat{H}_\sigma\}, n)$
Analysis:
§ Running time is $O(|P \cdot S|\,m + |\Sigma|\,|P|\,|S|\,n)$
§ With high probability $\sum_{|x| \le L} |f(x) - f_{\hat{A}}(x)| = O\Big( \frac{L^2 |\Sigma| \sqrt{n}}{\sigma_n(H)^2 \sqrt{m}} \Big)$
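Putting the earlier sketches together, the whole algorithm fits in a few lines (a sketch assuming numpy; the SVD-based reconstruction plays the role of the Spectral routine above):

```python
import numpy as np
from collections import Counter

def spectral_pac_learn(sample, prefixes, suffixes, alphabet, n):
    """Sketch of the algorithm on this slide: estimate empirical Hankel blocks
    from string frequencies, then run the SVD-based spectral reconstruction."""
    counts, m = Counter(sample), len(sample)
    f_hat = lambda x: counts[x] / m
    H = np.array([[f_hat(p + s) for s in suffixes] for p in prefixes])
    H_sig = {a: np.array([[f_hat(p + a + s) for s in suffixes] for p in prefixes])
             for a in alphabet}
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    P = U[:, :n] * np.sqrt(d[:n])                  # rank-n factorization H ~= P S
    S = np.sqrt(d[:n])[:, None] * Vt[:n, :]
    Pp, Sp = np.linalg.pinv(P), np.linalg.pinv(S)
    alpha = P[prefixes.index("")]                  # epsilon-row / epsilon-column
    beta = S[:, suffixes.index("")]
    A = {a: Pp @ Ha @ Sp for a, Ha in H_sig.items()}
    return alpha, beta, A
```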
SLIDE 33 Outline
- 1. Sequential Data and Weighted Automata
- 2. WFA Reconstruction and Approximation
- 3. PAC Learning for Stochastic WFA
- 4. Statistical Learning for WFA
- 5. Beyond Sequences: Transductions and Trees
- 6. Conclusion
SLIDE 34 Statistical Learning Framework
Motivation
§ PAC learning focuses on the realizable case: the samples come from a model in a known class
§ In practice this is unrealistic: real data is not generated from a "nice" model
§ The non-realizable setting is the natural domain of statistical learning theory²
Setup (for strings with real labels)
§ Let $D$ be a distribution over $\Sigma^* \times \mathbb{R}$, and $S = \{(x_i, y_i)\}$ a sample with $m$ i.i.d. examples
§ Let $\mathcal{H}$ be a hypothesis class of functions of type $\Sigma^* \to \mathbb{R}$
§ Let $\ell : \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$ be a (convex) loss function
§ The goal of statistical learning theory is to use $S$ to find $\hat{f} \in \mathcal{H}$ that approximates $f^* = \mathrm{argmin}_{f \in \mathcal{H}} \mathbb{E}_{(x,y) \sim D}[\ell(f(x), y)]$
² And agnostic PAC learning, but we will not discuss this setting here.
SLIDE 35 Empirical Risk Minimization for WFA
§ For a large sample and a fixed $f \in \mathcal{H}$ we have
$L_D(f; \ell) := \mathbb{E}_{(x,y) \sim D}[\ell(f(x), y)] \approx \frac{1}{m} \sum_{i=1}^{m} \ell(f(x_i), y_i) =: \hat{L}_S(f; \ell)$
§ A classical approach is to consider the empirical risk minimization rule
$\hat{f} = \mathrm{argmin}_{f \in \mathcal{H}} \hat{L}_S(f; \ell)$
§ For "string to real" learning problems we want to choose a hypothesis class $\mathcal{H}$ in which
§ The ERM problem can be solved efficiently
§ We can guarantee that $\hat{f}$ will not overfit the data
SLIDE 36
Generalization Bounds and Rademacher Complexity
§ The risk of overfitting can be controlled with generalization bounds of the form: for any $D$, with prob. $1 - \delta$ over $S \sim D^m$,
$L_D(f; \ell) \le \hat{L}_S(f; \ell) + C(S, \mathcal{H}, \ell) \quad \forall f \in \mathcal{H}$
§ Rademacher complexity provides bounds for any $\mathcal{H} = \{f : \Sigma^* \to \mathbb{R}\}$:
$\mathcal{R}_m(\mathcal{H}) = \mathbb{E}_{S \sim D^m} \mathbb{E}_\sigma \Big[ \sup_{f \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(x_i) \Big]$ where $\sigma_i \sim \mathrm{unif}(\{+1, -1\})$
§ For a bounded Lipschitz loss $\ell$, with probability $1 - \delta$ over $S \sim D^m$ (e.g. see [MRT12])
$L_D(f; \ell) \le \hat{L}_S(f; \ell) + O\Big( \mathcal{R}_m(\mathcal{H}) + \sqrt{\frac{\log(1/\delta)}{m}} \Big) \quad \forall f \in \mathcal{H}$
SLIDE 37 Bounding the Weights
§ Given a pair of Hölder conjugate integers $p, q$ ($\frac{1}{p} + \frac{1}{q} = 1$), define a norm on WFA given by
$\|A\|_{p,q} = \max \big\{ \|\alpha\|_p, \|\beta\|_q, \max_{a \in \Sigma} \|A_a\|_q \big\}$
§ Let $\mathcal{A}_n \subset \mathrm{WFA}_n$ be the class of WFA with $n$ states given by $\mathcal{A}_n = \{ A \in \mathrm{WFA}_n \mid \|A\|_{p,q} \le R \}$
Theorem [BM15b, BM18]
The Rademacher complexity of $\mathcal{A}_n$ for $R \le 1$ is bounded by
$\mathcal{R}_m(\mathcal{A}_n) = O\Big( \frac{L_m}{m} + \sqrt{\frac{n^2 |\Sigma| \log(m)}{m}} \Big)$, where $L_m = \mathbb{E}_S[\max_i |x_i|]$.
SLIDE 38
Bounding the Language
§ Given $p \in [1, \infty]$ and a language $f : \Sigma^* \to \mathbb{R}$ define its $p$-norm as
$\|f\|_p = \Big( \sum_{x \in \Sigma^*} |f(x)|^p \Big)^{1/p}$
§ Let $\mathcal{R}_p$ be the class of languages given by $\mathcal{R}_p = \{ f : \Sigma^* \to \mathbb{R} : \|f\|_p \le R \}$
Theorem [BM15b, BM18]
The Rademacher complexity of $\mathcal{R}_p$ satisfies
$\mathcal{R}_m(\mathcal{R}_2) = \Theta\Big( \frac{R}{\sqrt{m}} \Big)$, $\qquad \mathcal{R}_m(\mathcal{R}_1) = O\Big( \frac{R\, C_m \sqrt{\log(m)}}{m} \Big)$, where $C_m = \mathbb{E}_S\big[ \sqrt{\max_x |\{i : x_i = x\}|} \big]$.
SLIDE 39 Aside: Schatten Norms
§ For a matrix $M \in \mathbb{R}^{n \times m}$ with $\mathrm{rank}(M) = k$ let $s_1 \ge s_2 \ge \cdots \ge s_k > 0$ be its singular values
§ Arrange them in a vector $s = (s_1, \dots, s_k)$
§ For any $p \in [1, \infty]$ we define the $p$-Schatten norm of $M$ as $\|M\|_{S,p} = \|s\|_p$
§ Some of these norms have special names:
§ $p = \infty$: spectral or operator norm
§ $p = 2$: Frobenius or Hilbert–Schmidt norm
§ $p = 1$: nuclear or trace norm
§ In some sense, the nuclear norm is the best convex approximation to the rank function (i.e. its convex envelope)
SLIDE 40 Bounding the Matrix
Given $R > 0$ and $p \ge 1$ define the class of infinite Hankel matrices
$\mathcal{H}_p = \{ H \in \mathrm{Hankel} : \|H\|_{S,p} \le R \}$
Theorem [BM15b, BM18]
The Rademacher complexity of $\mathcal{H}_p$ satisfies
$\mathcal{R}_m(\mathcal{H}_2) = O\Big( \frac{R}{\sqrt{m}} \Big)$, $\qquad \mathcal{R}_m(\mathcal{H}_1) = O\Big( \frac{R \log(m) \sqrt{W_m}}{m} \Big)$,
where $W_m = \mathbb{E}_S\Big[ \min_{\mathrm{split}(S)} \max\Big\{ \max_p \sum_i \mathbb{I}[p_i = p], \; \max_s \sum_i \mathbb{I}[s_i = s] \Big\} \Big]$.
Note: $\mathrm{split}(S)$ contains all possible prefix-suffix splits $x_i = p_i s_i$ of all strings in $S$
SLIDE 41
Direct Gradient-Based Methods
§ The ERM problem on the class $\mathcal{A}_n$ can be solved with (stochastic) projected gradient descent:
$\min_{A \in \mathrm{WFA}_n} \frac{1}{m} \sum_{i=1}^{m} \ell(A(x_i), y_i)$ s.t. $\|A\|_{p,q} \le R$
§ Example gradient computation with $x = abca$ and weights in $A_a$ (a code sketch follows below):
$\nabla_{A_a} \ell(A(x), y) = \frac{\partial \ell}{\partial \hat{y}}(A(x), y) \cdot \big( \nabla_{A_a} \alpha^\top A_a A_b A_c A_a \beta \big) = \frac{\partial \ell}{\partial \hat{y}}(A(x), y) \cdot \big( \alpha \beta^\top A_a^\top A_c^\top A_b^\top + A_c^\top A_b^\top A_a^\top \alpha \beta^\top \big)$
§ Can solve classification ($y_i \in \{+1, -1\}$) and regression ($y_i \in \mathbb{R}$) with a differentiable $\ell$
§ Optimization is highly non-convex (it might get stuck in a local optimum) but this is commonly done for RNNs
§ Automatic differentiation can automate gradient computations
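As referenced above, here is a sketch of the generic version of that gradient: at every position where $\sigma$ occurs in $x$, the gradient of $A(x)$ with respect to $A_\sigma$ picks up an outer product of the forward prefix vector and the backward suffix vector (function names and interface are illustrative).

```python
import numpy as np

def wfa_value_and_grads(alpha, beta, A, x):
    """Sketch: f = alpha^T A_{x_1} ... A_{x_T} beta and its gradients w.r.t. each
    A_sigma, accumulated over every position where sigma occurs in x.  For a loss
    l(f, y), each gradient is then multiplied by dl/df, as on the slide."""
    fwd = [alpha]                          # fwd[t] = (alpha^T A_{x_1} ... A_{x_t})^T
    for sigma in x:
        fwd.append(A[sigma].T @ fwd[-1])
    bwd = [beta]                           # built backwards, then reversed below
    for sigma in reversed(x):
        bwd.append(A[sigma] @ bwd[-1])
    bwd = bwd[::-1]                        # bwd[t] = A_{x_{t+1}} ... A_{x_T} beta
    value = float(fwd[0] @ bwd[0])
    grads = {sigma: np.zeros_like(M) for sigma, M in A.items()}
    for t, sigma in enumerate(x):
        grads[sigma] += np.outer(fwd[t], bwd[t + 1])
    return value, grads
```

For x = "abca" the gradient with respect to A_a accumulates exactly the two outer-product terms displayed on the slide.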
SLIDE 42 Hankel Matrix Completion [BM12]
§ Learn a finite Hankel matrix over $P \times S$ directly from data by solving the convex ERM (a code sketch follows below)
$\hat{H} = \mathrm{argmin}_{H \in \mathbb{R}^{P \times S}} \frac{1}{m} \sum_{i=1}^{m} \ell(H(x_i), y_i)$ s.t. $\|H\|_{S,p} \le R$
Example: the sample $\{(bab, 1), (bbb, 0), (aaa, 3), (a, 1), (ab, 1), (aa, 2), (aba, 2), (bb, 0)\}$ gives the partially observed Hankel block
$\begin{pmatrix} & \epsilon & a & b \\ a & 1 & 2 & 1 \\ b & ? & ? & 0 \\ aa & 2 & 3 & ? \\ ab & 1 & 2 & ? \\ ba & ? & ? & 1 \\ bb & 0 & ? & 0 \end{pmatrix}$
§ Recover a WFA from $\hat{H}$ using the spectral reconstruction algorithm
§ Rademacher complexity of $\mathcal{H}_p$ and algorithmic stability [BM12] can be used to guarantee generalization
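Below is a sketch of one numerical approach in this spirit: the nuclear-norm-penalised cousin of the constrained ERM above, solved with proximal gradient (singular-value soft-thresholding) on a squared loss. It treats each observation as a single (prefix, suffix) entry of the block; the full formulation on the slide also ties together all splits of each string. All names and observed indices are illustrative.

```python
import numpy as np

def complete_hankel(observed, shape, lam=0.1, step=0.5, iters=500):
    """Sketch: Hankel completion with a squared loss and a nuclear-norm penalty,
    solved by proximal gradient (singular-value soft-thresholding)."""
    H = np.zeros(shape)
    for _ in range(iters):
        G = np.zeros(shape)
        for (i, j), y in observed:              # gradient of 0.5 * sum (H_ij - y)^2
            G[i, j] = H[i, j] - y
        H = H - step * G                        # gradient step on the observed entries
        U, d, Vt = np.linalg.svd(H, full_matrices=False)
        d = np.maximum(d - step * lam, 0.0)     # proximal step for lam * ||H||_{S,1}
        H = (U * d) @ Vt
    return H

# Observed entries (row index, column index) -> value, e.g. row "a", column "" -> f(a) = 1
observed = [((0, 0), 1.0), ((0, 1), 2.0), ((0, 2), 1.0),
            ((2, 0), 2.0), ((2, 1), 3.0), ((3, 0), 1.0), ((3, 1), 2.0)]
H_hat = complete_hankel(observed, shape=(6, 3))
```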
SLIDE 43 Outline
- 1. Sequential Data and Weighted Automata
- 2. WFA Reconstruction and Approximation
- 3. PAC Learning for Stochastic WFA
- 4. Statistical Learning for WFA
- 5. Beyond Sequences: Transductions and Trees
- 6. Conclusion
SLIDE 44 Sequence-to-Sequence Modelling in NLP and RL
§ Many NLP applications involve pairs of input-output sequences:
§ Sequence tagging (one output tag per input token), e.g. part-of-speech tagging:
input: Ms. Haag plays Elianti
output: NNP NNP VBZ NNP
§ Transductions (sequence lengths might differ), e.g. spelling correction:
input: a p l e
output: a p p l e
§ Sequence-to-sequence models also arise naturally in RL:
§ An agent operating in an MDP or POMDP environment collects traces of the form
input (actions): a1 a2 a3 ...
output (observations, rewards): (o1, r1) (o2, r2) (o3, r3) ...
§ For these applications we want to learn functions of the form $f : (\Sigma \times \Delta)^* \to \mathbb{R}$ or more generally $f : \Sigma^* \times \Delta^* \to \mathbb{R}$ (can model using $\epsilon$-transitions)
SLIDE 45 Learning Transducers with Hankel Matrices
§ Given input and output alphabets $\Sigma$ and $\Delta$ we can define IO-WFA³ as $A = \langle \alpha, \beta, \{A_{\sigma,\delta}\} \rangle$
§ The language computed by an IO-WFA can have diverse interpretations, for $(x, y) \in (\Sigma \times \Delta)^*$:
§ Tagging: $f(x, y)$ = compatibility score of output $y$ on input $x$
§ Dynamics modelling: $f(x, y) = \Pr[y \mid x]$, probability of the observations given the inputs
§ Reward modelling: $f(x, y) = \mathbb{E}[r_1 + \cdots + r_t]$, expected reward from an action-observation sequence
§ The Hankel trick applies to this setting as well, with $H_f \in \mathbb{R}^{(\Sigma \times \Delta)^* \times (\Sigma \times \Delta)^*}$
§ For applications and concrete algorithms see [BSG09, BQC11, QBCG14, BM17]
³ Other nomenclatures: weighted finite-state transducer (WFST), predictive state representation (PSR), input-output observable operator model (IO-OOM)
SLIDE 46 Trees in NLP
§ Parsing tasks in NLP require predicting a tree for a sequence: modelling dependencies inside a sentence, document, etc. (example parse tree for "Mary plays the guitar" omitted)
§ Models on trees are also useful to learn more complicated languages: weighted context-free languages (instead of regular)
§ Applications involve different types of models and levels of supervision
§ Labelled trees, unlabelled trees, yields, etc.
SLIDE 47
Weighted Tree Automata (WTA)
§ Take a ranked alphabet $\Sigma = \Sigma_0 \cup \Sigma_1 \cup \cdots$
§ A weighted tree automaton with $n$ states is a tuple $A = \langle \alpha, \{T_\tau\}_{\tau \in \Sigma_{\ge 1}}, \{\beta_\sigma\}_{\sigma \in \Sigma_0} \rangle$ where $\alpha, \beta_\sigma \in \mathbb{R}^n$ and $T_\tau \in (\mathbb{R}^n)^{\otimes (\mathrm{rk}(\tau) + 1)}$
§ $A$ defines a function $f_A : \mathrm{Trees}_\Sigma \to \mathbb{R}$ through recursive vector-tensor contractions
§ Similar expressive power as WCFG and L-WCFG
SLIDE 48 Inside-Outside Factorization in WTA
(Figure omitted: a tree split into an outside context $t_o$ containing a placeholder $*$ and an inside subtree $t_i$)
For any inside-outside decomposition of a tree (see the sketch below for the inside computation):
$f(t) = \alpha_{t_o}^\top \beta_{t_i}$ (let $t = t_o[t_i]$)
$= \alpha_{t_o}^\top T_\sigma(\beta_{t_1}, \beta_{t_2})$ (let $t_i = \sigma(t_1, t_2)$)
$= \alpha_{t_o}^\top T_\sigma^{(2)} (\beta_{t_1} \otimes \beta_{t_2})$ (flatten tensor)
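As a concrete rendering of the contraction referenced above, here is a sketch of the bottom-up inside computation for a binary WTA; the tree encoding and the names `T`, `beta_leaf` are illustrative assumptions.

```python
import numpy as np

def inside(tree, T, beta_leaf):
    """Sketch: the inside vector beta_t of a binary WTA, computed bottom-up by
    contracting each node tensor with the inside vectors of its children.
    Trees are nested tuples: a leaf is a symbol of Sigma_0, an internal node
    is (tau, left_subtree, right_subtree) with tau of rank 2."""
    if isinstance(tree, str):                 # leaf sigma in Sigma_0
        return beta_leaf[tree]
    tau, left, right = tree
    b1, b2 = inside(left, T, beta_leaf), inside(right, T, beta_leaf)
    return np.einsum("ijk,j,k->i", T[tau], b1, b2)   # T_tau(., beta_t1, beta_t2)

def wta_eval(tree, alpha, T, beta_leaf):
    """f_A(t) = alpha^T beta_t (the inside-outside decomposition at the root)."""
    return float(alpha @ inside(tree, T, beta_leaf))
```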
SLIDE 49 Learning WTA with Hankel Matrices
There exist analogues of:
§ The Hankel matrix for $f : \mathrm{Trees}_\Sigma \to \mathbb{R}$ corresponding to inside-outside decompositions, with rows indexed by contexts (trees with a $*$ placeholder) and columns indexed by trees (illustration omitted)
§ The Fliess–Kronecker theorem [BLB83]
§ The spectral learning algorithm [BHD10] and variants thereof [CSC+12, CSC+13, CSC+14]
SLIDE 50 Outline
- 1. Sequential Data and Weighted Automata
- 2. WFA Reconstruction and Approximation
- 3. PAC Learning for Stochastic WFA
- 4. Statistical Learning for WFA
- 5. Beyond Sequences: Transductions and Trees
- 6. Conclusion
SLIDE 51 And It Works Too!
Spectral methods are competitive against traditional methods:
§ Expectation maximization
§ Conditional random fields
§ Tensor decompositions
In a variety of problems:
§ Sequence tagging
§ Constituency and dependency parsing
§ Timing and geometry learning
§ POS-level language modelling
(Experimental plots omitted: L1 distance vs. number of training samples for HMM, k-HMM and FST learners; Hamming accuracy vs. training set size for CRF, spectral, and max-margin models; runtime of spectral methods vs. EM and tensor decompositions; word error rate vs. number of states for different Hankel bases; accuracy vs. sentence length for SVTA variants)
SLIDE 52
Open Problems and Current Trends
§ Optimal selection of P and S from data
§ Scalable convex optimization over sets of Hankel matrices
§ Constraining the output WFA (e.g. probabilistic automata)
§ Relations between learning and approximate minimisation
§ How much of this can be extended to WFA over semirings?
§ Spectral methods for initializing non-convex gradient-based learning algorithms
SLIDE 53 Conclusion
Take home points
§ A single building block based on SVD of Hankel matrices
§ Implementation only requires linear algebra
§ Analysis involves linear algebra, probability, convex optimization
§ Can be made practical for a variety of models and applications
Want to know more?
§ EMNLP’14 tutorial (with slides, video, and code)
https://borjaballe.github.io/emnlp14-tutorial/
§ Survey papers [BM15a, TJ15]
§ Python toolkit Sp2Learn [ABDE16]
§ Neighbouring literature: Predictive state representations (PSR) [LSS02] and Observable operator models (OOM) [Jae00]
SLIDE 54
Thanks To All My Collaborators!
§ Xavier Carreras § Mehryar Mohri § Prakash Panangaden § Joelle Pineau § Doina Precup § Ariadna Quattoni
§ Guillaume Rabusseau § Franco M. Luque § Pierre-Luc Bacon § Pascale Gourdeau § Odalric-Ambrym Maillard § Will Hamilton § Lucas Langer § Shay Cohen § Amir Globerson
SLIDE 55 References I
- D. Arrivault, D. Benielli, F. Denis, and R. Eyraud.
Sp2learn: A toolbox for the spectral learning of weighted automata. In ICGI, 2016.
- B. Balle.
Learning Finite-State Machines: Algorithmic and Statistical Aspects. PhD thesis, Universitat Politècnica de Catalunya, 2013.
- B. Balle, X. Carreras, F.M. Luque, and A. Quattoni.
Spectral learning of weighted automata: A forward-backward perspective. Machine Learning, 2014.
- R. Bailly, F. Denis, and L. Ralaivola.
Grammatical inference as a principal component analysis problem. In ICML, 2009.
SLIDE 56 References II
- R. Bailly, A. Habrard, and F. Denis.
A spectral approach for probabilistic grammatical inference on trees. In ALT, 2010.
- S. Bozapalidis and O. Louscou-Bozapalidou.
The rank of a formal tree power series. Theoretical Computer Science, 27(1-2):211–215, 1983.
- B. Balle and M. Mohri.
Spectral learning of general weighted automata via constrained matrix completion. In NIPS, 2012.
- B. Balle and M. Mohri.
Learning weighted automata (invited paper). In CAI, 2015.
- B. Balle and M. Mohri.
On the Rademacher complexity of weighted automata. In ALT, 2015.
SLIDE 57 References III
- B. Balle and O.-A. Maillard.
Spectral learning from a single trajectory under finite-state policies. In ICML, 2017.
- B. Balle and M. Mohri.
Generalization bounds for learning weighted automata. Theoretical Computer Science, 716:89–106, 2018.
- B. Balle, A. Quattoni, and X. Carreras.
A spectral learning algorithm for finite state transducers. In ECML-PKDD, 2011.
- B. Boots, S. Siddiqi, and G. Gordon.
Closing the learning-planning loop with predictive state representations. In Proceedings of Robotics: Science and Systems VI, 2009.
- S. B. Cohen, K. Stratos, M. Collins, D. P. Foster, and L. Ungar.
Spectral learning of latent-variable PCFGs. In ACL, 2012.
SLIDE 58 References IV
- S. B. Cohen, K. Stratos, M. Collins, D. P. Foster, and L. Ungar.
Experiments with spectral learning of latent-variable PCFGs. In NAACL-HLT, 2013.
- S. B. Cohen, K. Stratos, M. Collins, D. P. Foster, and L. Ungar.
Spectral learning of latent-variable PCFGs: Algorithms and sample complexity. Journal of Machine Learning Research, 2014.
- F. Denis, M. Gybels, and A. Habrard.
Dimension-free concentration bounds on Hankel matrices for spectral learning. Journal of Machine Learning Research, 17:31:1–31:32, 2016.
- M. Fliess.
Matrices de Hankel. Journal de Mathématiques Pures et Appliquées, 1974.
SLIDE 59 References V
- D. Hsu, S. M. Kakade, and T. Zhang.
A spectral algorithm for learning hidden Markov models. In COLT, 2009.
- H. Jaeger.
Observable operator models for discrete stochastic time series. Neural Computation, 2000.
- M. Littman, R. S. Sutton, and S. Singh.
Predictive representations of state. In NIPS, 2002.
- M. Mohri, A. Rostamizadeh, and A. Talwalkar.
Foundations of Machine Learning. MIT Press, 2012.
- A. Quattoni, B. Balle, X. Carreras, and A. Globerson.
Spectral regularization for max-margin sequence tagging. In ICML, 2014.
SLIDE 60 References VI
- M. R. Thon and H. Jaeger.
Links between multiplicity automata, observable operator models and predictive state representations: a unified learning framework. Journal of Machine Learning Research, 2015.
SLIDE 61 Automata Learning
Borja Balle
Amazon Research Cambridge⁴
Foundations of Programming Summer School (Oxford) — July 2018
⁴ Based on work completed before joining Amazon