
Inferring Descriptive Generalisations of Formal Languages

  1. Inferring Descriptive Generalisations of Formal Languages
Dominik D. Freydenberger (1), Daniel Reidenbach (2)
(1) Goethe University, Frankfurt; (2) Loughborough University, Loughborough
COLT 2010
D. Freydenberger, D. Reidenbach: Inferring Descriptive Generalisations of Formal Languages

  2. Introduction
Our goal: learning patterns common to a set of strings.
pattern: a word consisting of terminals (elements of $\Sigma$) and variables (elements of $X$)
$\mathrm{Pat}_\Sigma := (\Sigma \cup X)^+$: the set of all patterns over $\Sigma$
substitution: a terminal-preserving morphism $\sigma : \mathrm{Pat}_\Sigma \to \Sigma^*$, that is, $\sigma(a) = a$ for all $a \in \Sigma$
language of a pattern $\alpha \in \mathrm{Pat}_\Sigma$: the set of all images of $\alpha$ under substitutions (written $L(\alpha)$); for NE-patterns every variable must be mapped to a nonempty word, while for E-patterns variables may also be erased
Example:
$L_{\mathrm{NE},\Sigma}(x\,a\,y\,x) = \{ v\,a\,w\,v \mid v, w \in \Sigma^+ \}$,
$L_{\mathrm{E},\Sigma}(x\,a\,y\,x) = \{ v\,a\,w\,v \mid v, w \in \Sigma^* \}$.
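
The definitions above can be made concrete with a brute-force membership test. This is a minimal sketch, not taken from the slides: a pattern is assumed to be a Python list of symbols, the set of variable names is passed separately, and the hypothetical function `matches` searches for a substitution by backtracking (exponential in the worst case). The flag `erasing` switches between the NE and E semantics.

```python
from typing import Dict, Sequence, Set


def matches(pattern: Sequence[str], variables: Set[str], word: str,
            erasing: bool) -> bool:
    """Is `word` an image of `pattern` under some terminal-preserving substitution?

    Symbols in `variables` are variables; every other symbol is a terminal and
    must appear literally. Each variable gets one consistent image, which must
    be nonempty unless `erasing` is True (E-patterns instead of NE-patterns).
    Plain backtracking: exponential in the worst case.
    """
    min_len = 0 if erasing else 1

    def go(pos: int, i: int, binding: Dict[str, str]) -> bool:
        if pos == len(pattern):
            return i == len(word)              # the whole word must be consumed
        sym = pattern[pos]
        if sym not in variables:               # terminal: copied unchanged
            return word.startswith(sym, i) and go(pos + 1, i + len(sym), binding)
        if sym in binding:                     # repeated variable: same image
            img = binding[sym]
            return word.startswith(img, i) and go(pos + 1, i + len(img), binding)
        for end in range(i + min_len, len(word) + 1):   # try each fresh image
            binding[sym] = word[i:end]
            if go(pos + 1, end, binding):
                return True
            del binding[sym]                   # undo before trying the next one
        return False

    return go(0, 0, {})


# The example pattern x a y x from the slide.
PATTERN = ["x", "a", "y", "x"]
VARS = {"x", "y"}

print(matches(PATTERN, VARS, "babb", erasing=False))  # True
print(matches(PATTERN, VARS, "abba", erasing=False))  # False
print(matches(PATTERN, VARS, "abba", erasing=True))   # True
```

For instance, `babb` lies in both languages of $x\,a\,y\,x$ (with $x \mapsto b$, $y \mapsto b$), whereas `abba` lies only in the E-pattern language, because it requires erasing $x$.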

  3. Introduction: the classical model
Identification in the limit of indexed families from positive data (Gold ’67)
indexed family (of recursive languages): $\mathcal{L} = (L_i)_{i \in \mathbb{N}}$, where membership $w \in L_i$ is uniformly decidable
text of a language $L$: a total function $t : \mathbb{N} \to \Sigma^*$ with $\{ t(i) \mid i \in \mathbb{N} \} = L$
set of all texts of $L$: $\mathrm{text}(L)$
$\mathcal{L} \in$ LIM-TEXT if there exists a computable function $S$ such that, for every $i$ and every $t \in \mathrm{text}(L_i)$, the outputs $S(t^n)$ on the finite initial segments $t^n$ of $t$ converge to some $j$ with $L_j = L_i$
Known results for pattern languages in LIM-TEXT:
NE-patterns: yes (Angluin ’80)
E-patterns: not if $|\Sigma| \in \{2, 3, 4\}$ (Reidenbach ’06, ’08)
terminal-free E-patterns: only if $|\Sigma| \neq 2$ (Reidenbach ’06)
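
To illustrate convergence in the limit, here is a sketch for a hypothetical toy indexed family $L_i = \{a^k \mid 1 \le k \le i\}$ ($i \ge 1$), which is not one of the pattern classes above. The names `member`, `learner` and `text_of` are illustrative assumptions. The learner guesses the least index whose language contains every word seen so far; on any text for $L_i$ its guesses never decrease, and once $a^i$ has appeared it outputs $i$ forever.

```python
from itertools import islice
from typing import Iterator, List, Sequence


def member(i: int, w: str) -> bool:
    """Uniform membership test for the toy family L_i = {a^k | 1 <= k <= i}."""
    return 1 <= len(w) <= i and set(w) == {"a"}


def learner(prefix: Sequence[str]) -> int:
    """S applied to a finite prefix of a text: the least i with prefix ⊆ L_i.

    For this chain L_1 ⊂ L_2 ⊂ ... that index is at most the length of the
    longest word seen, so the search is bounded. Assumes a nonempty prefix
    of a text for some L_i.
    """
    bound = max(len(w) for w in prefix)
    return next(i for i in range(1, bound + 1)
                if all(member(i, w) for w in prefix))


def text_of(i: int) -> Iterator[str]:
    """One text for L_i: a, aa, ..., a^i, then the same cycle forever."""
    k = 0
    while True:
        yield "a" * (k % i + 1)
        k += 1


guesses: List[int] = [learner(list(islice(text_of(3), n + 1))) for n in range(6)]
print(guesses)  # [1, 2, 3, 3, 3, 3]
```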

  4. Inferring descriptive generalisations: descriptive patterns
Definition. Let $\mathcal{P}_\Sigma$ be a class of pattern languages over $\Sigma$. A pattern $\delta$ is $\mathcal{P}_\Sigma$-descriptive of a language $L$ if
(1) $L(\delta) \in \mathcal{P}_\Sigma$,
(2) $L(\delta) \supseteq L$,
(3) there is no $L(\gamma) \in \mathcal{P}_\Sigma$ with $L(\delta) \supset L(\gamma) \supseteq L$.
We write $\delta \in D_{\mathcal{P}_\Sigma}(L)$.
In other words: $L(\delta)$ is (one of) the closest generalisation(s) of $L$ in $\mathcal{P}_\Sigma$, and $\delta$ is (one of) the best description(s) of $L$.
Our approach: learning such generalisations.
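
The three conditions of the definition can be checked mechanically once inclusion between the candidate languages is decidable. The sketch below assumes a hypothetical, finite hypothesis space `HYPS` in which each name stands for a "pattern" and its value is the finite set of words it generates; pattern languages are in general infinite, so this only illustrates the order-theoretic part of the definition, not how pattern languages are compared.

```python
from typing import Dict, FrozenSet, List

# Hypothetical finite hypothesis space: each name stands for a "pattern",
# and its value is the (here finite) language it generates.
HYPS: Dict[str, FrozenSet[str]] = {
    "all": frozenset({"a", "b", "ab", "ba"}),
    "A": frozenset({"a", "ab"}),
    "B": frozenset({"a", "ba"}),
    "only_b": frozenset({"b"}),
}


def descriptive(target: FrozenSet[str],
                hyps: Dict[str, FrozenSet[str]]) -> List[str]:
    """Names of all hypotheses that are descriptive of `target`.

    Condition (1) holds by construction (every hypothesis is in the class);
    condition (2) keeps hypotheses whose language contains `target`;
    condition (3) drops those that strictly contain another such language.
    """
    covering = [(name, lang) for name, lang in hyps.items() if lang >= target]
    return sorted(name for name, lang in covering
                  if not any(other < lang for _, other in covering))


print(descriptive(frozenset({"a"}), HYPS))        # ['A', 'B']
print(descriptive(frozenset({"a", "ab"}), HYPS))  # ['A']
```

Here the language {a} has two descriptive hypotheses, A and B, whose languages are incomparable, and A is descriptive of both {a} and {a, ab}: one language can have several descriptive hypotheses, and one hypothesis can be descriptive of several languages.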

  5. Inferring descriptive generalisations
Definition. Let $\mathcal{P}_\Sigma$ be a class of pattern languages over $\Sigma$, and let $\mathcal{L}$ be a class of nonempty languages over $\Sigma$. $\mathcal{L}$ can be $\mathcal{P}_\Sigma$-descriptively generalised ($\mathcal{L} \in \mathrm{DG}_{\mathcal{P}_\Sigma}$) if there is a computable function $S$ such that, for every $L \in \mathcal{L}$ and every $t \in \mathrm{text}(L)$, $S(t^n)$ converges to some $\delta \in D_{\mathcal{P}_\Sigma}(L)$.
Main conceptual differences from LIM-TEXT:
the learner infers generalisations instead of exact descriptions of the languages;
the hypothesis space is chosen separately from the language class.
Interesting phenomenon: one language can have several descriptive patterns, and one pattern can be descriptive of several languages.
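
A toy instance of a DG strategy, under the same kind of assumptions as a finite-set illustration (finite sets stand in for pattern languages, and `ORDER` is a hypothetical fixed enumeration of the hypotheses): on every finite prefix, output the first hypothesis that is descriptive of the words seen so far. For a finite target language, this output stabilises once all of its words have appeared.

```python
from itertools import cycle, islice
from typing import Dict, FrozenSet, Iterable, List, Optional

# Hypothetical toy hypothesis space (finite sets stand in for pattern languages).
HYPS: Dict[str, FrozenSet[str]] = {
    "all": frozenset({"a", "b", "ab", "ba"}),
    "A": frozenset({"a", "ab"}),
    "B": frozenset({"a", "ba"}),
    "only_b": frozenset({"b"}),
}
ORDER = ["B", "A", "all", "only_b"]  # hypothetical fixed enumeration of hypotheses


def is_descriptive(name: str, data: FrozenSet[str]) -> bool:
    lang = HYPS[name]
    # Contains the data, and no hypothesis language lies strictly in between.
    return data <= lang and not any(data <= other < lang for other in HYPS.values())


def strategy(prefix: Iterable[str]) -> Optional[str]:
    """Output the first hypothesis in ORDER that is descriptive of the data seen."""
    data = frozenset(prefix)
    return next((name for name in ORDER if is_descriptive(name, data)), None)


text = cycle(["a", "a", "ab"])  # a text for L = {a, ab}
seen: List[str] = []
guesses: List[Optional[str]] = []
for word in islice(text, 6):
    seen.append(word)
    guesses.append(strategy(seen))
print(guesses)  # ['B', 'B', 'A', 'A', 'A', 'A']
```

The first two guesses are B, which is descriptive of the data {a} but not of $L$; from the third word on, the strategy outputs A, which is descriptive of $L$.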

  6. Inferring descriptive generalisations: characterisation theorem (for indexed families)
Theorem. Let $\Sigma$ be an alphabet, let $\mathcal{L} = (L_i)_{i \in \mathbb{N}}$ be an indexed family over $\Sigma$, and let $\mathcal{P}_\Sigma$ be a class of pattern languages. Then $\mathcal{L} \in \mathrm{DG}_{\mathcal{P}_\Sigma}$ if and only if there are effective procedures $d$ and $f$ satisfying the following conditions:
(i) For every $i \in \mathbb{N}$, there exists a $\delta_{d(i)} \in D_{\mathcal{P}_\Sigma}(L_i)$ such that $d$ enumerates a sequence of patterns $d_{i,0}, d_{i,1}, d_{i,2}, \ldots$ satisfying $d_{i,j} = \delta_{d(i)}$ for all but finitely many $j \in \mathbb{N}$.
(ii) For every $i \in \mathbb{N}$, $f$ enumerates a finite set $F_i \subseteq L_i$ such that, for every $j \in \mathbb{N}$ with $F_i \subseteq L_j$: if $\delta_{d(i)} \notin D_{\mathcal{P}_\Sigma}(L_j)$, then there is a $w \in L_j$ with $w \notin L_i$.
$d$ is an enumeration of an appropriate subset of the hypothesis space; $f$ is similar to Angluin’s telltales.
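
Conditions (i) and (ii) can be checked directly on a small, finite toy family. The sketch below is hypothetical: languages and hypotheses are finite sets, `d_limit` gives only the limit value $\delta_{d(i)}$ of each enumeration (the enumeration itself is not modelled), and `tell` gives the finite sets $F_i$. It checks given candidates for $d$ and $f$; it does not search for them. Note that "there is a $w \in L_j$ with $w \notin L_i$" is the same as $L_j \not\subseteq L_i$.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy hypothesis space (finite sets stand in for pattern languages).
HYPS: Dict[str, FrozenSet[str]] = {
    "all": frozenset({"a", "b", "ab", "ba"}),
    "A": frozenset({"a", "ab"}),
    "B": frozenset({"a", "ba"}),
    "only_b": frozenset({"b"}),
}


def descriptive_of(name: str, lang: FrozenSet[str]) -> bool:
    h = HYPS[name]
    return lang <= h and not any(lang <= other < h for other in HYPS.values())


def satisfies_conditions(family: List[FrozenSet[str]], d_limit: Dict[int, str],
                         tell: Dict[int, FrozenSet[str]]) -> bool:
    for i, lang_i in enumerate(family):
        delta = d_limit[i]
        if not descriptive_of(delta, lang_i):        # condition (i), limit value only
            return False
        f_i = tell[i]
        if not f_i <= lang_i:                         # F_i must be a subset of L_i
            return False
        for lang_j in family:                         # condition (ii)
            if f_i <= lang_j and not descriptive_of(delta, lang_j) and lang_j <= lang_i:
                return False                          # no w in L_j with w not in L_i
    return True


FAMILY = [frozenset({"a"}), frozenset({"a", "ab"}),
          frozenset({"a", "ba"}), frozenset({"a", "ab", "ba"})]
D_LIMIT = {0: "A", 1: "A", 2: "B", 3: "all"}
GOOD_TELL = {i: lang for i, lang in enumerate(FAMILY)}  # F_i = L_i (finite here)
BAD_TELL = {**GOOD_TELL, 3: frozenset({"a"})}           # F_3 too small

print(satisfies_conditions(FAMILY, D_LIMIT, GOOD_TELL))  # True
print(satisfies_conditions(FAMILY, D_LIMIT, BAD_TELL))   # False
```

With $F_3 = L_3$ both conditions hold; shrinking $F_3$ to {a} violates (ii) at $j = 0$, because $F_3 \subseteq L_0 \subseteq L_3$ while the hypothesis chosen for $L_3$ is not descriptive of $L_0$.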

  7. Inferring descriptive generalisations: remarks
The characterisation shows a significant connection to Angluin’s characterisation of indexed families in LIM-TEXT. Main differences:
(1) our model requires an enumeration of a subset of the hypothesis space,
(2) we do not need to distinguish all $L_i, L_j$ with $L_i \neq L_j$,
(3) the strategy in our proof might discard a correct hypothesis.
Our strategy does not test membership or inclusion of pattern languages; it only tests membership for the indexed family.

  8. $\mathrm{ePAT}_{\mathrm{tf},\Sigma}$-descriptive patterns: further topics
Further directions in our paper:
(1) More general: inductive inference with a hypotheses validity relation (model HYP).
(2) Less general: consider a smaller class of patterns and a fixed strategy.

  9. Inferring $\mathrm{ePAT}_{\mathrm{tf},\Sigma}$-descriptive patterns
$\mathrm{ePAT}_{\mathrm{tf},\Sigma}$: the class of all E-pattern languages generated by terminal-free patterns (patterns consisting of variables only).
Inclusion for $\mathrm{ePAT}_{\mathrm{tf},\Sigma}$ is well understood and decidable.
strategy Canon: for every finite set $S$, return the pattern $\delta \in D_{\mathrm{ePAT}_{\mathrm{tf},\Sigma}}(S)$ that is minimal with respect to the length-lexicographical order.
telling set of $L$: a finite set $T \subseteq L$ with $D_{\mathrm{ePAT}_{\mathrm{tf},\Sigma}}(T) \cap D_{\mathrm{ePAT}_{\mathrm{tf},\Sigma}}(L) \neq \emptyset$.
Theorem. Let $\Sigma$ be an alphabet with $|\Sigma| \geq 2$. For every language $L \subseteq \Sigma^*$ and every text $t \in \mathrm{text}(L)$, Canon converges correctly on $t$ if and only if $L$ has a telling set.
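
The slide states that inclusion for $\mathrm{ePAT}_{\mathrm{tf},\Sigma}$ is decidable. A minimal sketch of one such test follows. It assumes (this is not stated on the slide) a known characterisation: for terminal-free patterns $\alpha, \beta$ and $|\Sigma| \geq 2$, $L_E(\alpha) \subseteq L_E(\beta)$ holds iff some morphism $\phi : X^* \to X^*$ satisfies $\phi(\beta) = \alpha$. Patterns are tuples of variable names; `included` and `length_lex_min` are hypothetical helper names, and the latter shows only Canon's final selection step, not the computation of the descriptive patterns themselves.

```python
from typing import Dict, Iterable, Tuple

Pattern = Tuple[str, ...]  # a terminal-free pattern, e.g. ("x", "y", "x")


def morphism_maps(beta: Pattern, alpha: Pattern) -> bool:
    """Is there a morphism phi: X* -> X* (images may be empty) with phi(beta) == alpha?"""
    def go(pos: int, i: int, phi: Dict[str, Pattern]) -> bool:
        if pos == len(beta):
            return i == len(alpha)
        x = beta[pos]
        if x in phi:                                  # reuse the image chosen earlier
            img = phi[x]
            return alpha[i:i + len(img)] == img and go(pos + 1, i + len(img), phi)
        for end in range(i, len(alpha) + 1):          # empty images are allowed
            phi[x] = alpha[i:end]
            if go(pos + 1, end, phi):
                return True
            del phi[x]                                # undo before the next choice
        return False

    return go(0, 0, {})


def included(alpha: Pattern, beta: Pattern) -> bool:
    """L_E(alpha) ⊆ L_E(beta), ASSUMING the morphism characterisation (|Σ| >= 2)."""
    return morphism_maps(beta, alpha)


def length_lex_min(candidates: Iterable[Pattern]) -> Pattern:
    """Canon's selection step only: shortest pattern, ties broken lexicographically
    by variable names (an assumed order on X)."""
    return min(candidates, key=lambda p: (len(p), p))


XX, XYX = ("x", "x"), ("x", "y", "x")
print(included(XX, XYX))  # True: e.g. phi(x) = x, phi(y) = empty word
print(included(XYX, XX))  # False: phi(x)phi(x) always has even length
```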

  10. $\mathrm{ePAT}_{\mathrm{tf},\Sigma}$-descriptive patterns: telling set languages
$\mathrm{TSL}_\Sigma$: the class of all languages over $\Sigma$ that have a telling set.
$\mathrm{TSL}_\Sigma \in \mathrm{DG}_{\mathrm{ePAT}_{\mathrm{tf},\Sigma}}$, using Canon as the strategy.
Some properties of $\mathrm{TSL}_\Sigma$:
contains every DTF0L language (hence superfinite: it contains every finite language and at least one infinite one)
is not countable
does not contain all of REG
contains all $\mathrm{ePAT}_{\mathrm{tf},\Sigma}$-languages (if $|\Sigma| \neq 2$)
does not contain all $\mathrm{ePAT}_{\mathrm{tf},\Sigma}$-languages (if $|\Sigma| = 2$)
