SLIDE 1

Biological Problem | Generalization using Substitutability | Generalization using Local Substitutability | First Experiments

Local Substitutability for Sequence Generalization

François Coste, Gaëlle Garet, Jacques Nicolas

Dyliss Bioinformatic Team Inria Rennes-Bretagne Atlantique France

ICGI, September 6, 2012

SLIDE 2

Table of Contents

1  Biological Problem to Grammatical Inference
2  Generalization using Substitutability
3  Generalization using Local Substitutability
4  First Experiments

SLIDE 4

Prediction of Protein Function

Protein:

  Amino acid sequence: length ≈ 500, alphabet of size 20

    KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTM

  Structure: determined by sequence
  Function: largely dependent on structure

A lot of sequences are available (sequencing projects)
⟹ Find the protein's function from its sequence

SLIDE 7

Characterization of a Protein Functional Family

Usual representations: sub-regular expressions, profiles, ...

Proteins:

  short term interactions: automata [Ker08]
  long term interactions

KETAAAKFERQHMDSSTSAASSSNYCN-QMMKSRNL...

(figure: regions of the sequence annotated as alpha helix and beta sheet)

SLIDE 9

Characterization of a Protein Functional Family

Abstraction

Context-free grammars (CFGs) make it possible to model important protein contacts.

Issue

How can such a CFG be inferred from a set of protein sequences?

SLIDE 10

Protomata-inspired Approach

Detection of blocks of conservation by partial local multiple alignment [Ker08]

  Seq1  SVSLD  IDLQTVLPEWVRVGFSASTG  QNV     ERNSILAWSFSS
  Seq2  TVSYD  VDLKTELPEWVRVGFSGSTG  GYV     QNHNILSWTFNS
  Seq3  HVSAT  VPLEKEVEDWVSVGFSATSG  SKKETT  ETHNVLSWSFSS
  Seq4  NVSTT  VELEKEVYDWVSVGFSATSG  AYQWSY  ETHDVLSWSFSS
  Seq5  SVSAT  VHLEKEVDEWVSVGFSATSG  LTEDTT  ETHDVLSWSFSS

Recoding sequences with conservation blocks

  Seq1  Block1 Block2 Block3 Block4
  Seq2  Block1 Block2 Block3 Block4
  Seq3  Block5 Block2 Block6 Block4
  Seq4  Block5 Block2 Block6 Block4
  Seq5  Block5 Block2 Block6 Block4

Grammar induced by recoding

  S      → Block1 Block2 Block3 Block4 | Block5 Block2 Block6 Block4
  Block1 → P1 P2 P3 P4 P5
  P1     → S | T
  ...
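The recoding step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `block_rules` and the `Block1.P1` naming of position non-terminals are hypothetical, and the block members are assumed to be already aligned and of equal length, as Block1 is in the alignment above.

```python
def block_rules(name, members):
    """Rules for one conservation block (hypothetical helper).

    name    -- block non-terminal, e.g. "Block1"
    members -- aligned, equal-length strings observed for this block
    """
    width = len(members[0])
    if any(len(m) != width for m in members):
        raise ValueError("block members must have the same length")
    positions = [f"{name}.P{i + 1}" for i in range(width)]
    rules = {name: [positions]}  # Block -> P1 P2 ... Pn
    for i, pos in enumerate(positions):
        # Pi -> every residue observed in column i of the block
        rules[pos] = sorted({m[i] for m in members})
    return rules


# Block1 as read off the alignment (Seq1 and Seq2).
block1 = block_rules("Block1", ["SVSLD", "TVSYD"])
# Top-level rule induced by the two recoded sequence shapes.
start = [["Block1", "Block2", "Block3", "Block4"],
         ["Block5", "Block2", "Block6", "Block4"]]
print(block1["Block1.P1"])  # ['S', 'T']
```

The grammar obtained this way accepts only recombinations of residues inside each block, which is why the slide then asks how to generalize further.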

How to generalize more?

SLIDE 14

Substitutability [Har54] Based Inference

[CE07]: substitutable languages

\[ \forall y_1, y_2 \in \Sigma^+ : [\exists x_1, z_1 : x_1 y_1 z_1 \in L \wedge x_1 y_2 z_1 \in L] \Rightarrow [\forall x_2, z_2 : x_2 y_1 z_2 \in L \Leftrightarrow x_2 y_2 z_2 \in L] \]

Two strings occurring between common left and right contexts are substitutable.

[Yos08]: (k,l)-substitutable languages

\[ \forall y_1, y_2 \in \Sigma^+, \forall u \in \Sigma^k, v \in \Sigma^l : [\exists x_1, z_1 : x_1 u y_1 v z_1 \in L \wedge x_1 u y_2 v z_1 \in L] \Rightarrow [\forall x_2, z_2 : x_2 u y_1 v z_2 \in L \Leftrightarrow x_2 u y_2 v z_2 \in L] \]

Two strings occurring between common left and right contexts are substitutable in these left and right sub-contexts of length k and l.
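On a finite sample, the premise of the [CE07] definition (two strings sharing a complete context) can be checked by brute force. The sketch below is an assumption-laden illustration, not the authors' code: `substitutable_pairs` is a hypothetical name, sequences are treated as tuples of symbols, and the sample is the four-sentence example used later in the talk, tokenized into words.

```python
from collections import defaultdict
from itertools import combinations


def substitutable_pairs(sample):
    """Pairs of non-empty substrings sharing a full context (x, z) in the sample."""
    by_context = defaultdict(set)
    for seq in sample:
        n = len(seq)
        for i in range(n):
            for j in range(i + 1, n + 1):
                # seq = x + y + z with y = seq[i:j] non-empty
                by_context[(seq[:i], seq[j:])].add(seq[i:j])
    pairs = set()
    for ys in by_context.values():
        pairs.update(combinations(sorted(ys), 2))
    return pairs


sample = [tuple(s.split()) for s in (
    "I have arrived after midnight",
    "I have driven after midnight",
    "She has arrived before me",
    "Marie has eaten before him",
)]
pairs = substitutable_pairs(sample)
print((("arrived",), ("driven",)) in pairs)  # True
print((("arrived",), ("eaten",)) in pairs)   # False
```

"arrived" and "driven" share the complete context ("I have", "after midnight"), whereas "eaten" never shares a complete context with either of them.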

SLIDE 16

Preliminary Experiments on Protein Sequences

Unsatisfactory results: no generalization

                            Precision  Recall  F-measure
  Before substitutability   1          0.2     0.33
  After substitutability    1          0.2     0.33

Analysis of failure causes

  Training sequences are long
  (Global) contexts of two strings are never identical

How to generalize more?

Our solution: introduction of local substitutability

  new classes of languages
  new generalization criterion

SLIDE 20

(k, l)-Local Substitutability

(k, l)-local substitutable languages

\[ \forall y_1, y_2 \in \Sigma^+ : [\exists r \in \Sigma^k, s \in \Sigma^l : x_1 r y_1 s z_1 \in L \wedge x_2 r y_2 s z_2 \in L] \Rightarrow [\forall x_3, z_3 : x_3 y_1 z_3 \in L \Leftrightarrow x_3 y_2 z_3 \in L] \]

Definition

Two strings occurring between common left and right contexts of length k and l are substitutable.
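To make the local criterion concrete, the following sketch groups the substrings of a finite sample by their local context, that is, the k symbols immediately to the left and the l symbols immediately to the right. The function name `local_classes` and the word-level tokenization are assumptions made for illustration; the sentences are those of the example slides of this talk.

```python
from collections import defaultdict


def local_classes(sample, k, l):
    """Group non-empty substrings y by local context (u, v), |u| = k, |v| = l."""
    classes = defaultdict(set)
    for seq in sample:
        n = len(seq)
        for i in range(k, n):                  # u = seq[i-k:i] must fit on the left
            for j in range(i + 1, n - l + 1):  # v = seq[j:j+l] must fit on the right
                classes[(seq[i - k:i], seq[j:j + l])].add(seq[i:j])
    return classes


sample = [tuple(s.split()) for s in (
    "I have arrived after midnight",
    "I have driven after midnight",
    "She has arrived before me",
    "Marie has eaten before him",
)]
classes = local_classes(sample, k=1, l=1)
print(sorted(classes[(("have",), ("after",))]))  # [('arrived',), ('driven',)]
print(sorted(classes[(("has",), ("before",))]))  # [('arrived',), ('eaten',)]
```

With contexts of length 1 on each side, "eaten" now falls in the same class as "arrived", although the complete contexts of the two words differ.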

SLIDE 21

(k, l)-Local-Context Substitutability

(k, l)-local context substitutable languages

\[ \forall y_1, y_2 \in \Sigma^+, \forall u \in \Sigma^k, v \in \Sigma^l : [x_1 u y_1 v z_1 \in L \wedge x_2 u y_2 v z_2 \in L] \Rightarrow [\forall x_3, z_3 : x_3 u y_1 v z_3 \in L \Leftrightarrow x_3 u y_2 v z_3 \in L] \]

Definition

Two strings occurring between common left and right contexts of length k and l are substitutable in these contexts of length k and l.

SLIDE 22

Generalization of Sequences: Example

Set of sequences:

  I have arrived after midnight.
  I have driven after midnight.
  She has arrived before me.
  Marie has eaten before him.

To obtain a language, we must add the following sequences:

  She has driven before me.
  She has eaten before me.
  Marie has arrived before him.
  Marie has driven before him.
  I have eaten after midnight.

SLIDE 23

Generalization of Sequences: Example

Set of sequences:

  I have arrived after midnight.
  I have driven after midnight.
  She has arrived before me.
  Marie has eaten before him.

To obtain a substitutable language, we must add the following sequences:

  She has driven before me.
  She has eaten before me.
  Marie has arrived before him.
  Marie has driven before him.
  I have eaten after midnight.

SLIDE 24

Generalization of Sequences: Example

Set of sequences:

  I have arrived after midnight.
  I have driven after midnight.
  She has arrived before me.
  Marie has eaten before him.

To obtain a (1,1)-substitutable language, we must add the following sequences:

  She has driven before me.
  She has eaten before me.
  Marie has arrived before him.
  Marie has driven before him.
  I have eaten after midnight.

SLIDE 25

Generalization of Sequences: Example

Set of sequences:

  I have arrived after midnight.
  I have driven after midnight.
  She has arrived before me.
  Marie has eaten before him.

To obtain a (1,1)-local context substitutable language, we must add the following sequences:

  She has driven before me.
  She has eaten before me.
  Marie has arrived before him.
  Marie has driven before him.
  I have eaten after midnight.

SLIDE 26

Generalization of Sequences: Example

Set of sequences:

  I have arrived after midnight.
  I have driven after midnight.
  She has arrived before me.
  Marie has eaten before him.

To obtain a (1,1)-local substitutable language, we must add the following sequences:

  She has driven before me.
  She has eaten before me.
  Marie has arrived before him.
  Marie has driven before him.
  I have eaten after midnight.

SLIDE 27

Links between Substitutable Languages

Two Complementary Usages of Contexts

  Language                                local definition   contextual application
  substitutable [CE07]                    (∞, ∞)             (0, 0)
  k,l-context substitutable [Yos08]       (∞, ∞)             (k, l)
  k,l-local substitutable                 (k, l)             (0, 0)
  k,l-local context substitutable         (k, l)             (k, l)
  i,j-local k,l-context substitutable     (i, j)             (k, l)

Inclusion of substitutable language classes

(figure: inclusion diagram relating the classes k,l-subst., k,l-local context subst., subst. and k,l-local subst.)

SLIDE 28

Hierarchy of (i,j)-Local(k,l)-Context Substitutable Languages

(figure: chain of classes (i, j)(k, l), (i+1, j)(k, l), (i+2, j)(k, l), (i+3, j)(k, l), ...)

SLIDE 30

Hierarchy of (i,j)-Local(k,l)-Context Substitutable Languages

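(figure: lattice of (i,j)-local (k,l)-context substitutable classes. Node labels, as extracted: (0, 0)(0, 0); (i, j)(k, l); (i, j)(k + 1, l); (i, j)(k, l + 1); (i, j)(k + 1, l + 1); (i, j + 1)(k, l); (i + 1, j)(k, l); (i + 1, j + 1)(k, l); (i + 1, j + 1)(k + 1, l + 1); ...; Context-free; {Σ∗})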

SLIDE 31

Local Substitutability and Testability

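  Languages                           Extension of
  Substitutable                       0-reversible
  k,l-substitutable                   k-reversible
  k,l-local context substitutable     k-testable

Reversible language

\[ \forall y_1, y_2 \in \Sigma^+ : [\exists x_1 : x_1 y_1 \in L \wedge x_1 y_2 \in L] \Rightarrow [\forall x_2 : x_2 y_1 \in L \Leftrightarrow x_2 y_2 \in L] \]

Substitutable language

\[ \forall y_1, y_2 \in \Sigma^+ : [\exists x_1, z_1 : x_1 y_1 z_1 \in L \wedge x_1 y_2 z_1 \in L] \Rightarrow [\forall x_2, z_2 : x_2 y_1 z_2 \in L \Leftrightarrow x_2 y_2 z_2 \in L] \]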

SLIDE 32

Local Substitutability and Testability

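k-reversible language

\[ \forall y_1, y_2 \in \Sigma^+, \forall u \in \Sigma^k : [\exists x_1 : x_1 u y_1 \in L \wedge x_1 u y_2 \in L] \Rightarrow [\forall x_2 : x_2 u y_1 \in L \Leftrightarrow x_2 u y_2 \in L] \]

k,l-substitutable language

\[ \forall y_1, y_2 \in \Sigma^+, \forall u \in \Sigma^k, v \in \Sigma^l : [\exists x_1, z_1 : x_1 u y_1 v z_1 \in L \wedge x_1 u y_2 v z_1 \in L] \Rightarrow [\forall x_2, z_2 : x_2 u y_1 v z_2 \in L \Leftrightarrow x_2 u y_2 v z_2 \in L] \]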

SLIDE 33

Local Substitutability and Testability

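k-testable language

\[ \forall y_1, y_2 \in \Sigma^+, \forall u \in \Sigma^k : [x_1 u y_1 \in L \wedge x_2 u y_2 \in L] \Rightarrow [\forall x_3 : x_3 u y_2 \in L \Leftrightarrow x_3 u y_1 \in L] \]

k,l-local context substitutable language

\[ \forall y_1, y_2 \in \Sigma^+, \forall u \in \Sigma^k, v \in \Sigma^l : [x_1 u y_1 v z_1 \in L \wedge x_2 u y_2 v z_2 \in L] \Rightarrow [\forall x_3, z_3 : x_3 u y_1 v z_3 \in L \Leftrightarrow x_3 u y_2 v z_3 \in L] \]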

SLIDE 35

Learning Algorithm: k, l-local substitutability

\hat{G}_{LS}

Input: set of sequences K, parameters k and l
Output: grammar \hat{G} = \langle \Sigma_K, V_K, P_K, S \rangle

Non-terminals definition

  V_K = \{[y] \mid xyz \in K, y \neq \lambda\} \cup \{S\}

Induction of rules

  Initial rules:           P_K = \{S \to [w] \mid w \in K\}
  Terminal rules:          \cup \{[a] \to a \mid a \in \Sigma\}
  Branching rules:         \cup \{[xy] \to [x][y] \mid [xy], [x], [y] \in V_K\}
  Substitutability rules:  \cup \{[y_1] \to [y_2] \mid \underbrace{x_1 u y_1 v z_1 \in K, x_2 u y_2 v z_2 \in K}_{\text{the local definition context}}, |u| = k, |v| = l\}
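The four rule families can be enumerated directly from a finite sample. The sketch below is a naive illustration rather than the authors' implementation: the function name, the dictionary layout, and the choice to represent a non-terminal [y] by the string y itself are all assumptions, and no attention is paid to efficiency.

```python
from collections import defaultdict


def learn_local_substitutable(K, k, l):
    """Build the four rule sets of the grammar for a finite sample K.

    A non-terminal [y] is represented by the string y itself.
    """
    substrings = set()              # V_K without S
    by_context = defaultdict(set)   # (u, v) -> substrings y seen between u and v
    for w in K:
        n = len(w)
        for i in range(n):
            for j in range(i + 1, n + 1):
                y = w[i:j]
                substrings.add(y)
                if i >= k and j + l <= n:  # both local contexts fit inside w
                    by_context[(w[i - k:i], w[j:j + l])].add(y)
    return {
        "initial": set(K),                                    # S -> [w]
        "terminal": {y for y in substrings if len(y) == 1},   # [a] -> a
        "branching": {(y[:m], y[m:])                          # [xy] -> [x][y]
                      for y in substrings for m in range(1, len(y))},
        "substitution": {(y1, y2)                             # [y1] -> [y2]
                         for ys in by_context.values()
                         for y1 in ys for y2 in ys if y1 != y2},
    }


grammar = learn_local_substitutable(["xabcy", "zadcw"], k=1, l=1)
print(sorted(grammar["substitution"]))  # [('b', 'd'), ('d', 'b')]
```

In this toy sample, "b" and "d" never share a complete context, yet both occur between "a" and "c", so the (1,1)-local criterion relates them.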

SLIDE 36

Learning Algorithm : k, l-local context substitutability

\hat{G}_{LCS}

Input: set of sequences K, parameters k and l
Output: grammar \hat{G} = \langle \Sigma_K, V_K, P_K, S \rangle

Non-terminals definition

  V_K = \{[y] \mid xyz \in K, y \neq \lambda\} \cup \{S\}

Induction of rules

  Initial rules:           P_K = \{S \to [w] \mid w \in K\}
  Terminal rules:          \cup \{[a] \to a \mid a \in \Sigma\}
  Branching rules:         \cup \{[xy] \to [x][y] \mid [xy], [x], [y] \in V_K\}
  Substitutability rules:  \cup \{\underbrace{[u y_1 v] \to [u y_2 v]}_{\text{the application context}} \mid \underbrace{x_1 u y_1 v z_1 \in K, x_2 u y_2 v z_2 \in K}_{\text{the local definition context}}, |u| = k, |v| = l\}
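Only the substitutability rules differ from the local substitutable case: the rewritten non-terminal keeps its context u and v. A minimal sketch of that rule family, with a hypothetical function name and non-terminals again represented by plain strings:

```python
from collections import defaultdict


def lcs_substitution_rules(K, k, l):
    """Substitutability rules [u y1 v] -> [u y2 v] for a finite sample K."""
    by_context = defaultdict(set)   # (u, v) -> substrings y seen between u and v
    for w in K:
        n = len(w)
        for i in range(k, n):
            for j in range(i + 1, n - l + 1):
                by_context[(w[i - k:i], w[j:j + l])].add(w[i:j])
    return {(u + y1 + v, u + y2 + v)
            for (u, v), ys in by_context.items()
            for y1 in ys for y2 in ys if y1 != y2}


rules = lcs_substitution_rules(["xabcy", "zadcw"], k=1, l=1)
print(sorted(rules))  # [('abc', 'adc'), ('adc', 'abc')]
```

Here "b" may replace "d" only when it stands between "a" and "c", which is a more conservative generalization than rewriting [b] and [d] everywhere.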

SLIDE 37

Experiments

PS00307 Family [DN09]

Training set of annotated sequences (22)

  ...TVSLDIDLQTVLPEWVRVGFSASTGQNVERNSILAWSFSS...
  ...TVSYDVDLKTELPEWVRVGFSGSTGGYVQNHNILSWTFNS...
  ...HVSATVKEVEDWVSVGFSATSGSKKETTETHNVLSWSFSS...
  ...NVSTTVKEVYDWVSVGFSATSGAYQWSYETHDVLSWSFSS...
  ...SVSATVKEVDEWVSVGFSATSGLTEDTTETHDVLSWSFSS...
  ...

Recoding sequences

  ...Block1 Block2 Block3 Block4...
  ...Block1 Block2 Block3 Block4...
  ...Block5 Block2 Block6 Block4...
  ...Block5 Block2 Block6 Block4...
  ...Block5 Block2 Block6 Block4...
  ...

Grammar

(pipeline diagram; step labels as extracted: Preprocessing, Algorithm, Application)

Test set (20): positive test set (10), negative test set (10)

Recognition rate

SLIDE 38

Results

  Generalization criterion              Precision  Recall  F-measure
  Substitutability                      1          0.2     0.33
  4,4-local context substitutability    1          0.6     0.75
  4,4-local substitutability            1          0.7     0.82
  Stochastic CFG [DN09]                 1          0.1     0.18
  (with different thresholds)           0.3        1       0.46
                                        0.8        0.9     0.85

Good generalization and still specific ⟹ First encouraging results!
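The F-measure column is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The slides do not state the formula, so this is an assumption; the short check below recomputes the column from the reported precision and recall values.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the results table.
rows = {
    "Substitutability": (1, 0.2),
    "4,4-local context substitutability": (1, 0.6),
    "4,4-local substitutability": (1, 0.7),
    "Stochastic CFG, threshold 1": (1, 0.1),
    "Stochastic CFG, threshold 2": (0.3, 1),
    "Stochastic CFG, threshold 3": (0.8, 0.9),
}
scores = {name: round(f_measure(p, r), 2) for name, (p, r) in rows.items()}
for name, score in scores.items():
    print(f"{name}: {score}")
```

The recomputed values are 0.33, 0.75, 0.82, 0.18, 0.46 and 0.85, matching the table row by row. The "threshold 1" to "threshold 3" labels are placeholders, since the slide does not give the threshold values.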

SLIDE 39

Conclusion

Introduction of local substitutability

  extension of k-testability to context-free languages
  new classes of languages
  new generalization criteria

Application on proteins

  first encouraging results
  ▸ more practical (heuristic) algorithms
  ▸ parsing efficiency

Learnability of language classes implied by learnability results of [Yos08]

  ▸ better learnability results for local substitutable classes?

SLIDE 40

Questions?

SLIDE 41

References

[CE07] A. Clark and R. Eyraud. Polynomial identification in the limit of substitutable context-free languages. Journal of Machine Learning Research, 8:1725–1745, August 2007.

[DN09] Witold Dyrka and Jean C. Nebel. A stochastic context free grammar based framework for analysis of protein sequences. BMC Bioinformatics, 10(1):323+, October 2009.

[Har54] Z. S. Harris. Distributional Structure. Word, 10(2–3):146–162, 1954.

[Ker08] G. Kerbellec. Apprentissage d'automates modélisant des familles de séquences protéiques. PhD thesis, Université Rennes 1, 2008.

[Yos08] R. Yoshinaka. Identification in the limit of (k,l)-substitutable context-free languages. In Proceedings of the 9th International Colloquium on Grammatical Inference: Theoretical Results and Applications, ICGI '08, pages 266–279, 2008.