
Combinatorics of Biomolecules
C.M. Reidys
Nankai University, Center for Combinatorics, LMPC

FIGURE 1. Evolutionary Dynamics. Diagram labels: Sequence-Structure Mappings, Evolutionary Dynamics, Population Dynamics, Population-Support Dynamics.

Computational Biology Group at Nankai
• sequence to structure maps
• combinatorial representation of biomolecules
• new generation folding algorithms of biomolecules

Sequences and Shapes

FIGURE 2. The neutral network of a structure. Sequence space (right) and shape space (left) are represented as lattices. An edge between two sequences is drawn bold if both sequences map into the one particular structure on the left. The two key properties of neutral nets are their connectivity and percolation: together they allow sequences to move through sequence space while maintaining their shape.

Sequences and Shapes: Neutral Networks

FIGURE 3. Neutral network. Sequence space is represented as a lattice and the neutral net is an induced subgraph (bold edges). The pairs of sequences labeled $(A, B)$ and $(C, D)$ are antipodal pairs. The two key properties of neutral nets are their connectivity and percolation.

Theorem 1. Let $Q^n_{2,\lambda_n}$ be the random graph consisting of $Q_2^n$-subgraphs, $\Gamma_n$, induced by selecting each $Q_2^n$-vertex with independent probability $\lambda_n = \frac{1+\chi_n}{n}$, where $\chi_n = \epsilon\, n^{a-\frac{1}{2}}$, $0 < \epsilon$ and $0 < a \le \frac{1}{2}$. Then we have
\[
\exists\, \kappa_a > 0; \qquad \lim_{n\to\infty} P\left( |C_n^{(1)}| \ge \kappa_a\, n^{a-\frac{1}{2}}\, |\Gamma_n| \ \text{ and } \ C_n^{(1)} \text{ is unique} \right) = 1. \tag{0.1}
\]

Christian M. Reidys, Large components in random induced subgraphs of n-cubes, Discrete Math., submitted, 2007.
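The random graph in Theorem 1 can be explored by simulation. The following is a minimal sketch, assuming vertices of the n-cube are encoded as integers 0 .. 2^n - 1 and that two vertices are adjacent when they differ in exactly one bit; the function name `largest_component` and the parameter values in the usage note are illustrative and not taken from the slides.

```python
import random
from collections import deque

def largest_component(n, lam, seed=0):
    """Select each n-cube vertex independently with probability lam.

    Returns (number of selected vertices, size of the largest connected
    component of the induced subgraph).
    """
    rng = random.Random(seed)
    selected = {v for v in range(1 << n) if rng.random() < lam}
    seen = set()
    largest = 0
    for start in selected:
        if start in seen:
            continue
        # Breadth-first search over the induced subgraph.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for i in range(n):
                w = v ^ (1 << i)  # neighbor: flip bit i
                if w in selected and w not in seen:
                    seen.add(w)
                    queue.append(w)
        largest = max(largest, size)
    return len(selected), largest
```

For example, `largest_component(14, (1 + 0.5) / 14)` samples a subgraph with lambda_n = (1 + chi_n)/n for the hypothetical choice chi_n = 0.5 and reports how much of it lies in the largest component.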

FIGURE 4. RNA secondary structure, panels a), b) and c). Watson-Crick base-pairs (gray), tertiary contacts (black).

RNA secondary structures or better: 2-noncrossing RNA

FIGURE 5. RNA secondary structures. Diagram representation (top): the primary sequence, GAGAGCCUUUGGACCUCA, is drawn horizontally and its backbone bonds are ignored. All bonds are drawn as arcs in the upper half-plane; secondary structures have the property that no two arcs intersect and all arcs have minimum length 2. Outer planar graph representation (bottom).

3-noncrossing RNA structures

FIGURE 6. k-noncrossing RNA structures. (a) secondary structure, (b) planar 3-noncrossing RNA structure, (c) the smallest non-planar 3-noncrossing structure.

Definition 1. An RNA structure (of pseudoknot type $k-2$), $S_{k,n}$, is a digraph in which all vertices have degree $\le 1$ and which contains neither a $k$-set of mutually intersecting arcs nor any 1-arcs, i.e. arcs of the form $(i, i+1)$.
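Definition 1 translates directly into a membership test. The sketch below assumes a structure is given as a list of arcs (i, j) over integer positions and treats two arcs (i1, j1), (i2, j2) as intersecting when i1 < i2 < j1 < j2; the function names are illustrative. It checks subsets by brute force, so it is only meant for small examples.

```python
from itertools import combinations

def crosses(a, b):
    """True if arcs a and b intersect when drawn in the upper half-plane."""
    (i1, j1), (i2, j2) = sorted([a, b])
    return i1 < i2 < j1 < j2

def is_k_noncrossing_rna(arcs, k):
    """Check the three conditions of Definition 1 for a list of arcs."""
    arcs = [tuple(sorted(a)) for a in arcs]
    endpoints = [v for a in arcs for v in a]
    if len(endpoints) != len(set(endpoints)):
        return False  # some vertex has degree > 1
    if any(j - i == 1 for i, j in arcs):
        return False  # contains a 1-arc (i, i+1)
    for subset in combinations(arcs, k):
        if all(crosses(a, b) for a, b in combinations(subset, 2)):
            return False  # k mutually intersecting arcs
    return True
```

For instance, the two crossing arcs (1, 3) and (2, 4) are rejected for k = 2 (secondary structures) but accepted for k = 3.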

3-noncrossing RNA structures: What is new?

FIGURE 7. A 3-noncrossing RNA structure, as a planar graph (top) and as a diagram (bottom).

FIGURE 8. The proposed SRV-1 frame-shift is a 10-noncrossing RNA structure motif.

Combinatorics of 3-noncrossing RNA structures

Theorem 2. Let $k \in \mathbb{N}$, $k \ge 2$, and let $f_k(n, \ell)$ be the number of $k$-noncrossing digraphs over $n$ vertices with exactly $\ell$ isolated vertices. Then the number of RNA structures with $\ell$ isolated vertices, $S_k(n, \ell)$, is
\[
S_k(n, \ell) = \sum_{b=0}^{(n-\ell)/2} (-1)^b \binom{n-b}{b} f_k(n-2b, \ell). \tag{0.2}
\]
Furthermore the number of $k$-noncrossing RNA structures, $S_k(n)$, is given by
\[
S_k(n) = \sum_{b=0}^{\lfloor n/2 \rfloor} (-1)^b \binom{n-b}{b} \left\{ \sum_{\ell=0}^{n-2b} f_k(n-2b, \ell) \right\}. \tag{0.3}
\]

Emma Y. Jin, Jing Qin and Christian M. Reidys, Combinatorics of RNA Structures with Pseudoknots, Bulletin of Math. Bio., 2007, in press.

Table 1. The first 15 numbers of 3-noncrossing RNA structures.

n     S_3(n)
1     1
2     1
3     2
4     5
5     13
6     36
7     105
8     321
9     1018
10    3334
11    11216
12    38635
13    135835
14    486337
15    1769500
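As a minimal sketch, formula (0.3) can be evaluated for k = 3. Two ingredients are assumptions that the slides do not state: the relation f_k(n, l) = C(n, l) f_k(n - l, 0) (choose the isolated vertices, then a perfect k-noncrossing matching on the rest), and the known closed form C_m C_{m+2} - C_{m+1}^2 for the number of 3-noncrossing perfect matchings on 2m vertices, where C_m is the m-th Catalan number. All function names are illustrative.

```python
from math import comb

def catalan(m):
    return comb(2 * m, m) // (m + 1)

def f3(n, l):
    """3-noncrossing matchings on n vertices with exactly l isolated vertices.

    Assumption: f3(2m, 0) = C_m * C_{m+2} - C_{m+1}^2.
    """
    rest = n - l
    if rest < 0 or rest % 2:
        return 0
    m = rest // 2
    return comb(n, l) * (catalan(m) * catalan(m + 2) - catalan(m + 1) ** 2)

def S3(n):
    """Number of 3-noncrossing RNA structures via equation (0.3)."""
    total = 0
    for b in range(n // 2 + 1):
        inner = sum(f3(n - 2 * b, l) for l in range(n - 2 * b + 1))
        total += (-1) ** b * comb(n - b, b) * inner
    return total
```

Under these assumptions, `[S3(n) for n in range(1, 9)]` reproduces the first eight entries of Table 1.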

Combinatorics of 3-noncrossing RNA structures: Main idea

FIGURE 9. A 5-noncrossing structure, the corresponding oscillating tableau (a sequence of shapes starting and ending at $\emptyset$) and subsequently the corresponding walk $\gamma_{a,a}$ in $\mathbb{Z}^4$.

Why 3-noncrossing RNA structures are so different: recursions

Corollary 1. The number of RNA secondary structures having exactly $\ell$ isolated vertices, $S_2(n, \ell)$, is given by
\[
S_2(n, \ell) = \frac{2}{n-\ell} \binom{\frac{n+\ell}{2}}{\frac{n-\ell}{2}+1} \binom{\frac{n+\ell}{2}-1}{\frac{n-\ell}{2}-1}. \tag{0.4}
\]
Furthermore $S_2(n, \ell)$ satisfies the recursion
\[
(n-\ell)(n-\ell+2) \cdot S_2(n, \ell) - (n+\ell)(n+\ell-2) \cdot S_2(n-2, \ell) = 0. \tag{0.5}
\]

Corollary 2. The number of 3-noncrossing RNA structures having exactly $\ell$ isolated vertices, $S_3(n, \ell)$, satisfies the 4-term recursion
\[
p_1(n, \ell)\, S_3(n-6, \ell) - p_2(n, \ell)\, S_3(n-4, \ell) - p_3(n, \ell)\, S_3(n-2, \ell) + p_4(n, \ell)\, S_3(n, \ell) = 0, \tag{0.6}
\]
where the coefficients $p_1(n, \ell)$, $p_2(n, \ell)$, $p_3(n, \ell)$ and $p_4(n, \ell)$ are given by
\begin{align*}
p_1(n, \ell) &= \tfrac{1}{2} n (n-1)(n-10+\ell)(n-4+\ell)(n-8+\ell) \\
p_2(n, \ell) &= \tfrac{1}{2} n (n-3)(13n^3 - 126n^2 + 13n^2\ell - 88n\ell + 392n + 3n\ell^2 + 216\ell - 384 - 42\ell^2 + 3\ell^3) \\
p_3(n, \ell) &= (n-1)(\tfrac{1}{2} n - 2)(13n^3 - 30n^2 - 13n^2\ell + 8n + 16n\ell + 3n\ell^2 + 30\ell^2 - 72\ell - 3\ell^3) \\
p_4(n, \ell) &= (n-3)(\tfrac{1}{2} n - 2)(n-\ell)(n-\ell+6)(n-\ell+4).
\end{align*}
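Both recursions can be checked numerically. The sketch below evaluates the left-hand sides of (0.5) and (0.6); a residual of zero means the recursion holds at that (n, l). It computes S_3(n, l) from equation (0.2), assuming f_k(n, l) = C(n, l) f_k(n - l, 0) and the known count C_m C_{m+2} - C_{m+1}^2 of 3-noncrossing perfect matchings on 2m vertices, neither of which is stated on the slides. It also sets S_2(n, n) = 1 for the empty structure, where (0.4) is not defined. Function names are illustrative.

```python
from fractions import Fraction
from math import comb

def catalan(m):
    return comb(2 * m, m) // (m + 1)

def f3(n, l):
    # Assumed closed form for 3-noncrossing matchings with l isolated vertices.
    rest = n - l
    if rest < 0 or rest % 2:
        return 0
    m = rest // 2
    return comb(n, l) * (catalan(m) * catalan(m + 2) - catalan(m + 1) ** 2)

def S3(n, l):
    """Equation (0.2) for k = 3."""
    if n < l or (n - l) % 2:
        return 0
    return sum((-1) ** b * comb(n - b, b) * f3(n - 2 * b, l)
               for b in range((n - l) // 2 + 1))

def S2(n, l):
    """Equation (0.4); S2(n, n) = 1 is the structure without arcs."""
    if n < l or (n - l) % 2:
        return 0
    if n == l:
        return 1
    t, h = (n + l) // 2, (n - l) // 2
    return 2 * comb(t, h + 1) * comb(t - 1, h - 1) // (n - l)

def residual_S2(n, l):
    """Left-hand side of recursion (0.5)."""
    return (n - l) * (n - l + 2) * S2(n, l) - (n + l) * (n + l - 2) * S2(n - 2, l)

def residual_S3(n, l):
    """Left-hand side of recursion (0.6)."""
    half = Fraction(n, 2)
    p1 = half * (n - 1) * (n - 10 + l) * (n - 4 + l) * (n - 8 + l)
    p2 = half * (n - 3) * (13 * n**3 - 126 * n**2 + 13 * n**2 * l - 88 * n * l
                           + 392 * n + 3 * n * l**2 + 216 * l - 384
                           - 42 * l**2 + 3 * l**3)
    p3 = (n - 1) * (half - 2) * (13 * n**3 - 30 * n**2 - 13 * n**2 * l + 8 * n
                                 + 16 * n * l + 3 * n * l**2 + 30 * l**2
                                 - 72 * l - 3 * l**3)
    p4 = (n - 3) * (half - 2) * (n - l) * (n - l + 6) * (n - l + 4)
    return (p1 * S3(n - 6, l) - p2 * S3(n - 4, l)
            - p3 * S3(n - 2, l) + p4 * S3(n, l))
```

The contrast in the slide title shows up here: the secondary-structure recursion has two terms with quadratic coefficients, while the 3-noncrossing recursion needs four terms with degree-five coefficients.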

Asymptotic numbers of 3-noncrossing RNA structures

FIGURE 10. The numbers of RNA structures for large $n$: 2-noncrossing RNA structures $S_2(n)$, 3-noncrossing RNA structures $S_3(n)$ and restricted 3-noncrossing RNA structures $S_3^{(r)}(n)$. Axes: $x$ (horizontal, 0 to 100) and $\ln x$ (vertical, 0 to 140).

Numerically, the exponential growth rates are $S_2(n) \sim 2.5913^n$ ($n = 1000$), $S_3(n) \sim 4.6542^n$ ($n = 1000$), and $S_3^{(r)}(n) \sim 4.2741^n$ ($n = 400$).
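A numerical growth rate of this kind can be estimated as S(n)^(1/n). The sketch below does this for secondary structures only, summing the closed form (0.4) over the number of isolated vertices; reading the quoted rates as n-th roots at the stated n is an assumption, and the function names are illustrative.

```python
from math import comb, exp, log

def S2_total(n):
    """Number of secondary structures on n vertices, summing (0.4) over l."""
    total = 1  # the structure without arcs (l = n)
    for l in range(n - 2, -1, -2):
        t, h = (n + l) // 2, (n - l) // 2
        total += 2 * comb(t, h + 1) * comb(t - 1, h - 1) // (n - l)
    return total

def growth_rate(n):
    """n-th root of S2_total(n), computed through logarithms."""
    return exp(log(S2_total(n)) / n)
```

Python integers are exact, so `growth_rate(1000)` involves no overflow; the estimate stays below the limiting rate because of the subexponential factor in the asymptotics.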

Asymptotic Combinatorics: Toroidal Harmonics

FIGURE 11. Toroidal harmonics and its singular expansion. We display the analytic continuation of $\sum_{n \ge 0} S_3(n) z^n$, the generating function of 3-noncrossing RNA structures (left), and its singular expansion (right) at the dominant singularity $\rho_3 = \frac{5 - \sqrt{21}}{2}$.

Asymptotic Combinatorics: Toroidal Harmonics

Lemma 1. Let $z$ be an indeterminate over $\mathbb{R}$ and $w \in \mathbb{R}$ a parameter. Let $S'_k(n, h)$ denote the number of $k$-noncrossing RNA structures with exactly $h$ arcs, and let $\rho_k(w)$ denote the radius of convergence of the power series $\sum_{n \ge 0} \left[ \sum_{h \le n/2} S'_k(n, h) w^{2h} \right] z^n$. Then for $|z| < \rho_k(w)$ we have
\[
\sum_{n \ge 0} \sum_{h \le n/2} S'_k(n, h)\, w^{2h} z^n = \frac{1}{w^2 z^2 - z + 1} \sum_{n \ge 0} f_k(2n, 0) \left( \frac{wz}{w^2 z^2 - z + 1} \right)^{2n}. \tag{0.7}
\]
In particular we have for $w = 1$,
\[
\sum_{n \ge 0} S_k(n) z^n = \frac{1}{z^2 - z + 1} \sum_{n \ge 0} f_k(2n, 0) \left( \frac{z}{z^2 - z + 1} \right)^{2n}. \tag{0.8}
\]

Theorem 3. The number of 3-noncrossing RNA structures is asymptotically given by
\[
S_3(n) \sim \frac{10.4724 \cdot 4!}{n(n-1) \cdots (n-4)} \left( \frac{5 + \sqrt{21}}{2} \right)^n.
\]

Emma Y. Jin and Christian M. Reidys, Asymptotics of RNA Structures with Pseudoknots, Bulletin of Math. Bio., 2007, accepted.
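Identity (0.8) can be expanded as a truncated power series to read off the first coefficients. The sketch below does this for k = 3, assuming the known count f_3(2m, 0) = C_m C_{m+2} - C_{m+1}^2 of 3-noncrossing perfect matchings on 2m vertices (C_m the Catalan numbers), which the slides do not state. Series are plain coefficient lists and the helper names are illustrative.

```python
from math import comb

def catalan(m):
    return comb(2 * m, m) // (m + 1)

def f3_perfect(m):
    # Assumed closed form for f_3(2m, 0).
    return catalan(m) * catalan(m + 2) - catalan(m + 1) ** 2

def mul(a, b, N):
    """Product of two power series, truncated to the first N coefficients."""
    c = [0] * N
    for i, x in enumerate(a[:N]):
        if x:
            for j, y in enumerate(b[:N - i]):
                c[i + j] += x * y
    return c

def rhs_series(N):
    """First N coefficients of the right-hand side of (0.8) for k = 3."""
    # 1/(z^2 - z + 1) has coefficients c_n = c_{n-1} - c_{n-2}, c_0 = 1.
    inv = []
    for n in range(N):
        if n == 0:
            inv.append(1)
        else:
            inv.append(inv[n - 1] - (inv[n - 2] if n >= 2 else 0))
    u = ([0] + inv)[:N]               # z / (z^2 - z + 1)
    u2 = mul(u, u, N)
    total = [0] * N
    power = [1] + [0] * (N - 1)       # u^(2m), starting at m = 0
    for m in range(N // 2 + 1):
        coeff = f3_perfect(m)
        total = [t + coeff * p for t, p in zip(total, power)]
        power = mul(power, u2, N)
    return mul(inv, total, N)
```

The coefficient of z^n in the result is S_3(n), with S_3(0) = 1 for the empty structure, so the list can be compared entry by entry with Table 1.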

Central and Local Limit Theorems for RNA structures

FIGURE 12. Central limit theorem and local limit theorem for 3-noncrossing RNA structures of length $n = 100$ with exactly $h$ arcs. Left: the central limit theorem for $S'_3(100, h)$, $h = 1, 2, \dots, 50$ (labeled by red dots), with mean $0.39089 \cdot 100 = 39.089$ and variance $0.041565 \cdot 100 = 4.1565$. Right: for the local limit theorem, we display the difference
\[
\sqrt{4.1565}\; P\left( \frac{X_n - 39.089}{\sqrt{4.1565}} = x \right) - \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}},
\]
which is maximal close to the peak of the distribution.

Central and Local Limit Theorems for RNA structures

Theorem 4. (Central Limit Theorem) Let $S'_3(n, h)$ be the number of 3-noncrossing RNA structures with exactly $h$ arcs. Let $X_n$ be the r.v. having the distribution
\[
P(X_n = h) = \frac{S'_3(n, h)}{S_3(n)}, \qquad \forall\, h = 0, 1, \dots, \lfloor n/2 \rfloor. \tag{0.9}
\]
Then the random variable $\frac{X_n - \mu n}{\sqrt{\sigma^2 n}}$ has asymptotically normal distribution with parameter $(0, 1)$, i.e.
\[
\lim_{n \to \infty} P\left( \frac{X_n - \mu n}{\sqrt{\sigma^2 n}} < x \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2} t^2}\, dt, \tag{0.10}
\]
and $\mu$, $\sigma^2$ are given by
\[
\mu = -\frac{-\frac{3}{2} + \frac{13}{42}\sqrt{21}}{\frac{5}{2} - \frac{1}{2}\sqrt{21}} = 0.39089
\qquad \text{and} \qquad
\sigma^2 = \mu^2 - \frac{1 - \frac{94}{441}\sqrt{21}}{\frac{5}{2} - \frac{1}{2}\sqrt{21}} = 0.041565. \tag{0.11}
\]

Theorem 5. (Local Limit Theorem) Let $S'_3(n, h)$ be the number of 3-noncrossing RNA structures with exactly $h$ arcs. Let $X_n$ be the r.v. having the distribution
\[
P(X_n = h) = \frac{S'_3(n, h)}{S_3(n)}, \qquad \forall\, h = 0, 1, \dots, \lfloor n/2 \rfloor. \tag{0.12}
\]
Then we have for the set $S = \{ x \mid x = o(\sqrt{n}) \}$
\[
\lim_{n \to \infty} \sup_{x \in S} \left| \sqrt{\sigma^2 n}\; P\left( \frac{X_n - n\mu}{\sqrt{\sigma^2 n}} = x \right) - \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} \right| = 0, \tag{0.13}
\]
where $\mu = 0.39089$ and $\sigma^2 = 0.041565$.

Emma Y. Jin and Christian M. Reidys, Central and Local Limit Theorems of RNA Structures, Journal of theor. Bio., 2007, submitted.
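The constants in (0.11) and the distribution (0.9) can be evaluated directly. The sketch below computes mu and sigma^2 from their closed forms and builds the exact distribution of X_n for a finite n. It obtains S'_3(n, h) as S_3(n, n - 2h) from Theorem 2, assuming f_k(n, l) = C(n, l) f_k(n - l, 0) and the known count C_m C_{m+2} - C_{m+1}^2 of 3-noncrossing perfect matchings on 2m vertices; these ingredients and all names are not taken from the slides.

```python
from fractions import Fraction
from math import comb, sqrt

# Closed forms from (0.11).
DENOM = 2.5 - 0.5 * sqrt(21)
MU = -(-1.5 + 13 / 42 * sqrt(21)) / DENOM
SIGMA2 = MU ** 2 - (1 - 94 / 441 * sqrt(21)) / DENOM

def catalan(m):
    return comb(2 * m, m) // (m + 1)

def f3(n, l):
    # Assumed closed form for 3-noncrossing matchings with l isolated vertices.
    rest = n - l
    if rest < 0 or rest % 2:
        return 0
    m = rest // 2
    return comb(n, l) * (catalan(m) * catalan(m + 2) - catalan(m + 1) ** 2)

def s3_arcs(n, h):
    """S'_3(n, h): 3-noncrossing RNA structures on n vertices with h arcs."""
    l = n - 2 * h
    return sum((-1) ** b * comb(n - b, b) * f3(n - 2 * b, l)
               for b in range(h + 1))

def arc_distribution(n):
    """Exact probabilities P(X_n = h) for h = 0 .. floor(n/2)."""
    counts = [s3_arcs(n, h) for h in range(n // 2 + 1)]
    total = sum(counts)
    return [Fraction(c, total) for c in counts]

def mean_variance(n):
    """Exact mean and variance of X_n, returned as floats."""
    probs = arc_distribution(n)
    mean = sum(h * p for h, p in enumerate(probs))
    var = sum((h - mean) ** 2 * p for h, p in enumerate(probs))
    return float(mean), float(var)
```

Comparing `mean_variance(100)` with `100 * MU` and `100 * SIGMA2` gives a finite-n impression of the limit theorems; exact agreement is not expected, since Theorems 4 and 5 are asymptotic statements.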
