SLIDE 1

The substitution decomposition of matchings and RNA secondary structures

Aziza Jefferson and Vince Vatter
University of Florida
Permutation Patterns 2018
July 13, 2018

SLIDE 2

THE “BIOLOGY” PP & MP THE SUBSTITUTION DECOMPOSITION A BIT MORE “BIOLOGY”

PREDICTING HOW RNA FOLDS

Problem: Given the primary structure, predict the secondary structure.

How about algorithms that only predict certain secondary structures? There are oodles of them.

SLIDE 5

ORTHODOX STRUCTURES

For a long time biologists thought the edges of the corresponding matchings could not cross. Secondary structures without crossings are called orthodox structures. We call those non-crossing matchings; they are counted by the Catalan numbers. Crossings are called pseudoknots.
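The Catalan count can be checked in a few lines of Python. This is only a sketch, and the function names are made up for illustration. It assumes the standard decomposition in which the edge at point 1 splits a non-crossing matching into the part underneath that edge and the part to its right, and compares the result with the closed form C_n = binom(2n, n) / (n + 1).

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def noncrossing_count(n: int) -> int:
    """Number of non-crossing perfect matchings with n edges (hypothetical helper)."""
    if n == 0:
        return 1  # the empty matching
    # The edge at point 1 covers k edges; the other n - 1 - k edges lie to its
    # right. The two parts are independent non-crossing matchings.
    return sum(noncrossing_count(k) * noncrossing_count(n - 1 - k) for k in range(n))


def catalan(n: int) -> int:
    """Closed form for the n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


if __name__ == "__main__":
    for n in range(8):
        print(n, noncrossing_count(n), catalan(n))
```

Both functions agree for every n tried, starting 1, 1, 2, 5, 14, 42, 132.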

SLIDE 6

THE D&P FAMILY

Matchings which can be reduced to the empty matching by successively deleting edges according to three rules. The edge (i, j) can be deleted if it is

◮ a directly adjacent edge, i.e., j = i + 1,
◮ a directly nested edge, i.e., there exists an edge (i′, j′) such that i′ = i + 1 and j′ = j − 1, or
◮ a “hairpin”, i.e., there exists an edge (i′, j′) where either j′ = j + 1 = i′ + 2 = i + 3 or j = j′ + 1 = i + 2 = i′ + 3.

Dirks and Pierce. A partition function algorithm for nucleic acid secondary structure including pseudoknots. J. Comput. Chem. 24 (2003), 1664–1677.
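The three rules can be tried out by brute force. The sketch below is not taken from the talk; all function names are hypothetical. It rests on two assumptions: the endpoints are renumbered 1..2n after every deletion, and deleting any currently deletable edge (a greedy order) is good enough to decide membership.

```python
def all_matchings(points):
    """Yield every perfect matching of the sorted list `points` as a list of (i, j), i < j."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for k, partner in enumerate(rest):
        for m in all_matchings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + m


def relabel(edges):
    """Renumber the endpoints 1..2n, keeping their left-to-right order (assumption)."""
    pts = sorted(p for e in edges for p in e)
    rank = {p: r for r, p in enumerate(pts, start=1)}
    return {(rank[i], rank[j]) for i, j in edges}


def deletable(edge, edges):
    """Apply the three rules from the slide to `edge` inside the relabeled set `edges`."""
    i, j = edge
    if j == i + 1:                   # directly adjacent
        return True
    if (i + 1, j - 1) in edges:      # directly nested
        return True
    # hairpin: two crossing edges on four consecutive points
    return j == i + 2 and ((i + 1, i + 3) in edges or (i - 1, i + 1) in edges)


def is_dp(edges):
    """Greedy reduction: keep deleting a deletable edge until empty or stuck."""
    edges = relabel(edges)
    while edges:
        e = next((e for e in edges if deletable(e, edges)), None)
        if e is None:
            return False
        edges = relabel(edges - {e})
    return True


def dp_count(n):
    """Number of matchings with n edges that reduce to the empty matching."""
    return sum(is_dp(m) for m in all_matchings(list(range(1, 2 * n + 1))))
```

With these assumptions, `dp_count(3)` gives 13 of the 15 matchings on three edges; the two that get stuck are {(1,3), (2,5), (4,6)} and {(1,4), (2,5), (3,6)}.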

SLIDE 7

THE D&P FAMILY

Counted by Saule et al. using the context-free language

S → dSdS | P,
P → pSXpS | ε,
X → xSXxS | ySYyS,
Y → ySYyS | ε.

Saule, Régnier, Steyaert, and Denise. Counting RNA pseudoknotted structures. J. Comput. Biol. 18, 10 (2011), 1339–1351.

SLIDE 9

There must be a better way to...

◮ describe families and
◮ count them.

There is!

SLIDE 11

PP IS A SPECIAL CASE OF MP (LEARN MORE AT MP2019 IN ZÜRICH)

[Figure: the six permutations of length 3 (123, 132, 213, 231, 312, 321), each drawn as a matching.]

We’ll call these permutational matchings (following Jelínek for the term but Bloom and Elizalde for the “backwards” convention). The permutational matchings are precisely those matchings that avoid [diagram: two side-by-side edges].

SLIDE 15

AVOIDING A SHORT PATTERN

◮ Av([diagram]): Catalan enumeration, very simple structure, poset is isomorphic to Av(231).
◮ Av([diagram]): Catalan enumeration, more complicated structure, poset is not isomorphic to Av(321), but behaves quite a bit like it (especially w.r.t. all properties of Av(321) mentioned in Brignall’s talk); see Albert and V (pre-preprint).
◮ Av([diagram: two side-by-side edges]): n! enumeration, all of permutation patterns.

Note: Av([diagram]) is isomorphic as a poset to Av([diagram], [diagram]), the smallest-yet example of “unbalanced Wilf-equivalence” (Burstein and Pantone).
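The enumerations quoted above can be reproduced by brute force. The pattern diagrams did not survive in this transcript, so the sketch below assumes the short patterns are the three possible two-edge matchings (crossing, nested, side-by-side); which Catalan bullet belongs to which pattern is not recoverable from the text. All names are hypothetical.

```python
from itertools import combinations


def all_matchings(points):
    """Yield every perfect matching of the sorted list `points` as a list of (i, j), i < j."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for k, partner in enumerate(rest):
        for m in all_matchings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + m


def relation(e, f):
    """Classify how two disjoint edges sit relative to each other."""
    (a, b), (c, d) = sorted([e, f])  # now a < c
    if b < c:
        return "side-by-side"        # a < b < c < d
    if d < b:
        return "nested"              # a < c < d < b
    return "crossing"                # a < c < b < d


def count_avoiding(n, pattern):
    """Count matchings with n edges in which no two edges form `pattern`."""
    pts = list(range(1, 2 * n + 1))
    return sum(
        all(relation(e, f) != pattern for e, f in combinations(m, 2))
        for m in all_matchings(pts)
    )


if __name__ == "__main__":
    for pattern in ("crossing", "nested", "side-by-side"):
        print(pattern, [count_avoiding(n, pattern) for n in range(1, 6)])
```

Avoiding a crossing or a nesting gives 1, 2, 5, 14, 42 (Catalan), while avoiding two side-by-side edges gives 1, 2, 6, 24, 120 (n!).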

SLIDE 19

THE SUBSTITUTION DECOMPOSITION

First famous use by Gallai in 1967. (Though the idea dates back to at least Fraïssé in 1953.)

Gallai. Transitiv orientierbare Graphen. Acta Math. Acad. Sci. Hungar. 18 (1967), 25–66.

Also called modular decomposition, disjunctive decomposition, and X-join.

Useful in a number of algorithmic contexts (though not yet explicitly in RNA secondary structure prediction).

Möhring. Algorithmic aspects of the substitution decomposition in optimization over relations, set systems and Boolean functions. Ann. Oper. Res. 4, 1-4 (1985), 195–225.

And also in the enumeration of permutation classes...

SLIDE 20

INTERVALS AND SIMPLE PERMUTATIONS

Ovals enclose the intervals of this permutation. A permutation is simple if all of its intervals are trivial (single entries or the whole permutation).

Albert and Atkinson (2005). If a permutation class has only finitely many simple permutations then it...

◮ is defined by finitely many minimal forbidden permutations (finite basis),
◮ does not contain an infinite antichain, and
◮ has an algebraic generating function.

Albert and Atkinson. Simple permutations and pattern restricted permutations. Discrete Math. 300, 1-3 (2005), 1–15.
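Simplicity is easy to test by machine. The transcript does not spell out what an interval is, so this sketch assumes the usual definition: a set of consecutive positions whose values are also consecutive. The function names are illustrative only.

```python
from itertools import permutations


def is_simple(perm):
    """True if the only intervals are single entries and the whole permutation."""
    n = len(perm)
    for start in range(n):
        lo = hi = perm[start]
        for end in range(start + 1, n):
            lo, hi = min(lo, perm[end]), max(hi, perm[end])
            length = end - start + 1
            # Consecutive positions form an interval when their values are
            # consecutive too; the whole permutation does not count.
            if hi - lo + 1 == length and length < n:
                return False
    return True


def simple_permutations(n):
    """All simple permutations of length n, in lexicographic order."""
    return [p for p in permutations(range(1, n + 1)) if is_simple(p)]


if __name__ == "__main__":
    print(["".join(map(str, p)) for p in simple_permutations(5)])
```

For length 4 this finds 2413 and 3142; for length 5 it finds six permutations: 24153, 25314, 31524, 35142, 41352, 42513.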

SLIDE 23

VERTEX MODULES (ANALOGUES OF INTERVALS)

Matchings without vertex modules are weakly indecomposable.

SLIDE 26

EDGE MODULES (ALSO ANALOGUES OF INTERVALS?)

◮ Note: we can only inflate an edge by a permutational matching.
◮ Matchings without vertex or edge modules are strongly indecomposable.

SLIDE 30

COUNTING INDECOMPOSABLE MATCHINGS

A Variety of People (multiple times) have “counted” the simple permutations. Asymptotically, $1/e^2$ of all permutations are simple.

Nijenhuis and Wilf (1979) counted the weakly indecomposable matchings (“connected linked diagrams” to them):

$$w_n = (n-1) \sum_{i=1}^{n-1} w_i w_{n-i}.$$

Jefferson and V. Asymptotically, $1/e$ of all matchings are weakly indecomposable.

Jefferson and V. Asymptotically, $1/e^2$ of all matchings are strongly indecomposable.
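The Nijenhuis and Wilf recurrence is simple to run. The sketch below assumes the initial value w_1 = 1 (the single edge), which the slide does not state, and compares w_n with the total number (2n − 1)!! of matchings on n edges. Function names are hypothetical.

```python
from math import e


def weakly_indecomposable_counts(n_max):
    """Return [w_1, ..., w_n_max] from w_n = (n-1) * sum_{i=1}^{n-1} w_i * w_{n-i}."""
    w = [0, 1]  # w[0] is an unused placeholder; w[1] = 1 is an assumed initial value
    for n in range(2, n_max + 1):
        w.append((n - 1) * sum(w[i] * w[n - i] for i in range(1, n)))
    return w[1:]


def matchings_count(n):
    """(2n-1)!!, the number of perfect matchings on 2n points."""
    total = 1
    for k in range(1, 2 * n, 2):
        total *= k
    return total


if __name__ == "__main__":
    for n, w_n in enumerate(weakly_indecomposable_counts(40), start=1):
        if n % 10 == 0:
            # The proportion should drift toward 1/e as n grows.
            print(n, w_n / matchings_count(n), 1 / e)
```

The first values are 1, 1, 4, 27, 248, 2830. The printed proportions can be compared against 1/e, though the convergence is slow.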

SLIDE 33

THE SAME PROPORTION...

But there are a lot more matchings than permutations.

Simple permutations of length 5: 24153, 25314, 31524, 35142, 41352, 42513.

Strongly indecomposable matchings on 5 edges: [figure]

SLIDE 35

BUILDING/ENUMERATING A FAMILY

1. Identify the strongly indecomposable members of the family.
2. Inflate their edges to form the weakly indecomposable members.
3. Perform insertions to form all members of the family.

Jefferson and V. If a downset of matchings has only finitely many strongly indecomposable members then it...

◮ is defined by finitely many minimal forbidden matchings (finite basis),
◮ does not contain an infinite antichain, and
◮ has an algebraic generating function.

(So the Albert–Atkinson Theorem holds in MP as well as PP.)

SLIDE 38

THE D&P FAMILY

Strongly indecomposable matchings: [diagram] and [diagram] (“hairpin”).
Allowed edge inflations: only [diagrams] (“ladders”).
Allowed insertions: all.

Generating function f satisfies

$$x^3 f^6 - x^2 f^5 + 2x f^3 - x f^2 - f + 1 = 0.$$

Sequence: 1, 3, 13, 65, 351, 1994, 11747, 71117, 439765, 2765775, . . .

Given the description above, this is all completely routine.
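The sequence can be recovered from the functional equation alone. The sketch below rewrites it as f = 1 + 2x f^3 − x f^2 − x^2 f^5 + x^3 f^6 and iterates on truncated power series; every term after the 1 carries a factor of x, so each pass fixes one more coefficient. It assumes f(0) = 1 (the empty matching), and the helper names are made up for illustration.

```python
def mul(a, b, order):
    """Product of two power series (coefficient lists), truncated after x**order."""
    out = [0] * (order + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j > order:
                break
            out[i + j] += ai * bj
    return out


def power(a, k, order):
    """k-th power of a series, truncated after x**order."""
    result = [1] + [0] * order
    for _ in range(k):
        result = mul(result, a, order)
    return result


def shift(a, k, order):
    """Multiply a series by x**k, truncated after x**order."""
    return ([0] * k + a)[: order + 1]


def dp_series(order):
    """Coefficients of f up to x**order, assuming f(0) = 1."""
    f = [1] + [0] * order
    for _ in range(order):
        t3 = shift(power(f, 3, order), 1, order)   # x   * f^3
        t2 = shift(power(f, 2, order), 1, order)   # x   * f^2
        t5 = shift(power(f, 5, order), 2, order)   # x^2 * f^5
        t6 = shift(power(f, 6, order), 3, order)   # x^3 * f^6
        one = [1] + [0] * order
        f = [c + 2 * a - b - d + g for c, a, b, d, g in zip(one, t3, t2, t5, t6)]
    return f


if __name__ == "__main__":
    print(dp_series(10))
```

`dp_series(5)` returns `[1, 1, 3, 13, 65, 351]`: the leading 1 counts the empty matching, and the rest matches the start of the sequence on the slide.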

SLIDE 39

HAIRPIN-ONLY FAMILIES

[Figure: the hairpin-only families, with labels “All inflations & insertions allowed”, D&P, R&G, C&C, and L&P.]

SLIDE 41

THE L&P FAMILY

Strongly indecomposable matchings: [diagram] and [diagram] (“hairpin”).
Allowed edge inflations: only [diagrams] (“ladders”).
Allowed insertions: only insert non-crossing matchings.

Generating function f satisfies

$$(16x^3 - 8x^2 + x) f^2 + (-28x^2 + 15x - 2) f + (25x^2 - 14x + 2) = 0.$$

Sequence: 1, 3, 12, 51, 218, 926, 3902, 16323, 67866, 280746, . . .

The same sequence arose in work of Klazar (2006), and the enumerative coincidence has been explained bijectively by Martinez and Riehl (2017).

Thank you.
