SLIDE 1

Insertions Yielding Equivalent Double Occurrence Words

Daniel A. Cruz, Margherita Maria Ferrari, Nataša Jonoska, Lukas Nabergall, and Masahico Saito

University of South Florida
mmferrari@usf.edu
19 May, 2019

SLIDE 2

Motivation: Analysis of DNA Scrambling in Ciliates

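[Figure: DNA segments in the scrambled order M1 M2 M3 M5 M4, carrying the pointer labels 1, 12, 23, 4, 34, which read off as the word w = 11223434; below, the segments in the order M1 M2 M3 M4 M5, joined at pointers 1, 2, 3, 4.]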

Prescott, D. M. Genome Gymnastics: Unique Modes of DNA Evolution and Processing in Ciliates. Nature Reviews Genetics 1(3) (2000) pp. 191-198.

SLIDE 4

Preliminaries

Given an alphabet Σ, e.g. ℕ,

◮ w = 15164443 is a word over Σ
◮ The length of w is 8, written |w| = 8
◮ The set of symbols used in w is Σ[w] = {1, 3, 4, 5, 6}
◮ w^R = 34446151 is the reverse of w

The set of all words over Σ is denoted Σ* and includes the empty word ε.

A word w is a double occurrence word (DOW) if each symbol in Σ appears 0 or 2 times in w. The set of all DOWs is Σ_DOW.

11, 1221, 11223434 ∈ Σ_DOW

The size of a DOW w is |w|/2.

Single occurrence words (SOWs) are similarly defined.
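The DOW and SOW conditions amount to counting symbol occurrences. A minimal Python sketch, assuming words are written as strings with one character per symbol; the function names are illustrative and do not come from the slides:

```python
from collections import Counter

def is_dow(w):
    """True if every symbol that appears in w appears exactly twice."""
    return all(count == 2 for count in Counter(w).values())

def is_sow(w):
    """True if every symbol that appears in w appears exactly once."""
    return all(count == 1 for count in Counter(w).values())

def dow_size(w):
    """Size |w|/2 of a double occurrence word w."""
    if not is_dow(w):
        raise ValueError("not a double occurrence word")
    return len(w) // 2

for w in ["11", "1221", "11223434", "15164443"]:
    print(w, is_dow(w))
```

Symbols of Σ that do not occur in w never enter the counter, which matches the "0 or 2 times" wording of the definition.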

SLIDE 6

Definition: Equivalence

Words v, w ∈ Σ* are equivalent if there exists a bijection f : Σ → Σ such that f(v) = w; in this case, we write v ∼ w.

Equivalent words: 123123 ∼ 321321, via f(1) = 3, f(2) = 2, f(3) = 1.

Non-equivalent words: 1234562345617887 ≁ 1232314567887654.

A word w = a1 ⋯ an is in ascending order if:

◮ a1 = 1
◮ when i appears for the first time, it is preceded by 1, 2, . . . , i − 1

For example, 123123 is in ascending order while 131232 is not.
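Equivalence can be tested by rewriting both words in ascending order and comparing the results. The sketch below assumes words are sequences of hashable symbols, such as strings of digits; `ascending_form`, `is_ascending` and `equivalent` are illustrative names:

```python
def ascending_form(w):
    """Relabel the symbols of w as 1, 2, 3, ... in order of first appearance."""
    labels = {}
    relabeled = []
    for symbol in w:
        if symbol not in labels:
            labels[symbol] = len(labels) + 1
        relabeled.append(labels[symbol])
    return relabeled

def is_ascending(w):
    """True if a word written with digit characters is already in ascending order."""
    return [int(c) for c in w] == ascending_form(w)

def equivalent(v, w):
    """v ~ w exactly when both words have the same ascending form."""
    return ascending_form(v) == ascending_form(w)

print(equivalent("123123", "321321"))
print(equivalent("1234562345617887", "1232314567887654"))
print(is_ascending("123123"), is_ascending("131232"))
```

Comparing ascending forms avoids searching over bijections: two words with the same form are related by the relabeling that the forms induce.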

SLIDE 7

Definition: Repeat and Return Words

Given w ∈ Σ* and SOW u ∈ Σ^+ = Σ* \ {ε},

◮ the word uu is a repeat word in w if w = z1 u z2 u z3 for some z1, z2, z3 ∈ Σ*
◮ the word uu^R is a return word in w if w = z1 u z2 u^R z3 for some z1, z2, z3 ∈ Σ*

w                   Repeat words
1123455234678876    234234, 2323, 88, etc.

w                   Return words
1123455234678876    678876, 6776, 22, etc.

A repeat word uu or return word uu^R is trivial if |u| = 1.
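Whether uu or uu^R occurs in w reduces to finding two non-overlapping factors in the right order. A sketch under the assumption that words are Python strings with one character per symbol; the helper names are hypothetical:

```python
def _split_exists(w, first, second):
    """True if w = z1 + first + z2 + second + z3 for some z1, z2, z3."""
    i = w.find(first)    # leftmost occurrence of the first factor
    j = w.rfind(second)  # rightmost occurrence of the second factor
    return i != -1 and j != -1 and i + len(first) <= j

def _check_sow(u):
    if not u or len(set(u)) != len(u):
        raise ValueError("u must be a nonempty single occurrence word")

def is_repeat_word_in(w, u):
    """True if uu is a repeat word in w."""
    _check_sow(u)
    return _split_exists(w, u, u)

def is_return_word_in(w, u):
    """True if u followed later by its reverse is a return word in w."""
    _check_sow(u)
    return _split_exists(w, u, u[::-1])

w = "1123455234678876"
print([u for u in ["234", "23", "8"] if is_repeat_word_in(w, u)])
print([u for u in ["678", "67", "2"] if is_return_word_in(w, u)])
```

Taking the leftmost first factor and the rightmost second factor is enough, since any valid pair of positions implies that this extreme pair is valid too.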

SLIDE 11

Repeat and Return Words in Ciliate DNA

[Figure: scrambled gene segments in the order M6 M7 M8 M9 M11 M1 M3 M10 M2 M4 M5 M12 M13, carrying the pointer labels 56 67 78 89 ab 1 23 9a 12 34 45 bc c.]

w0 = 56677889ab1239a123445bcc
w1 = 59ab1239a1235b
w2 = 5b5b
w3 = ε

Nested appearances of repeat and return words explain over 95% of the scrambled MIC genome of Oxytricha trifallax.

Burns, J. et al. Recurring patterns among scrambled genes in the encrypted genome of the ciliate Oxytricha trifallax. Journal of Theoretical Biology 410 (2016) pp. 171-180.

SLIDE 12

Definition: Repeat and Return Insertions

Given w = a1 ⋯ an ∈ Σ_DOW in ascending order,

◮ let 1 ≤ k ≤ ℓ ≤ n + 1,
◮ let u be a SOW over Σ \ Σ[w] in ascending order, where |u| = ν

Then I(ν, k, ℓ) is an insertion into w which acts as follows:

w ⋆ I(ν, k, ℓ) = a1 ⋯ a_{k−1} u a_k ⋯ a_{ℓ−1} u′ a_ℓ ⋯ an

where u′ = u for a repeat insertion (I = ρ) and u′ = u^R for a return insertion (I = τ).

1232314554 ⋆ ρ(3, 4, 6) = 1236782367814554
1232314554 ⋆ τ(3, 7, 11) = 1232316784554876
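The insertion operation translates directly into list slicing. The sketch below assumes w is a list of positive integers in ascending order, so that the new symbols of u can be taken as the next ν integers after the largest symbol of w; `word`, `text` and `insert` are illustrative names, and `word` only handles single-digit symbols:

```python
def word(s):
    """Parse a string of single-digit symbols into a list of integers."""
    return [int(c) for c in s]

def text(w):
    """Write a list of integer symbols back as a string."""
    return "".join(str(a) for a in w)

def insert(w, kind, nu, k, l):
    """Apply rho(nu, k, l) (kind="rho") or tau(nu, k, l) (kind="tau") to w."""
    n = len(w)
    if kind not in ("rho", "tau"):
        raise ValueError("kind must be 'rho' or 'tau'")
    if not 1 <= k <= l <= n + 1:
        raise ValueError("need 1 <= k <= l <= n + 1")
    first_new = max(w, default=0) + 1           # assumption: w uses symbols 1..max(w)
    u = list(range(first_new, first_new + nu))  # u is a SOW over unused symbols
    u_prime = u if kind == "rho" else u[::-1]
    # a1 ... a_{k-1}  u  a_k ... a_{l-1}  u'  a_l ... an
    return w[:k - 1] + u + w[k - 1:l - 1] + u_prime + w[l - 1:]

print(text(insert(word("1232314554"), "rho", 3, 4, 6)))
print(text(insert(word("1232314554"), "tau", 3, 7, 11)))
```

The two calls reproduce the repeat and return insertions into 1232314554 given above.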

SLIDE 13

Insertions Yielding Equivalent DOWs

The following insertions yield equivalent DOWs:

1221 ⋆ τ(2, 3, 3) = 12344321
1221 ⋆ τ(2, 1, 5) = 34122143 ∼ 12344321, via f(3) = 1, f(4) = 2, f(1) = 3, f(2) = 4

If w1 = w ⋆ I1(ν1, k1, ℓ1) ∼ w ⋆ I2(ν2, k2, ℓ2) = w2, what can we say about w if I1 and I2 are “distinct” (i.e. (k1, ℓ1) ≠ (k2, ℓ2))?

If w1 ∼ w2, then ν1 = ν2 = ν.
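For a small word, all pairs of distinct insertions with equivalent results can be found by brute force. The sketch below enumerates every return insertion τ(2, k, ℓ) into 1221 and groups the index pairs (k, ℓ) by the ascending form of the result; it assumes integer symbols, and the helper names are illustrative:

```python
from collections import defaultdict

def ascending_form(w):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    labels = {}
    return tuple(labels.setdefault(a, len(labels) + 1) for a in w)

def return_insert(w, nu, k, l):
    """Apply tau(nu, k, l) to a word w over the symbols 1..max(w)."""
    first_new = max(w, default=0) + 1
    u = list(range(first_new, first_new + nu))
    return w[:k - 1] + u + w[k - 1:l - 1] + u[::-1] + w[l - 1:]

w = [1, 2, 2, 1]
classes = defaultdict(list)
for k in range(1, len(w) + 2):
    for l in range(k, len(w) + 2):
        classes[ascending_form(return_insert(w, 2, k, l))].append((k, l))

# Report only the results reached by more than one insertion.
for form, pairs in classes.items():
    if len(pairs) > 1:
        print("".join(map(str, form)), pairs)
```

The class of 12344321 contains the index pairs (3, 3) and (1, 5) from the example above.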

SLIDE 14

Insertions Yielding Equivalent DOWs

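Without loss of generality, we take k1 ≤ k2. Suppose that k1 = k2:

[Figure: w1 and w2 drawn one above the other, each containing u, followed by u1′ ∈ {u, u^R} in w1 and by u2′ ∈ {u, u^R} in w2.]

Thus, k1 ≠ k2 (otherwise ℓ1 = ℓ2 and the insertions would not be distinct); similarly ℓ1 ≠ ℓ2. We have three cases:

◮ Interleaving (k1 < k2 ≤ ℓ1 < ℓ2)
◮ Nested (k1 < k2 ≤ ℓ2 < ℓ1)
◮ Sequential (k1 ≤ ℓ1 < k2 ≤ ℓ2)

[Figure: for each case, the positions of u and u1′ in w1 relative to u and u2′ in w2.]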
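The three cases can be told apart from the index pairs alone. A small sketch with a hypothetical `classify` helper; it orders the pairs so that k1 ≤ k2 and returns None for pairs with k1 = k2 or ℓ1 = ℓ2, which fall outside the three cases:

```python
def classify(k1, l1, k2, l2):
    """Name the relative position of insertions at (k1, l1) and (k2, l2)."""
    if not (k1 <= l1 and k2 <= l2):
        raise ValueError("each insertion needs k <= l")
    if k1 > k2:  # without loss of generality, k1 <= k2
        k1, l1, k2, l2 = k2, l2, k1, l1
    if k1 == k2 or l1 == l2:
        return None  # not covered by the three cases
    if l1 < k2:
        return "sequential"    # k1 <= l1 < k2 <= l2
    if l1 < l2:
        return "interleaving"  # k1 < k2 <= l1 < l2
    return "nested"            # k1 < k2 <= l2 < l1

print(classify(3, 3, 1, 5))
print(classify(1, 3, 2, 5))
print(classify(1, 2, 4, 6))
```

For example, the insertions τ(2, 3, 3) and τ(2, 1, 5) into 1221 are classified as nested.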

SLIDE 15

Interleaving Insertions (k1 < k2 ≤ ℓ1 < ℓ2)

We consider two repeat insertions to start.

w1 = u z1 z2 u z3
w2 = z1 u z2 z3 u

Note that u z1 ∼ z1 u.

SLIDE 17

Interleaving Insertions (k1 < k2 ≤ ℓ1 < ℓ2)

[Figure: w1 = u f(u) z2 u z3 aligned above w2 = f(u) u z2 z3 u, with arrows labeled f between the rows; a second panel extends the alignment with blocks f^2(u).]

SLIDE 18

Interleaving Insertions (k1 < k2 ≤ ℓ1 < ℓ2)

[Figure: w1 = u f(u) ⋯ f^h(u) z2 u f(u) ⋯ f^h(u) aligned above w2 = f(u) f^2(u) ⋯ u z2 f(u) ⋯ u, with arrows labeled f between the rows.]

We adapt a result by Lyndon and Schützenberger:

Lemma

If xz = zy and x ≠ ε, then x = st, z = (st)^h s, and y = ts for some s, t ∈ Σ* and h ≥ 0.

Lyndon, R.C., and Schützenberger, M.-P. “The equation a^M = b^N c^P in a free group.” The Michigan Mathematical Journal 9:4 (1962) pp. 289-298.
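For concrete strings, the decomposition in the lemma can be computed directly. The sketch below covers the equality form xz = zy as stated in the lemma, not the adapted version for equivalence; `conjugacy_decomposition` is a hypothetical name. It relies on the fact that xz = zy makes z a prefix of a power of x:

```python
def conjugacy_decomposition(x, z, y):
    """Return (s, t, h) with x = st, z = (st)^h s and y = ts, given xz = zy."""
    if not x:
        raise ValueError("x must be nonempty")
    if x + z != z + y:
        raise ValueError("the words do not satisfy xz = zy")
    h = len(z) // len(x)   # number of full copies of x at the start of z
    s = z[h * len(x):]     # leftover part of z, a proper prefix of x
    t = x[len(s):]         # rest of x after that prefix
    # Sanity check of the three identities claimed by the lemma.
    assert x == s + t and z == (s + t) * h + s and y == t + s
    return s, t, h

print(conjugacy_decomposition("ab", "aba", "ba"))
```

With x = ab, z = aba and y = ba, both xz and zy equal ababa, and the sketch returns s = a, t = b, h = 1.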

SLIDE 19

Interleaving Insertions (k1 < k2 ≤ ℓ1 < ℓ2)

Proposition (Interleaving)

◮ For repeat insertions, z1z3 is a repeat word.
◮ For return insertions, z1z3 ∼ Int((k2 − k1)/ν, ν).

Int(h, q) = x1 x2 ⋯ xh x1^R x2^R ⋯ xh^R where each x_i x_i^R is a return word and |x_i| = q for 1 ≤ i ≤ h.

Int(h, q) can be obtained recursively:

x1 x1^R
x1 x2 x1^R x2^R
. . .
x1 x2 ⋯ xh x1^R x2^R ⋯ xh^R

For example, Int(2, 2) = 12342143 where x1 = 12 and x2 = 34.
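A representative of Int(h, q) in ascending order can be built block by block. The sketch assumes the blocks x_i are taken to be consecutive runs of integers, as in the Int(2, 2) example; `interleaved` is an illustrative name:

```python
def interleaved(h, q):
    """Int(h, q) = x1 x2 ... xh x1^R x2^R ... xh^R with |x_i| = q."""
    if h < 1 or q < 1:
        raise ValueError("h and q must be positive")
    # Assumption: x_i is the run of q consecutive integers starting at (i-1)q + 1.
    blocks = [list(range(i * q + 1, (i + 1) * q + 1)) for i in range(h)]
    forward = [a for block in blocks for a in block]
    reversed_blocks = [a for block in blocks for a in reversed(block)]
    return forward + reversed_blocks

print("".join(map(str, interleaved(2, 2))))
```

Each block is reversed in place while the order of the blocks is kept, which is what distinguishes this pattern from a single long return word.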

SLIDE 20

Nested Insertions (k1 < k2 ≤ ℓ2 < ℓ1)

Proposition (Nested)

◮ For repeat insertions, z1z3 ∼ Nes((k2 − k1)/ν, ν).
◮ For return insertions, z1z3 is a return word.

Nes(h, q) = x1 x2 ⋯ x_{h−1} xh xh x_{h−1} ⋯ x2 x1 where each x_i x_i is a repeat word and |x_i| = q for 1 ≤ i ≤ h.

Nes(h, q) can be obtained recursively:

x1 x1
x1 x2 x2 x1
. . .
x1 x2 ⋯ x_{h−1} xh xh x_{h−1} ⋯ x2 x1

For example, Nes(2, 2) = 12343412 where x1 = 12 and x2 = 34.
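Nes(h, q) can be generated in the same block-wise way, this time reversing the order of the blocks while leaving each block unchanged. The sketch assumes the x_i are consecutive runs of integers, as in the Nes(2, 2) example; `nested` is an illustrative name:

```python
def nested(h, q):
    """Nes(h, q) = x1 x2 ... xh xh ... x2 x1 with |x_i| = q."""
    if h < 1 or q < 1:
        raise ValueError("h and q must be positive")
    # Assumption: x_i is the run of q consecutive integers starting at (i-1)q + 1.
    blocks = [list(range(i * q + 1, (i + 1) * q + 1)) for i in range(h)]
    forward = [a for block in blocks for a in block]
    backward = [a for block in reversed(blocks) for a in block]
    return forward + backward

print("".join(map(str, nested(2, 2))))
```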

SLIDE 21

Sequential Insertions (k1 ≤ ℓ1 < k2 ≤ ℓ2)

Consider the following words:

v0 = 123123, with |v0| = 2 · 3 = 6
v1 = 1234512345 = v0 ⋆ ρ(2, |v0| − 2, |v0| + 1)
v2 = 12345126734567 = v1 ⋆ ρ(2, |v1| − 2, |v1| + 1)
v3 = 123451267348956789 = v2 ⋆ ρ(2, |v2| − 2, |v2| + 1)

Word vj is a ρ-tangled cord at level j, denoted T_ρ(ν, m, j) with m = 3 and ν = 2. The τ-tangled cord at level j, T_τ(ν, m, j), is defined similarly.

Tangled cords, T_ρ(1, 1, i), were introduced in: Burns, J. et al. Four-regular graphs with rigid vertices associated to DNA recombination. Discrete Applied Mathematics, 161:10-11 (2013) pp. 1378-1394.
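The words v1, v2, v3 can be regenerated by iterating the repeat insertion from v0. The sketch copies the parameters ρ(2, |v| − 2, |v| + 1) literally from the example; how they depend on ν and m in general is not stated here, so the code does not attempt to generalize. The function names are illustrative:

```python
def repeat_insert(w, nu, k, l):
    """Apply rho(nu, k, l) to a word w over the symbols 1..max(w)."""
    first_new = max(w, default=0) + 1
    u = list(range(first_new, first_new + nu))
    return w[:k - 1] + u + w[k - 1:l - 1] + u + w[l - 1:]

def example_cords(levels):
    """v0 = 123123 followed by `levels` applications of rho(2, |v| - 2, |v| + 1)."""
    v = [1, 2, 3, 1, 2, 3]
    cords = [v]
    for _ in range(levels):
        n = len(v)
        v = repeat_insert(v, 2, n - 2, n + 1)  # parameters taken from the example
        cords.append(v)
    return cords

for j, v in enumerate(example_cords(3)):
    print(j, "".join(map(str, v)))
```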

SLIDE 22

Sequential Insertions (k1 ≤ ℓ1 < k2 ≤ ℓ2)

Proposition (Sequential)

◮ For repeat insertions, z1z2z3 ∼ T_ρ(ν, ℓ1 − k1, k2 − ℓ1).
◮ For return insertions, z1z2z3 ∼ T_τ(ν, ℓ1 − k1, k2 − ℓ1).

Proposition

Every T_I(ν, m, j) is a palindrome where I ∈ {ρ, τ}.
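The palindrome property can be checked on the ρ-tangled cords 123123, 1234512345, 12345126734567 and 123451267348956789. These words are not literal palindromes, so the sketch assumes that "palindrome" is meant up to equivalence, that is, w ∼ w^R; this reading is an assumption and is not spelled out on the slide. The helper names are illustrative:

```python
def ascending_form(w):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    labels = {}
    return [labels.setdefault(a, len(labels) + 1) for a in w]

def is_palindrome_up_to_equivalence(w):
    """Assumed reading of 'palindrome': w is equivalent to its reverse."""
    return ascending_form(w) == ascending_form(w[::-1])

cords = ["123123", "1234512345", "12345126734567", "123451267348956789"]
for v in cords:
    print(v, is_palindrome_up_to_equivalence(v))
```

A word such as 112323 fails the same check, so the property is not automatic for DOWs.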

SLIDE 23

Repeat and Return Insertions

Interleaving insertions:

[Figure: w1 and w2 aligned, with blocks labeled u, . . . , f^h(u), u and u^R and arrows labeled f between the rows.]

Lemma*

If uu and vv^R are repeat and return words in w ∈ Σ* such that Σ[u] ∩ Σ[v] ≠ ∅, then |u| = 1 or |v| = 1.

Proposition

Let I1 and I2 be distinct nontrivial insertions. If w1 ∼ w2, then I1 and I2 are of the same type.

*: Jonoska, N. et al. Patterns and Distances in Words Related to DNA Rearrangement. Fundamenta Informaticae 154:1-4 (2017) pp. 225-238.

SLIDE 24

Thank You for Listening!

Work supported by the NSF grants CCF-1526485 and DMS-1800443, the NIH grant R01GM109459-01, and the Southeast Center for Mathematics and Biology (an NSF-Simons Research Center for Mathematics of Complex Biological Systems) under NSF grant DMS-1764406 and Simons Foundation grant 594594
