Insertions Yielding Equivalent Double Occurrence Words

  1. Insertions Yielding Equivalent Double Occurrence Words
     Daniel A. Cruz, Margherita Maria Ferrari, Nataša Jonoska, Lukas Nabergall, and Masahico Saito
     University of South Florida, mmferrari@usf.edu
     19 May 2019

  2. Motivation: Analysis of DNA Scrambling in Ciliates
     [Figure: the segments $M_1, M_2, M_3, M_5, M_4$ in their micronuclear (MIC) order carry the pointers 1, 12, 23, 4, 34. Reading these pointers in order gives the double occurrence word $w = 11223434$.]
     Prescott, D. M. "Genome Gymnastics: Unique Modes of DNA Evolution and Processing in Ciliates." Nature Reviews Genetics 1(3) (2000), pp. 191–198.

  3. Preliminaries
     Given an alphabet $\Sigma$, e.g. $\mathbb{N}$:
     ◮ $w = 15164443$ is a word over $\Sigma$;
     ◮ the length of $w$ is 8, written $|w| = 8$;
     ◮ the set of symbols used in $w$ is $\Sigma[w] = \{1, 3, 4, 5, 6\}$;
     ◮ $w^R = 34446151$ is the reverse of $w$.
     The set of all words over $\Sigma$ is denoted $\Sigma^*$ and includes the empty word $\epsilon$.

  4. Preliminaries
     The word $w$ is a double occurrence word (DOW) if each symbol in $\Sigma$ appears 0 or 2 times in $w$. The set of all DOWs is $\Sigma_{DOW}$. For example, $11, 1221, 11223434 \in \Sigma_{DOW}$.
     The size of a DOW $w$ is $|w|/2$.
     Single occurrence words (SOWs), in which each symbol appears 0 or 1 times, are defined similarly.
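
A minimal Python sketch of these definitions follows. It assumes words are written as strings of one-character symbols, so the DOW 11223434 is the string "11223434". The function names `alphabet`, `reverse`, `is_dow`, `is_sow`, and `dow_size` are our own and do not come from the presentation.

```python
from collections import Counter


def alphabet(w: str) -> set:
    """Sigma[w]: the set of symbols used in w."""
    return set(w)


def reverse(w: str) -> str:
    """w^R: the reverse of w."""
    return w[::-1]


def is_dow(w: str) -> bool:
    """Double occurrence word: each symbol occurs 0 or 2 times (the empty word qualifies)."""
    return all(count == 2 for count in Counter(w).values())


def is_sow(w: str) -> bool:
    """Single occurrence word: each symbol occurs 0 or 1 times."""
    return all(count == 1 for count in Counter(w).values())


def dow_size(w: str) -> int:
    """Size of a DOW, |w| / 2."""
    if not is_dow(w):
        raise ValueError(f"{w!r} is not a double occurrence word")
    return len(w) // 2


w = "15164443"
print(len(w), sorted(alphabet(w)), reverse(w))  # 8 ['1', '3', '4', '5', '6'] 34446151
print([is_dow(x) for x in ["", "11", "1221", "11223434"]])  # [True, True, True, True]
print(is_dow(w), dow_size("11223434"))  # False 4
```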

  5. Definition: Equivalence
     Words $v, w \in \Sigma^*$ are equivalent if there exists a bijection $f: \Sigma \to \Sigma$ such that $f(v) = w$, where $f$ is applied letter by letter. In this case, we write $v \sim w$.
     [Figure: the equivalent words $123123 \sim 321321$, via $1 \mapsto 3$, $2 \mapsto 2$, $3 \mapsto 1$. The words $1234562345617887$ and $1232314567887654$ are not equivalent.]

  6. Definition: Equivalence
     A word $w = a_1 \cdots a_n$ is in ascending order if:
     ◮ $a_1 = 1$, and
     ◮ when $i$ appears for the first time, it is preceded by $1, 2, \ldots, i-1$.
     For example, $123123$ is in ascending order, while $131232$ is not.
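
A bijection preserves the order in which symbols first appear. One way to test $v \sim w$ is therefore to relabel both words by order of first appearance and compare the results. The relabelled word is in ascending order, so each word is equivalent to exactly one word in ascending order. The sketch below assumes $\Sigma = \mathbb{N}$ and represents words as lists of positive integers. `ascending_form`, `is_ascending`, `equivalent`, and `word` are hypothetical helper names.

```python
def ascending_form(w: list) -> list:
    """Relabel symbols 1, 2, 3, ... in order of first appearance (a bijective relabelling)."""
    relabel = {}
    out = []
    for a in w:
        if a not in relabel:
            relabel[a] = len(relabel) + 1
        out.append(relabel[a])
    return out


def is_ascending(w: list) -> bool:
    """a_1 = 1 and each new symbol i is preceded by 1, ..., i-1."""
    return w == ascending_form(w)


def equivalent(v: list, w: list) -> bool:
    """v ~ w iff both words have the same ascending form."""
    return ascending_form(v) == ascending_form(w)


def word(s: str) -> list:
    """Parse a word written with single-digit symbols, e.g. '123123'."""
    return [int(c) for c in s]


print(equivalent(word("123123"), word("321321")))  # True
print(equivalent(word("1234562345617887"), word("1232314567887654")))  # False
print(is_ascending(word("123123")), is_ascending(word("131232")))  # True False
```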

  7. Definition: Repeat and Return Words
     Given $w \in \Sigma^*$ and a SOW $u \in \Sigma^+ = \Sigma^* \setminus \{\epsilon\}$:
     ◮ the word $uu$ is a repeat word in $w$ if $w = z_1 u z_2 u z_3$ for some $z_1, z_2, z_3 \in \Sigma^*$;
     ◮ the word $uu^R$ is a return word in $w$ if $w = z_1 u z_2 u^R z_3$ for some $z_1, z_2, z_3 \in \Sigma^*$.
     For $w = 1123455234678876$, the repeat words include 234234, 2323, and 88, and the return words include 678876, 6776, and 22.
     A repeat word $uu$ or return word $uu^R$ is trivial if $|u| = 1$.
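
The following Python sketch tests whether $uu$ or $uu^R$ occurs in $w$. It assumes one-character symbols, and the helper names are hypothetical. Searching from the leftmost occurrence of $u$ leaves the most room for the second factor, so a single `str.find` per factor is enough.

```python
def _check_sow(u: str) -> None:
    """Repeat and return words are defined only for nonempty SOWs u."""
    if not u or len(set(u)) != len(u):
        raise ValueError(f"{u!r} is not a nonempty single occurrence word")


def _occurs_in_order(w: str, first: str, second: str) -> bool:
    """True if w = z1 + first + z2 + second + z3 for some words z1, z2, z3."""
    i = w.find(first)  # the leftmost occurrence leaves the most room for `second`
    return i >= 0 and w.find(second, i + len(first)) >= 0


def is_repeat_word(w: str, u: str) -> bool:
    """Is uu a repeat word in w, i.e. w = z1 u z2 u z3?"""
    _check_sow(u)
    return _occurs_in_order(w, u, u)


def is_return_word(w: str, u: str) -> bool:
    """Is u u^R a return word in w, i.e. w = z1 u z2 u^R z3?"""
    _check_sow(u)
    return _occurs_in_order(w, u, u[::-1])


w = "1123455234678876"
print([is_repeat_word(w, u) for u in ["234", "23", "8"]])  # [True, True, True]
print([is_return_word(w, u) for u in ["678", "67", "2"]])  # [True, True, True]
print(is_repeat_word(w, "678"), is_return_word(w, "234"))  # False False
```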

  8. Repeat and Return Words in Ciliate DNA
     [Figure: the segments $M_6, M_7, M_8, M_9, M_{11}, M_1, M_3, M_{10}, M_2, M_4, M_5, M_{12}, M_{13}$ in MIC order carry the pointers 56, 67, 78, 89, ab, 1, 23, 9a, 12, 34, 45, bc, c, where a, b, c denote pointers 10, 11, 12.]
     $w_0$ = 56677889ab1239a123445bcc
     Burns, J. et al. "Recurring patterns among scrambled genes in the encrypted genome of the ciliate Oxytricha trifallax." Journal of Theoretical Biology 410 (2016), pp. 171–180.

  9. Repeat and Return Words in Ciliate DNA
     Deleting the trivial words 66, 77, 88, 44, and cc from $w_0$ gives $w_1$ = 59ab1239a1235b.

  10. Repeat and Return Words in Ciliate DNA
      Deleting the repeat words 123123 and then 9a9a from $w_1$ gives $w_2$ = 5b5b.

  11. Repeat and Return Words in Ciliate DNA
      Deleting the repeat word 5b5b from $w_2$ gives $w_3 = \epsilon$.
      Nested appearances of repeat and return words explain over 95% of the scrambled genes in the MIC genome of Oxytricha trifallax (Burns et al., 2016).
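
The reduction $w_0 \to w_1 \to w_2 \to w_3 = \epsilon$ can be replayed by deleting one repeat word at a time. Removing $uu$ from $w = z_1 u z_2 u z_3$ leaves $z_1 z_2 z_3$. This Python sketch uses one-character symbols and treats the letters a, b, c as ordinary symbols. `remove_word` is a hypothetical helper. The order of deletions is our own choice, because the slides show only the intermediate words.

```python
def remove_word(w: str, u: str, kind: str) -> str:
    """Given w = z1 u z2 u' z3, with u' = u for kind 'repeat' and u' = u^R for kind
    'return', return z1 z2 z3 (using the leftmost occurrence of u)."""
    if kind not in ("repeat", "return"):
        raise ValueError("kind must be 'repeat' or 'return'")
    second = u if kind == "repeat" else u[::-1]
    i = w.find(u)
    j = w.find(second, i + len(u)) if i >= 0 else -1
    if j < 0:
        raise ValueError(f"{u + second!r} is not a {kind} word in {w!r}")
    return w[:i] + w[i + len(u):j] + w[j + len(second):]


w = "56677889ab1239a123445bcc"  # w_0
for u in ["6", "7", "8", "4", "c"]:  # trivial repeat words
    w = remove_word(w, u, "repeat")
print(w)  # 59ab1239a1235b  (w_1)
for u in ["123", "9a"]:
    w = remove_word(w, u, "repeat")
print(w)  # 5b5b  (w_2)
w = remove_word(w, "5b", "repeat")
print(repr(w))  # ''  (w_3 is the empty word)
```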

  12. Definition: Repeat and Return Insertions
      Given $w = a_1 \cdots a_n \in \Sigma_{DOW}$ in ascending order,
      ◮ let $1 \le k \le \ell \le n + 1$, and
      ◮ let $u$ be a SOW over $\Sigma \setminus \Sigma[w]$ in ascending order, where $|u| = \nu$.
      Then $I(\nu, k, \ell)$ is an insertion into $w$ which acts as follows:
      $w \star I(\nu, k, \ell) = a_1 \cdots a_{k-1}\, u\, a_k \cdots a_{\ell-1}\, u'\, a_\ell \cdots a_n$, where $u' = u$ for a repeat insertion ($I = \rho$) and $u' = u^R$ for a return insertion ($I = \tau$).
      Examples: $1232314554 \star \rho(3, 4, 6) = 1236782367814554$ and $1232314554 \star \tau(3, 7, 11) = 1232316784554876$.
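
Here is a minimal Python sketch of the insertion $w \star I(\nu, k, \ell)$. Words are stored as lists of integers, and positions are 1-based as in the definition. The definition only requires $u$ to be a SOW over unused symbols in ascending order. We assume $u = (m+1, \ldots, m+\nu)$, where $m$ is the largest symbol of $w$, and this choice reproduces both examples. The names `insert`, `word`, and `show` are hypothetical.

```python
def insert(w: list, nu: int, k: int, l: int, kind: str) -> list:
    """Compute w * I(nu, k, l) for I = rho (kind='repeat') or I = tau (kind='return').

    Assumption: u is the SOW (m+1, ..., m+nu), where m is the largest symbol in w
    (0 for the empty word), as in the slide's examples."""
    n = len(w)
    if nu < 1 or not 1 <= k <= l <= n + 1:
        raise ValueError("need nu >= 1 and 1 <= k <= l <= n + 1")
    if kind not in ("repeat", "return"):
        raise ValueError("kind must be 'repeat' or 'return'")
    m = max(w, default=0)
    u = list(range(m + 1, m + nu + 1))
    u_prime = u if kind == "repeat" else u[::-1]
    # 1-based a_1..a_{k-1} is w[:k-1]; a_k..a_{l-1} is w[k-1:l-1]; a_l..a_n is w[l-1:]
    return w[:k - 1] + u + w[k - 1:l - 1] + u_prime + w[l - 1:]


def word(s: str) -> list:
    """Parse a word written with single-digit symbols."""
    return [int(c) for c in s]


def show(w: list) -> str:
    """Print a word with single-digit symbols as a string."""
    return "".join(map(str, w))


print(show(insert(word("1232314554"), 3, 4, 6, "repeat")))   # 1236782367814554
print(show(insert(word("1232314554"), 3, 7, 11, "return")))  # 1232316784554876
```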

  13. Insertions Yielding Equivalent DOWs
      The following insertions yield equivalent DOWs:
      $1221 \star \tau(2, 3, 3) = 12344321$ and $1221 \star \tau(2, 1, 5) = 34122143 \sim 12344321$ (via $1 \leftrightarrow 3$, $2 \leftrightarrow 4$).
      If $w_1 = w \star I_1(\nu_1, k_1, \ell_1) \sim w \star I_2(\nu_2, k_2, \ell_2) = w_2$, what can we say about $w$ if $I_1$ and $I_2$ are "distinct" (i.e. $(k_1, \ell_1) \neq (k_2, \ell_2)$)?
      If $w_1 \sim w_2$, then $\nu_1 = \nu_2 = \nu$. This follows because equivalent words have the same length and $|w \star I(\nu, k, \ell)| = |w| + 2\nu$.
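
To explore this question on small cases, one can enumerate every insertion $I(\nu, k, \ell)$ into a fixed $w$ and report the pairs with $(k_1, \ell_1) \neq (k_2, \ell_2)$ whose results are equivalent. A common $\nu$ suffices, since $\nu_1 = \nu_2$ is forced. This brute-force Python sketch is our own illustration and is not the authors' method. It assumes $u$ is the next $\nu$ unused integers, which matches the examples on the previous slide, and it tests equivalence by comparing ascending forms. All names are hypothetical.

```python
from itertools import combinations


def insert(w, nu, k, l, kind):
    """w * I(nu, k, l) with u = (m+1, ..., m+nu), m = max(w) (assumed choice of u)."""
    m = max(w, default=0)
    u = list(range(m + 1, m + nu + 1))
    u_prime = u if kind == "repeat" else u[::-1]
    return w[:k - 1] + u + w[k - 1:l - 1] + u_prime + w[l - 1:]


def ascending_form(w):
    """Relabel symbols 1, 2, ... by first appearance; v ~ w iff the forms agree."""
    relabel = {}
    return [relabel.setdefault(a, len(relabel) + 1) for a in w]


def equivalent_pairs(w, nu):
    """All pairs of insertions I1(nu, k1, l1), I2(nu, k2, l2) into w with
    (k1, l1) != (k2, l2) whose results are equivalent DOWs."""
    n = len(w)
    results = [
        ((kind, k, l), ascending_form(insert(w, nu, k, l, kind)))
        for kind in ("repeat", "return")
        for k in range(1, n + 2)
        for l in range(k, n + 2)
    ]
    return [
        (p1, p2)
        for (p1, f1), (p2, f2) in combinations(results, 2)
        if p1[1:] != p2[1:] and f1 == f2
    ]


for p1, p2 in equivalent_pairs([1, 2, 2, 1], 2):
    print(p1, p2)  # each line: (kind, k, l) for I1 and for I2
```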

  14. Insertions Yielding Equivalent DOWs
      Without loss of generality, we take $k_1 \le k_2$. Suppose that $k_1 = k_2$:
      [Figure: $w_1$ and $w_2$ both contain the inserted $u$ at the same position, followed later by $u'_1 \in \{u, u^R\}$ and $u'_2 \in \{u, u^R\}$, respectively.]
      Thus, $k_1 \neq k_2$; similarly, $\ell_1 \neq \ell_2$. We have three cases:
      ◮ Sequential: $k_1 \le \ell_1 < k_2 \le \ell_2$;
      ◮ Interleaving: $k_1 < k_2 \le \ell_1 < \ell_2$;
      ◮ Nested: $k_1 < k_2 \le \ell_2 < \ell_1$.
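
The case split can be written as a small Python function. `classify` is a hypothetical name, and positions are assumed to satisfy $k \le \ell$, as in the definition of an insertion. Once the pair is ordered so that $k_1 \le k_2$, and $k_1 = k_2$ and $\ell_1 = \ell_2$ are excluded, the three cases cover every possibility.

```python
def classify(k1: int, l1: int, k2: int, l2: int) -> str:
    """Return 'sequential', 'interleaving', or 'nested' for insertion positions
    (k1, l1) and (k2, l2), each assumed to satisfy k <= l."""
    if k1 > k2:  # without loss of generality, k1 <= k2
        k1, l1, k2, l2 = k2, l2, k1, l1
    if k1 == k2 or l1 == l2:
        raise ValueError("k1 = k2 or l1 = l2 is excluded by the argument above")
    if l1 < k2:
        return "sequential"    # k1 <= l1 < k2 <= l2
    if l1 < l2:
        return "interleaving"  # k1 < k2 <= l1 < l2
    return "nested"            # k1 < k2 <= l2 < l1


print(classify(3, 3, 1, 5))  # nested (the tau insertions into 1221 on the previous slide)
print(classify(1, 2, 3, 4), classify(1, 3, 2, 4))  # sequential interleaving
```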

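  15. Interleaving Insertions ($k_1 < k_2 \le \ell_1 < \ell_2$)
      We consider two repeat insertions to start.
      [Figure: at the same positions, $w_1$ reads $u z_1 z_2 u z_3$ and $w_2$ reads $z_1 u z_2 z_3 u$, where $z_1 = a_{k_1} \cdots a_{k_2-1}$, $z_2 = a_{k_2} \cdots a_{\ell_1-1}$, and $z_3 = a_{\ell_1} \cdots a_{\ell_2-1}$.]
      Note that $u z_1 \sim z_1 u$, since the bijection taking $w_1$ to $w_2$ maps the factor $u z_1$ of $w_1$ onto the factor $z_1 u$ of $w_2$.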

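  16. Interleaving Insertions ($k_1 < k_2 \le \ell_1 < \ell_2$)
      [Figure: the bijection $f$ with $f(w_1) = w_2$. At the same positions, $w_1$ reads $u\, f(u)\, z_2\, u\, z_3$ and $w_2$ reads $f(u)\, u\, z_2\, z_3\, u$.]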

  17. Interleaving Insertions ($k_1 < k_2 \le \ell_1 < \ell_2$)
      [Figure: the previous picture, extended by a second application of $f$, which brings in $f^2(u)$.]

  18. Interleaving Insertions ($k_1 < k_2 \le \ell_1 < \ell_2$)
      [Figure: iterating $f$ up to $f^h(u)$. The word $w_1$ reads $u\, f(u) \cdots z_2\, u\, f(u) \cdots$, and $w_2$ reads $f(u)\, f^2(u) \cdots u\, z_2\, f(u) \cdots u$.]
      We adapt a result by Lyndon and Schützenberger:
      Lemma. If $xz = zy$ and $x \neq \epsilon$, then $x = st$, $z = (st)^h s$, and $y = ts$ for some $s, t \in \Sigma^*$ and $h \ge 0$.
      Lyndon, R. C., and Schützenberger, M.-P. "The equation $a^M = b^N c^P$ in a free group." The Michigan Mathematical Journal 9(4) (1962), pp. 289–298.
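
For intuition, this Python sketch checks only the classical word-equation form of the lemma, not the adapted version used in the proof. When $xz = zy$ with $x \neq \epsilon$, the word $z$ is a prefix of $xxx\cdots$. One valid choice is therefore $h = \lfloor |z|/|x| \rfloor$, with $s$ the first $|z| \bmod |x|$ letters of $x$ and $t$ the rest. The function names are hypothetical, and the exhaustive check covers only binary words of length at most 4.

```python
from itertools import product


def decompose(x: str, z: str) -> tuple:
    """For nonempty x and a word z with xz = zy for some y, return (s, t, h) with
    x = st, z = (st)^h s, y = ts. We pick h = |z| // |x| and |s| = |z| % |x|."""
    if not x:
        raise ValueError("x must be nonempty")
    if (x + z)[:len(z)] != z:
        raise ValueError("no word y satisfies xz = zy")
    h, r = divmod(len(z), len(x))
    return x[:r], x[r:], h


def words(alphabet: str, max_len: int):
    """All words over `alphabet` of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# Exhaustive check on short binary words.
all_ok = True
for x in words("ab", 4):
    for z in words("ab", 4):
        if not x or (x + z)[:len(z)] != z:
            continue  # the lemma does not apply
        y = (x + z)[len(z):]  # the unique y with xz = zy
        s, t, h = decompose(x, z)
        all_ok = all_ok and (x == s + t and z == (s + t) * h + s and y == t + s)
print(decompose("ab", "aba"), all_ok)  # ('a', 'b', 1) True
```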

  19. Interleaving Insertions ($k_1 < k_2 \le \ell_1 < \ell_2$)
      Proposition (Interleaving).
      ◮ For repeat insertions, $z_1 z_3$ is a repeat word.
      ◮ For return insertions, $z_1 z_3 \sim \mathrm{Int}\big(\frac{k_2 - k_1}{\nu}, \nu\big)$.
      Here $\mathrm{Int}(h, q) = x_1 x_2 \cdots x_h\, x_1^R x_2^R \cdots x_h^R$, where each $x_i x_i^R$ is a return word and $|x_i| = q$ for $1 \le i \le h$.
      $\mathrm{Int}(h, q)$ can be obtained recursively: $x_1 x_1^R$, then $x_1 x_2 x_1^R x_2^R$, $\ldots$, up to $x_1 x_2 \cdots x_h\, x_1^R x_2^R \cdots x_h^R$.
      For example, $\mathrm{Int}(2, 2) = 12342143$, where $x_1 = 12$ and $x_2 = 34$.
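
A Python sketch that builds $\mathrm{Int}(h, q)$ from its definition follows. It assumes the blocks $x_i$ use consecutive integers, as in the example ($x_1 = 12$, $x_2 = 34$). We read the recursive description as a return insertion that places $x_i$ right after $x_{i-1}$ and $x_i^R$ at the end. That reading, the choice of fresh symbols in `insert`, and all function names are our assumptions. The script checks that both constructions agree on small cases.

```python
def insert(w, nu, k, l, kind):
    """w * I(nu, k, l) with u = (m+1, ..., m+nu), m = max(w) (assumed choice of u)."""
    m = max(w, default=0)
    u = list(range(m + 1, m + nu + 1))
    u_prime = u if kind == "repeat" else u[::-1]
    return w[:k - 1] + u + w[k - 1:l - 1] + u_prime + w[l - 1:]


def block(i: int, q: int) -> list:
    """x_i = ((i-1)q + 1, ..., iq)."""
    return list(range((i - 1) * q + 1, i * q + 1))


def int_word(h: int, q: int) -> list:
    """Int(h, q) = x_1 ... x_h x_1^R ... x_h^R, directly from the definition."""
    front = [a for i in range(1, h + 1) for a in block(i, q)]
    back = [a for i in range(1, h + 1) for a in reversed(block(i, q))]
    return front + back


def int_word_recursive(h: int, q: int) -> list:
    """Int(h, q) by return insertions: x_i goes after x_{i-1}, x_i^R at the end."""
    w = []
    for i in range(1, h + 1):
        w = insert(w, q, (i - 1) * q + 1, len(w) + 1, "return")
    return w


print("".join(map(str, int_word(2, 2))))  # 12342143
print(all(int_word(h, q) == int_word_recursive(h, q)
          for h in range(1, 5) for q in range(1, 4)))  # True
```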

  20. Nested Insertions ($k_1 < k_2 \le \ell_2 < \ell_1$)
      Proposition (Nested).
      ◮ For repeat insertions, $z_1 z_3 \sim \mathrm{Nes}\big(\frac{k_2 - k_1}{\nu}, \nu\big)$.
      ◮ For return insertions, $z_1 z_3$ is a return word.
      Here $\mathrm{Nes}(h, q) = x_1 x_2 \cdots x_{h-1} x_h\, x_h x_{h-1} \cdots x_2 x_1$, where each $x_i x_i$ is a repeat word and $|x_i| = q$ for $1 \le i \le h$.
      $\mathrm{Nes}(h, q)$ can be obtained recursively: $x_1 x_1$, then $x_1 x_2 x_2 x_1$, $\ldots$, up to $x_1 x_2 \cdots x_{h-1} x_h\, x_h x_{h-1} \cdots x_2 x_1$.
      For example, $\mathrm{Nes}(2, 2) = 12343412$, where $x_1 = 12$ and $x_2 = 34$.
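
Similarly, here is a Python sketch for $\mathrm{Nes}(h, q)$, again assuming the blocks $x_i$ use consecutive integers. We read the recursive description as a repeat insertion with $k = \ell$, which places $x_i x_i$ between the first copy of $x_{i-1}$ and its second copy. This reading and the function names are our assumptions.

```python
def insert(w, nu, k, l, kind):
    """w * I(nu, k, l) with u = (m+1, ..., m+nu), m = max(w) (assumed choice of u)."""
    m = max(w, default=0)
    u = list(range(m + 1, m + nu + 1))
    u_prime = u if kind == "repeat" else u[::-1]
    return w[:k - 1] + u + w[k - 1:l - 1] + u_prime + w[l - 1:]


def block(i: int, q: int) -> list:
    """x_i = ((i-1)q + 1, ..., iq)."""
    return list(range((i - 1) * q + 1, i * q + 1))


def nes_word(h: int, q: int) -> list:
    """Nes(h, q) = x_1 ... x_h x_h ... x_1, directly from the definition."""
    front = [a for i in range(1, h + 1) for a in block(i, q)]
    back = [a for i in range(h, 0, -1) for a in block(i, q)]
    return front + back


def nes_word_recursive(h: int, q: int) -> list:
    """Nes(h, q) by repeat insertions with k = l: x_i x_i goes in the middle."""
    w = []
    for i in range(1, h + 1):
        pos = (i - 1) * q + 1
        w = insert(w, q, pos, pos, "repeat")
    return w


print("".join(map(str, nes_word(2, 2))))  # 12343412
print(all(nes_word(h, q) == nes_word_recursive(h, q)
          for h in range(1, 5) for q in range(1, 4)))  # True
```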

  21. Sequential Insertions ($k_1 \le \ell_1 < k_2 \le \ell_2$)
      Consider the following words:
      $v_0 = 123123$, with $|v_0| = 2 \cdot 3 = 6$;
      $v_1 = 1234512345 = v_0 \star \rho(2, |v_0| - 2, |v_0| + 1)$;
      $v_2 = 12345126734567 = v_1 \star \rho(2, |v_1| - 2, |v_1| + 1)$;
      $v_3 = 123451267348956789 = v_2 \star \rho(2, |v_2| - 2, |v_2| + 1)$.
      The word $v_j$ is a $\rho$-tangled cord at level $j$, denoted $T_\rho(\nu, m, j)$, here with $m = 3$ and $\nu = 2$. The $\tau$-tangled cord at level $j$, $T_\tau(\nu, m, j)$, is defined similarly.
      Tangled cords $T_\rho(1, 1, i)$ were introduced in: Burns, J. et al. "Four-regular graphs with rigid vertices associated to DNA recombination." Discrete Applied Mathematics 161(10–11) (2013), pp. 1378–1394.
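
The words $v_0, \ldots, v_3$ can be regenerated by repeatedly applying $v \mapsto v \star \rho(2, |v| - 2, |v| + 1)$. In this Python sketch the offset 2 is taken from the example with $m = 3$ and $\nu = 2$. The slide does not say how the offset depends on $m$ and $\nu$ in general, so `offset` is a parameter, and `tangled_cord_rho` is a hypothetical name. New symbols are assumed to be the next unused integers, which matches $v_1$, $v_2$, and $v_3$.

```python
def insert(w, nu, k, l, kind):
    """w * I(nu, k, l) with u = (m+1, ..., m+nu), m = max(w) (assumed choice of u)."""
    m = max(w, default=0)
    u = list(range(m + 1, m + nu + 1))
    u_prime = u if kind == "repeat" else u[::-1]
    return w[:k - 1] + u + w[k - 1:l - 1] + u_prime + w[l - 1:]


def tangled_cord_rho(v0, nu, j, offset=2):
    """Level-j rho-tangled cord: apply v -> v * rho(nu, |v| - offset, |v| + 1) j times.

    offset=2 follows the slide's example (m = 3, nu = 2); the general rule is
    not stated there, so it is left as a parameter."""
    v = list(v0)
    for _ in range(j):
        v = insert(v, nu, len(v) - offset, len(v) + 1, "repeat")
    return v


v0 = [1, 2, 3, 1, 2, 3]
for j in range(4):
    print("".join(map(str, tangled_cord_rho(v0, 2, j))))
# 123123
# 1234512345
# 12345126734567
# 123451267348956789
```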

  22. Sequential Insertions ($k_1 \le \ell_1 < k_2 \le \ell_2$)
      Proposition (Sequential).
      ◮ For repeat insertions, $z_1 z_2 z_3 \sim T_\rho\big(\nu, \ell_1 - k_1, \frac{k_2 - \ell_1}{2\nu}\big)$.
      ◮ For return insertions, $z_1 z_2 z_3 \sim T_\tau\big(\nu, \ell_1 - k_1, \frac{k_2 - \ell_1}{2\nu}\big)$.
      Proposition. Every $T_I(\nu, m, j)$ with $I \in \{\rho, \tau\}$ is a palindrome.
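
The second proposition can be spot-checked for the $\rho$-tangled cords with $m = 3$ and $\nu = 2$. The word $v_1 = 1234512345$ differs from its reverse $5432154321$, so we read "palindrome" as "equivalent to its reverse" ($w^R \sim w$). That reading, the generation rule with offset 2, and all names in this Python sketch are our assumptions. $T_\tau$ is not covered because its starting word is not given here.

```python
def insert(w, nu, k, l, kind):
    """w * I(nu, k, l) with u = (m+1, ..., m+nu), m = max(w) (assumed choice of u)."""
    m = max(w, default=0)
    u = list(range(m + 1, m + nu + 1))
    u_prime = u if kind == "repeat" else u[::-1]
    return w[:k - 1] + u + w[k - 1:l - 1] + u_prime + w[l - 1:]


def ascending_form(w):
    """Relabel symbols 1, 2, ... in order of first appearance."""
    relabel = {}
    return [relabel.setdefault(a, len(relabel) + 1) for a in w]


def tangled_cord_rho(v0, nu, j, offset=2):
    """Apply v -> v * rho(nu, |v| - offset, |v| + 1) j times (offset=2 as on the slide)."""
    v = list(v0)
    for _ in range(j):
        v = insert(v, nu, len(v) - offset, len(v) + 1, "repeat")
    return v


def is_palindrome_up_to_equivalence(w):
    """Our reading of 'palindrome': w^R ~ w."""
    return ascending_form(w[::-1]) == ascending_form(w)


cords = [tangled_cord_rho([1, 2, 3, 1, 2, 3], 2, j) for j in range(5)]
print([is_palindrome_up_to_equivalence(v) for v in cords])  # [True, True, True, True, True]
print(cords[1] == cords[1][::-1])  # False: v_1 = 1234512345 differs from its reverse
```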
