Tighter Bounds for the Sum of Irreducible LCP Values



  1. Tighter Bounds for the Sum of Irreducible LCP Values
     Juha Kärkkäinen (1), Dominik Kempa (1), Marcin Piątkowski (2)
     (1) University of Helsinki, (2) Nicolaus Copernicus University

  2. Outline
     1. Cyclic words
     2. Irreducible LCP values
     3. Upper bound for the sum of irreducible values
     4. Lower bound for the sum of irreducible values
     J. Kärkkäinen, D. Kempa, M. Piątkowski, Tighter Bounds for the Sum of Irreducible LCP Values

  3. Cyclic suffixes
     $W = \{\{v_1, v_2, v_3\}\} = \{\{aab, aab, ab\}\}$

     [Figure: a cyclic word with letters $a_0, a_1, a_2, a_3, \ldots, a_n$]

     | ⟨i, j⟩ | suf(W) |
     |---|---|
     | ⟨1,0⟩ | aabaab··· |
     | ⟨1,1⟩ | abaaba··· |
     | ⟨1,2⟩ | baabaa··· |
     | ⟨2,0⟩ | aabaab··· |
     | ⟨2,1⟩ | abaaba··· |
     | ⟨2,2⟩ | baabaa··· |
     | ⟨3,0⟩ | ababab··· |
     | ⟨3,1⟩ | bababa··· |

  4. Cyclic suffix array
     $W = \{\{v_1, v_2, v_3\}\} = \{\{aab, aab, ab\}\}$

     [Figure: a cyclic word with letters $a_0, a_1, a_2, a_3, \ldots, a_n$]

     | SA_W | suf(W) |
     |---|---|
     | ⟨1,0⟩ | aabaab··· |
     | ⟨2,0⟩ | aabaab··· |
     | ⟨1,1⟩ | abaaba··· |
     | ⟨2,1⟩ | abaaba··· |
     | ⟨3,0⟩ | ababab··· |
     | ⟨1,2⟩ | baabaa··· |
     | ⟨2,2⟩ | baabaa··· |
     | ⟨3,1⟩ | bababa··· |


  6. Longest common prefix and distinguishing prefix arrays
     $W = \{\{v_1, v_2, v_3\}\} = \{\{aab, aab, ab\}\}$

     | SA_W | suf(W) | LCP_W | DP_W |
     |---|---|---|---|
     | ⟨1,0⟩ | aabaab··· | − | − |
     | ⟨2,0⟩ | aabaab··· | ∞ | ∞ |
     | ⟨1,1⟩ | abaaba··· | 1 | 2 |
     | ⟨2,1⟩ | abaaba··· | ∞ | ∞ |
     | ⟨3,0⟩ | ababab··· | 3 | 4 |
     | ⟨1,2⟩ | baabaa··· | 0 | 1 |
     | ⟨2,2⟩ | baabaa··· | ∞ | ∞ |
     | ⟨3,1⟩ | bababa··· | 2 | 3 |

  7. Burrows-Wheeler transform
     $W = \{\{v_1, v_2, v_3\}\} = \{\{aab, aab, ab\}\}$

     | SA_W | suf(W) | BWT(W) | LCP_W | DP_W |
     |---|---|---|---|---|
     | ⟨1,0⟩ | aabaab··· | b | − | − |
     | ⟨2,0⟩ | aabaab··· | b | ∞ | ∞ |
     | ⟨1,1⟩ | abaaba··· | a | 1 | 2 |
     | ⟨2,1⟩ | abaaba··· | a | ∞ | ∞ |
     | ⟨3,0⟩ | ababab··· | b | 3 | 4 |
     | ⟨1,2⟩ | baabaa··· | a | 0 | 1 |
     | ⟨2,2⟩ | baabaa··· | a | ∞ | ∞ |
     | ⟨3,1⟩ | bababa··· | a | 2 | 3 |
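The suffix array, LCP, DP and BWT columns for the running example can be reproduced with a brute-force sketch. It rests on assumptions that are read off the tables rather than stated on the slides: DP is taken to be LCP + 1, identical suffixes are ordered by word index, and the BWT character of a suffix is the letter cyclically preceding it. Suffixes are compared on their first 2n characters (n is the total length), which is enough to separate two distinct periodic words whose periods sum to at most 2n. The name `cyclic_arrays` is illustrative only.

```python
from math import inf


def cyclic_arrays(words):
    """Brute-force SA, LCP, DP and BWT for a multiset of cyclic words."""
    n = sum(len(w) for w in words)
    limit = 2 * n  # prefix length used to compare infinite periodic suffixes

    def prefix(i, j):
        # First `limit` characters of the cyclic suffix of words[i] starting at j.
        w = words[i]
        return "".join(w[(j + t) % len(w)] for t in range(limit))

    suffixes = [(i, j) for i, w in enumerate(words) for j in range(len(w))]
    # Python's sort is stable, so equal suffixes stay in word-index order.
    suffixes.sort(key=lambda s: prefix(*s))

    def lcp(a, b):
        x, y = prefix(*a), prefix(*b)
        if x == y:
            return inf  # the two infinite words are identical
        return next(t for t in range(limit) if x[t] != y[t])

    sa = [(i + 1, j) for i, j in suffixes]  # 1-based word index, as on the slides
    lcp_arr = [None] + [lcp(suffixes[k - 1], suffixes[k])
                        for k in range(1, len(suffixes))]
    dp_arr = [None if v is None else v + 1 for v in lcp_arr]  # assumed DP = LCP + 1
    bwt = "".join(words[i][(j - 1) % len(words[i])] for i, j in suffixes)
    return sa, lcp_arr, dp_arr, bwt


sa, lcp_arr, dp_arr, bwt = cyclic_arrays(["aab", "aab", "ab"])
print(bwt)  # bbaabaaa
```

The quadratic string building keeps the sketch short; it is meant for checking small examples, not for real inputs.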

  8. Cyclic equivalence
     Definition (cyclic equivalence). Two multisets of words $V$ and $W$ are cyclically equivalent if $\mathrm{suf}(V) = \mathrm{suf}(W)$.

     Example multiset: $W = \{\{aab, aab, ab\}\}$

     suf(W): aabaabaabaab···, abaabaabaaba···, baabaabaabaa···, aabaabaabaab···, abaabaabaaba···, baabaabaabaa···, abababababab···, babababababa···

     Equivalence class of $W$:
     $\{\{aab, aab, ab\}\}$, $\{\{aab, aab, ba\}\}$
     $\{\{aba, aab, ab\}\}$, $\{\{aba, aab, ba\}\}$
     $\{\{baa, aab, ab\}\}$, $\{\{baa, aab, ba\}\}$
     $\{\{aab, aba, ab\}\}$, $\{\{aab, aba, ba\}\}$
     ⋮
     $\{\{aabaab, ab\}\}$, $\{\{aabaab, ba\}\}$
     $\{\{abaaba, ab\}\}$, $\{\{abaaba, ba\}\}$
     $\{\{baabaa, ab\}\}$, $\{\{baabaa, ba\}\}$
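The definition of cyclic equivalence can be tested directly on small inputs. The following is a minimal sketch, assuming each infinite suffix is represented by its first 2n characters (n is the total length), which is enough to tell distinct periodic words of these sizes apart. The function names are hypothetical.

```python
from collections import Counter


def suffix_multiset(words):
    """Multiset of cyclic suffixes, each cut to its first 2n characters."""
    n = sum(len(w) for w in words)
    return Counter(
        "".join(w[(j + t) % len(w)] for t in range(2 * n))
        for w in words
        for j in range(len(w))
    )


def cyclically_equivalent(v, w):
    """True if the two multisets of words have the same multiset of suffixes."""
    if sum(map(len, v)) != sum(map(len, w)):
        return False  # different total length means different numbers of suffixes
    return suffix_multiset(v) == suffix_multiset(w)


print(cyclically_equivalent(["aab", "aab", "ab"], ["aabaab", "ba"]))  # True
print(cyclically_equivalent(["aab", "aab", "ab"], ["aab", "abb", "ab"]))  # False
```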

  9. Cyclic equivalence
     Lemma. Let $W = \{\{w_i\}\}_{i=1}^{s}$ be a multiset of cyclic words. Then:
     1. There exists a set of cyclic words $V = \{v_i\}_{i=1}^{t}$ such that $\mathrm{suf}(W) = \mathrm{suf}(V)$.
     2. There exists a multiset of primitive cyclic words $U = \{\{u_i\}\}_{i=1}^{p}$ such that $\mathrm{suf}(W) = \mathrm{suf}(U)$.

  10. Cyclic equivalence
      Remark. If two multisets of words $V$ and $W$ are cyclically equivalent, then $\mathrm{LCP}_V = \mathrm{LCP}_W$, $\mathrm{DP}_V = \mathrm{DP}_W$ and $\mathrm{BWT}_V = \mathrm{BWT}_W$.

  11. Cyclic equivalence
      Theorem (Mantaci, Restivo, Rosone, Sciortino, 2007). The mapping from a word $v$ to the cyclic equivalence class of $\mathrm{IBWT}(v)$ is a bijection.
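One way to see how a word $v$ determines a multiset of cyclic words is to follow the cycles of the LF mapping. The sketch below uses that standard inversion technique; it is not claimed to be the construction used in the cited theorem. It assumes the BWT was built as in the example table (the letter preceding each sorted cyclic suffix) and returns one representative of the equivalence class. The name `inverse_bwt` is illustrative.

```python
def inverse_bwt(v):
    """Recover a multiset of cyclic words whose BWT is v, via LF-mapping cycles."""
    # LF mapping: the k-th occurrence of a letter in v corresponds to the
    # k-th occurrence of that letter in sorted(v). sorted() is stable.
    order = sorted(range(len(v)), key=lambda i: v[i])
    lf = [0] * len(v)
    for rank, i in enumerate(order):
        lf[i] = rank

    seen = [False] * len(v)
    words = []
    for start in range(len(v)):
        if seen[start]:
            continue
        chars = []
        i = start
        while not seen[i]:
            seen[i] = True
            chars.append(v[i])  # letters come out back to front
            i = lf[i]
        words.append("".join(reversed(chars)))
    return words


print(inverse_bwt("bbaabaaa"))  # ['aab', 'aab', 'ab']
```

Each cycle of the mapping yields one cyclic word, so the result is only defined up to rotation of the individual words.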

  12. Irreducible LCP and DP values
      Irreducible values. A value $\mathrm{LCP}_W[i]$ (respectively $\mathrm{DP}_W[i]$) is irreducible if $\mathrm{BWT}_W[i] \neq \mathrm{BWT}_W[i-1]$.

      | SA_W | suf(W) | BWT(W) | LCP_W | DP_W |
      |---|---|---|---|---|
      | ⟨1,0⟩ | aabaab··· | b | − | − |
      | ⟨2,0⟩ | aabaab··· | b | ∞ | ∞ |
      | ⟨1,1⟩ | abaaba··· | a | 1 | 2 |
      | ⟨2,1⟩ | abaaba··· | a | ∞ | ∞ |
      | ⟨3,0⟩ | ababab··· | b | 3 | 4 |
      | ⟨1,2⟩ | baabaa··· | a | 0 | 1 |
      | ⟨2,2⟩ | baabaa··· | a | ∞ | ∞ |
      | ⟨3,1⟩ | bababa··· | a | 2 | 3 |
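Applied to the example table, the definition selects the rows where the BWT letter differs from the one in the row above. A small sketch, assuming 0-based row indices and skipping row 0 because it has no predecessor; the arrays are copied from the table and the helper name is hypothetical.

```python
from math import inf

# Columns copied from the example table for W = {{aab, aab, ab}}.
bwt = "bbaabaaa"
lcp = [None, inf, 1, inf, 3, 0, inf, 2]
dp = [None, inf, 2, inf, 4, 1, inf, 3]


def irreducible_positions(bwt):
    """Rows i >= 1 whose BWT letter differs from the previous row."""
    return [i for i in range(1, len(bwt)) if bwt[i] != bwt[i - 1]]


positions = irreducible_positions(bwt)
sum_ilcp = sum(lcp[i] for i in positions)
sum_idp = sum(dp[i] for i in positions)
print(positions, sum_ilcp, sum_idp)  # [2, 4, 5] 4 7
```

In this example the irreducible rows are ⟨1,1⟩, ⟨3,0⟩ and ⟨1,2⟩, and none of them carries an ∞ entry.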

  13. Sum of irreducible LCP values
      Irreducible values. Let $W = \{\{w_i\}\}_{i=1}^{s}$ with $\sum_{i=1}^{s} |w_i| = n$.
      - $\Sigma\mathrm{ilcp}(W)$: sum of irreducible LCP values
      - $\Sigma\mathrm{idp}(W)$: sum of irreducible distinguishing prefix lengths
      Theorem (Kärkkäinen, Manzini, Puglisi, 2009). $\Sigma\mathrm{ilcp}(W) = O(n \log n)$

  14. New upper bounds for the sum of irreducible LCP values
      Theorem. For any multiset $W$ of words of total length $n > 0$, we have
      $$\Sigma\mathrm{ilcp}(W) \le \Sigma\mathrm{idp}(W) \le n \lceil \lg n \rceil - 2^{\lceil \lg n \rceil} + 1$$
      Theorem. For any multiset $W$ of words of total length $n > 0$ such that $\mathrm{BWT}(W)$ has $r$ runs, we have
      $$\Sigma\mathrm{ilcp}(W) + r - 1 = \Sigma\mathrm{idp}(W) \le n \lceil \lg r \rceil - 2^{\lceil \lg r \rceil} + 1$$
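Both bounds can be checked numerically on the running example $W = \{\{aab, aab, ab\}\}$. The sketch below reads the term after the minus sign as a power of two, $2^{\lceil \lg m \rceil}$, and takes the sums 4 and 7 from the irreducible rows of the example table (BWT = bbaabaaa, so r = 4 runs and n = 8). The helper names are hypothetical.

```python
def ceil_lg(m):
    """Exact ceil(log2(m)) for an integer m >= 1."""
    return (m - 1).bit_length()


def bound(n, m):
    """n * ceil(lg m) - 2^ceil(lg m) + 1, used with m = n or m = r."""
    c = ceil_lg(m)
    return n * c - 2 ** c + 1


n, r = 8, 4               # total length and number of BWT runs in the example
sum_ilcp, sum_idp = 4, 7  # sums over the irreducible rows of the example table

assert sum_ilcp + r - 1 == sum_idp        # identity from the second theorem
assert sum_idp <= bound(n, r)             # 7 <= 13
assert sum_ilcp <= sum_idp <= bound(n, n) # 4 <= 7 <= 17
print(bound(n, r), bound(n, n))  # 13 17
```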

  15. Reverse suffixes
      $W = \{\{a, aab, abb, b\}\}$

      | No. | suffix |
      |---|---|
      | (1) | aaaaaa··· |
      | (2) | baabaa··· |
      | (3) | abaaba··· |
      | (4) | bbabba··· |
      | (5) | aabaab··· |
      | (6) | babbab··· |
      | (7) | abbabb··· |
      | (8) | bbbbbb··· |

      [Figure: $\mathrm{STree}(W)$, drawn with three levels of edges labeled a or b; its leaves, from left to right, are (1), (5), (3), (7), (2), (6), (4), (8)]

  16. Reverse suffixes
      Dispersal pair with respect to $\prec$. A pair of leaves $(u, v)$ of an ordered tree is called a dispersal pair if $u < v$ and the subtree rooted at their nearest common ancestor contains no leaf $w$ such that $u < w < v$.

  17. Reverse suffixes
      Lemma. $D(\mathrm{STree}(W), \prec_W) = \Sigma\mathrm{idp}(W)$, where $D(T, \prec)$ is the number of dispersal pairs in $T$.
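Dispersal pairs can be counted bottom-up: two leaves form a dispersal pair at a node exactly when they are neighbours in the sorted order of that node's leaves and lie in different child subtrees. The sketch below assumes a tree given as nested tuples whose leaves are integers ranked by the total order. The example tree is a reading of the figure for $W = \{\{a, aab, abb, b\}\}$ as a complete binary tree with leaves (1), (5), (3), (7), (2), (6), (4), (8); that shape is an assumption.

```python
def dispersal_pairs(tree):
    """Return (sorted leaf ranks, number of dispersal pairs) of a nested-tuple tree."""
    if isinstance(tree, int):
        return [tree], 0

    owner = {}  # leaf rank -> index of the child subtree containing it
    total = 0
    for idx, child in enumerate(tree):
        leaves, count = dispersal_pairs(child)
        total += count
        for leaf in leaves:
            owner[leaf] = idx

    ordered = sorted(owner)
    # Neighbours in sorted order that come from different children have this
    # node as nearest common ancestor and no leaf of the subtree between them.
    total += sum(1 for u, v in zip(ordered, ordered[1:]) if owner[u] != owner[v])
    return ordered, total


example = (((1, 5), (3, 7)), ((2, 6), (4, 8)))  # assumed shape of the figure
print(dispersal_pairs(example)[1])  # 17
```

For this tree the count is 4 at the lowest internal nodes, 3 + 3 one level up and 7 at the root.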

  18. $n \log n$ upper bound
      $d(1) = 0$
      $d(n) = \max_{i \in [1..\lfloor n/2 \rfloor]} d(n, i)$ when $n > 1$
      $d(n, k) = d(k) + d(n-k) + \min\{2k, n-1\}$, where $n > 0$ and $k \in [0..\lfloor n/2 \rfloor]$.
      Lemma. $d(n) = \max\{D(T, \prec)\}$, where the maximum is taken over all rooted trees $T$ with $n$ leaves and all total orders $\prec$ on their leaves.

  19. $n \log n$ upper bound
      Lemma. $d(n) = n \lceil \lg n \rceil - 2^{\lceil \lg n \rceil} + 1$
      Theorem. For any multiset $W$ of words of total length $n > 0$, we have
      $$\Sigma\mathrm{ilcp}(W) \le \Sigma\mathrm{idp}(W) \le d(n) \le n \lceil \lg n \rceil - 2^{\lceil \lg n \rceil} + 1$$
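The closed form can be compared with the recurrence for $d(n)$ on small inputs. This is a minimal sketch, assuming the recurrence $d(1) = 0$, $d(n) = \max_k \{ d(k) + d(n-k) + \min\{2k, n-1\} \}$ with $k$ ranging over $[1..\lfloor n/2 \rfloor]$, and reading the subtracted term as $2^{\lceil \lg n \rceil}$. It is a numerical check, not a proof.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def d(n):
    """d(n) from the recurrence, maximizing over k in [1 .. n // 2]."""
    if n == 1:
        return 0
    return max(d(k) + d(n - k) + min(2 * k, n - 1)
               for k in range(1, n // 2 + 1))


def closed_form(n):
    """n * ceil(lg n) - 2^ceil(lg n) + 1, with an exact integer ceil(lg n)."""
    c = (n - 1).bit_length()
    return n * c - 2 ** c + 1


print([d(n) for n in range(1, 9)])  # [0, 1, 3, 5, 8, 11, 14, 17]
print(all(d(n) == closed_form(n) for n in range(1, 41)))  # True
```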

  20. $n \log r$ upper bound
      Lemma. If $\mathrm{BWT}(W)$ has $r$ runs, then $\left| D_u\big(\mathrm{STree}(W), \prec_W\big) \right| < r$ for every vertex $u$ in $\mathrm{STree}(W)$.
