In-Place (Bijective) BWT Transforms, Dominik Köppl (Kyushu University): presentation slides


  1. In-Place (Bijective) BWT Transforms. Dominik Köppl (Kyushu University); Daiki Hashimoto, Diptarama, Ayumi Shinohara (Tohoku University)

  2. data structures: Burrows-Wheeler Transform (BWT) [Burrows, Wheeler '94]; Bijective BWT (BBWT) [Gil, Scott '12]

  3. BWT of bacabbabb: T = bacabbabb$

  4. BWT of bacabbabb: T = bacabbabb$; all suffixes: bacabbabb$, acabbabb$, cabbabb$, abbabb$, bbabb$, babb$, abb$, bb$, b$, $

  5. BWT of bacabbabb: T = bacabbabb$; all suffixes with their preceding character, written as prev. char | suffix: $ | bacabbabb$, b | acabbabb$, a | cabbabb$, c | abbabb$, a | bbabb$, b | babb$, b | abb$, a | bb$, b | b$, b | $

  6. BWT of bacabbabb: the same list, with the suffixes aligned left next to their prev. char

  7. BWT of bacabbabb: sort the suffixes in lexicographic order (<lex); the prev. chars read in that order form the BWT:
     BWT | suffix
     b | $
     b | abb$
     c | abbabb$
     b | acabbabb$
     b | b$
     b | babb$
     $ | bacabbabb$
     a | bb$
     a | bbabb$
     a | cabbabb$
     read top to bottom: bbcbbb$aaa
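The construction on the slides above (list all suffixes, sort them, read off the preceding characters) can be sketched as follows. This is a naive illustration, not the in-place method of the talk: the function name `bwt` is hypothetical, the sort materializes all suffixes and so uses quadratic space, and `$` is assumed to be unique and smaller than every other character (true for ASCII letters in Python).

```python
def bwt(text_with_sentinel: str) -> str:
    """Naive BWT: sort all suffixes, output the character preceding each one."""
    t = text_with_sentinel
    # Sort suffix start positions by the suffix they start.
    order = sorted(range(len(t)), key=lambda i: t[i:])
    # t[i - 1] is the preceding character; for i = 0 it wraps around to the sentinel.
    return "".join(t[i - 1] for i in order)


print(bwt("bacabbabb$"))
```

For T = bacabbabb$ this reproduces the BWT column of slide 7.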

  8. the BBWT is the BWT of the Lyndon factorization of an input text with respect to ≺_ω

  9. the BBWT is the BWT of the Lyndon factorization (explained first) of an input text with respect to ≺_ω (explained second)

  10. Lyndon words (examples: a, aabab): a Lyndon word is smaller than any of its proper suffixes and any of its other rotations

  11. Lyndon words (examples: a, aabab): a Lyndon word is smaller than any of its proper suffixes and any of its other rotations; not Lyndon words: abaab (the rotation aabab is smaller), abab (abab is not smaller than its suffix ab)
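A minimal sketch of the two equivalent characterizations from the slides above, checked directly by brute force. The function names are hypothetical and the checks take quadratic time; they are meant only to make the definition concrete.

```python
def is_lyndon(w: str) -> bool:
    """True if w is strictly smaller than each of its proper non-empty suffixes."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def is_lyndon_by_rotations(w: str) -> bool:
    """True if w is strictly smaller than each of its other rotations."""
    return len(w) > 0 and all(w < w[i:] + w[:i] for i in range(1, len(w)))


for word in ["a", "aabab", "abaab", "abab"]:
    print(word, is_lyndon(word), is_lyndon_by_rotations(word))
```

Both functions agree on the four example words of the slides: a and aabab are Lyndon words, abaab and abab are not.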

  12. Lyndon factorization [Chen+ '58]: input: text T; output: factorization T = T_1 T_2 ⋯ T_t such that each T_x is a Lyndon word and T_x ≥lex T_{x+1}; the factorization is uniquely defined (Chen-Fox-Lyndon theorem) and computable in linear time [Duval '88]

  13. example: T = bacabbabb; Lyndon factorization: b|ac|abb|abb; b, ac, abb, and abb are Lyndon words; b >lex ac >lex abb ≥lex abb
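The linear-time factorization attributed to [Duval '88] above is commonly implemented as shown below. This is a sketch of the standard textbook formulation, not code from the talk; the function name `lyndon_factors` is an assumption.

```python
def lyndon_factors(s: str) -> list[str]:
    """Duval's algorithm: split s into non-increasing Lyndon words."""
    factors = []
    n, i = len(s), 0
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it stays a (repeated) Lyndon prefix.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # The run consists of copies of a Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


print("|".join(lyndon_factors("bacabbabb")))
```

On the example text this yields the factorization b|ac|abb|abb from slide 13.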

  14. ≺_ω order: u ≺_ω w :⟺ uuuu⋯ <lex wwww⋯; example: ab <lex aba, but aba ≺_ω ab

  15. ≺_ω order: u ≺_ω w :⟺ uuuu⋯ <lex wwww⋯; example: ab <lex aba, but aba ≺_ω ab, because abaabaaba⋯ <lex abababab⋯
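The ≺_ω comparison defined above can be sketched without building infinite strings. The sketch assumes both words are non-empty and relies on the Fine and Wilf periodicity lemma: if the infinite repetitions of u and w agree on their first |u| + |w| characters, they are equal. The function name `omega_less` is hypothetical.

```python
def omega_less(u: str, w: str) -> bool:
    """True if the infinite repetition of u is lexicographically smaller than that of w."""
    for k in range(len(u) + len(w)):
        a, b = u[k % len(u)], w[k % len(w)]
        if a != b:
            return a < b
    # No mismatch within |u| + |w| characters: the infinite repetitions are equal.
    return False


print("ab" < "aba", omega_less("aba", "ab"))
```

The printed pair shows the contrast of the slides: ab is lexicographically smaller than aba, while aba precedes ab in the ≺_ω order.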

  16. BBWT of bacabbabb: Lyndon factorization b|ac|abb|abb

  17. BBWT of bacabbabb: rotations of each Lyndon factor: b; ac, ca; abb, bab, bba; abb, bab, bba

  18. BBWT of bacabbabb: all rotations collected in one list: b, ac, ca, abb, bab, bba, abb, bab, bba

  19. BBWT of bacabbabb: the list sorted with respect to ≺_ω: abb, abb, ac, bab, bab, bba, bba, b, ca

  20. BBWT of bacabbabb: the last characters of the sorted rotations form the BBWT:
      rotation | BBWT
      abb | b
      abb | b
      ac | c
      bab | b
      bab | b
      bba | a
      bba | a
      b | b
      ca | a
      BBWT(T) = bbcbbaaba

  21. BBWT of bacabbabb: BBWT(T) = bbcbbaaba, compared with BWT(T$) = bbcbbb$aaa
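The steps of slides 16 to 20 (factorize, collect all rotations, sort by ≺_ω, take last characters) can be sketched as below. This is a naive illustration that uses far more than in-place workspace; all function names are assumptions, and the helpers repeat Duval's algorithm and the ≺_ω comparison so that the block runs on its own.

```python
from functools import cmp_to_key


def lyndon_factors(s: str) -> list[str]:
    """Duval's algorithm: split s into non-increasing Lyndon words."""
    factors, n, i = [], len(s), 0
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def omega_cmp(u: str, w: str) -> int:
    """Three-way comparison of the infinite repetitions of u and w."""
    for k in range(len(u) + len(w)):
        a, b = u[k % len(u)], w[k % len(w)]
        if a != b:
            return -1 if a < b else 1
    return 0


def bbwt(text: str) -> str:
    """Naive BBWT: sort all rotations of all Lyndon factors, output last characters."""
    rotations = [f[i:] + f[:i] for f in lyndon_factors(text) for i in range(len(f))]
    rotations.sort(key=cmp_to_key(omega_cmp))
    return "".join(r[-1] for r in rotations)


print(bbwt("bacabbabb"))
```

For T = bacabbabb the output matches BBWT(T) on slide 20; no `$` sentinel is involved.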

  22. motivation: properties of the BBWT: no $ necessary; the BBWT is more compressible than the BWT for various inputs [Scott and Gil '12]; the BBWT is indexable (full-text index); it is computable in O(n) time with O(n) words [Bannai+ '19]; however, O(n) words can be too much for large n

  23. in-place computation: Σ: alphabet, σ := |Σ|: alphabet size; T: text, n := |T|; L: workspace of n lg σ bits; aim: in-place computation, i.e., transform T ↔ BWT ↔ BBWT with |L| + O(lg n) bits of workspace; example: L stores T := bacabbabb

  24. known solutions (σ: alphabet size, n: text length, ε: a constant with 0 < ε < 1):
      input | output | workspace | time | reference
      text | BWT | in-place | O(n^2) | Crochemore+ '15
      BWT | text | in-place | O(n^{2+ε}) | Crochemore+ '15
      text | BBWT | O(n lg σ) bits | O(n lg n / lg lg n) | Bonomo+ '14

  25. in-place conversions (diagram with the nodes text, BWT, and BBWT): known: text ↔ BWT in O(n^2) and O(n^{2+ε}) time; text ↔ BBWT in O(n^2) and O(n^{2+ε}) time; BWT ↔ BBWT in O(n^{2+ε}) time; working space: n lg σ + O(lg n) bits (including text)

  26. forward search: T = bacabbabb$; rows of the columns F and L, top to bottom, written as F L: $ b, a b, a c, a b, b b, b b, b $, b a, b a, c a

  30. forward search: T = bacabbabb$; each step from F to L can be calculated with rank and select on F and L (same F and L columns as on slide 26)

  31. forward search: T = bacabbabb$; FL mapping: FL(i) = L.select_{F[i]}(F.rank_{F[i]}(i)); table with the columns F.rank_{F[i]}(i) | F[i] | L[i] | L.rank_{L[i]}(i):
      1 | $ | b | 1
      1 | a | b | 2
      2 | a | c | 1
      3 | a | b | 3
      1 | b | b | 4
      2 | b | b | 5
      3 | b | $ | 1
      4 | b | a | 1
      5 | b | a | 2
      1 | c | a | 3

  32. backward search: T = bacabbabb$, with the same table of F, L, and their ranks as on slide 31; FM index [Ferragina, Manzini '00]

  36. backward search (table as on slide 32): LF mapping: LF(i) := F.select_{L[i]}(L.rank_{L[i]}(i)); FM index [Ferragina, Manzini '00]

  37. backward search (table as on slide 32): LF mapping: LF(i) := F.select_{L[i]}(L.rank_{L[i]}(i)) = F.select_{L[i]}(1) + L.rank_{L[i]}(i) - 1; FM index [Ferragina, Manzini '00]

  38. backward search (table as on slide 32): LF mapping: LF(i) := F.select_{L[i]}(L.rank_{L[i]}(i)) = F.select_{L[i]}(1) + L.rank_{L[i]}(i) - 1 = |{j : L[j] < L[i]}| + L.rank_{L[i]}(i); FM index [Ferragina, Manzini '00]

  39. LF: time complexity: if we store BWT(T) in L, then L[i] = BWT[i] takes O(1) time, and for any character c, L.rank_c(i) takes O(n) time; hence LF(i) = |{j : L[j] < L[i]}| + L.rank_{L[i]}(i) takes O(n) time, since each of the two terms is computed in O(n) time
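A minimal sketch of the LF formula above, evaluated by two scans over L and using only a few counters besides L itself. Indices are 0-based here, so the result is one less than the 1-based value on the slides. The helper `inverse_bwt` is added only to exercise `lf`; it collects the text in a separate list and therefore is not the in-place inversion. Both names are assumptions, as is the convention that `$` is unique and the smallest character.

```python
def lf(L: str, i: int) -> int:
    """LF(i) = |{j : L[j] < L[i]}| + rank of L[i] in L[0..i], 0-based, by scanning L."""
    c = L[i]
    smaller = sum(1 for x in L if x < c)              # first scan: O(n)
    rank = sum(1 for j in range(i + 1) if L[j] == c)  # second scan: O(n)
    return smaller + rank - 1


def inverse_bwt(L: str, sentinel: str = "$") -> str:
    """Recover the text (without sentinel) by walking backwards with LF."""
    out = []
    i = 0  # row 0 is the suffix consisting of the sentinel only
    while L[i] != sentinel:
        out.append(L[i])
        i = lf(L, i)
    return "".join(reversed(out))


print(inverse_bwt("bbcbbb$aaa"))
```

With n calls to `lf`, each taking O(n) time, the walk takes O(n^2) time in total.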

  40. FL: time complexity: FL(i) = L.select_{F[i]}(F.rank_{F[i]}(i)) = L.select_{F[i]}(i - |{j : L[j] < F[i]}|); if we know F[i], then FL(i) can be computed in O(n) time; however, the fastest in-place computation of F[i] takes O(n^{1+ε}) time [Munro, Raman '96] for any constant ε with 0 < ε < 1
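The FL formula above can be sketched as follows, with 0-based indices. One loud assumption: F[i] is obtained here with `sorted(L)`, which needs O(n) extra space and merely stands in for the in-place selection cited on the slide. The names `fl` and `text_by_forward_search` are hypothetical, and `$` is assumed to be unique and the smallest character.

```python
def fl(L: str, i: int) -> int:
    """FL(i): position in L of the occurrence of F[i] with the same rank as in F."""
    c = sorted(L)[i]  # F[i]; NOT in place, stand-in for an in-place selection
    k = i - sum(1 for x in L if x < c)  # 0-based rank of F[i] among equal characters
    for j, x in enumerate(L):
        if x == c:
            if k == 0:
                return j
            k -= 1
    raise ValueError("L does not contain the required occurrence")


def text_by_forward_search(L: str) -> str:
    """Read the text left to right by following FL from the sentinel row."""
    F = sorted(L)
    out = []
    i = fl(L, 0)  # row whose L entry is the sentinel: it starts with the first text character
    while i != 0:
        out.append(F[i])
        i = fl(L, i)
    return "".join(out)


print(text_by_forward_search("bbcbbb$aaa"))
```

Once F[i] is known, the remaining work in `fl` is two scans over L, which matches the O(n) bound stated on the slide.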

  41. road map: 1. text ↔ BBWT in O(n^2) and O(n^{2+ε}) time; 2. BWT ↔ BBWT in O(n^{2+ε}) time; working space: n lg σ + O(lg n) bits (including text)

  42. text → BBWT

  43. text → BBWT [Bonomo+ '14]:
      for each Lyndon factor T_x with x = 1 up to t:
          prepend T_x[|T_x|] to BBWT
          p ← 1 (insert position in BBWT)
          for each i = |T_x| - 1 down to 1:
              p ← LF(p) + 1
              insert T_x[i] at BBWT[p]
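The pseudocode above can be sketched in Python as follows. The sketch keeps the BBWT in a Python list and the text in a separate string, so it illustrates the insertion order and the use of LF but not the in-place memory layout of the talk. Indices are 0-based, so the slide's `p ← LF(p) + 1` becomes `p = lf(L, p) + 1` with a 0-based `lf`. All function names are assumptions.

```python
def lyndon_factors(s: str) -> list[str]:
    """Duval's algorithm: split s into non-increasing Lyndon words."""
    factors, n, i = [], len(s), 0
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def lf(L: list[str], p: int) -> int:
    """0-based LF by scanning: characters smaller than L[p] plus rank of L[p] up to p."""
    c = L[p]
    smaller = sum(1 for x in L if x < c)
    rank = sum(1 for j in range(p + 1) if L[j] == c)
    return smaller + rank - 1


def bbwt_by_insertion(text: str) -> str:
    """Build the BBWT factor by factor, inserting each factor from its last character."""
    L: list[str] = []
    for factor in lyndon_factors(text):
        L.insert(0, factor[-1])  # prepend T_x[|T_x|]
        p = 0                    # insert position in the BBWT
        for ch in reversed(factor[:-1]):
            p = lf(L, p) + 1     # slide: p <- LF(p) + 1 (1-based)
            L.insert(p, ch)
    return "".join(L)


print(bbwt_by_insertion("bacabbabb"))
```

Running it on T = bacabbabb passes through the intermediate strings b, cba, and bcbaba before reaching the final BBWT, one string per completed Lyndon factor.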

  44. text → BBWT: T = bacabbabb; Lyndon factorization: b|ac|abb|abb; first: insert b

  45. text → BBWT: T = bacabbabb; Lyndon factorization: b|ac|abb|abb; first: insert b, which gives F = b and L = b

  46. text → BBWT: how to calculate the final table from F = b, L = b? Final table with the rows (F rank, F, L, L rank), top to bottom: (1, a, b, 1), (2, a, b, 2), (3, a, c, 1), (1, b, b, 3), (2, b, b, 4), (3, b, a, 1), (4, b, a, 2), (5, b, b, 5), (1, c, a, 3)

  47. BBWT(T_1 T_2): T = b|ac|abb|abb = T_1 T_2 T_3 T_4; next Lyndon factor: ac; current table: F = b, L = b

  48. BBWT(T_1 T_2): next Lyndon factor: ac; the table F = b, L = b becomes F = bc, L = cb

  49. BBWT(T_1 T_2): next Lyndon factor: ac; the table F = bc, L = cb becomes F = abc, L = cba

  50. BBWT(T_1 T_2 T_3): T = b|ac|abb|abb; next Lyndon factor: abb; current table: F = abc, L = cba

  51. BBWT(T_1 T_2 T_3): next Lyndon factor: abb; the table F = abc, L = cba becomes F = abbc, L = bcba
