String indexing in the Word RAM model, part 4, by Paweł Gawrychowski

  1. String indexing in the Word RAM model, part 4 Paweł Gawrychowski University of Wrocław & Max-Planck-Institut für Informatik Paweł Gawrychowski String indexing in the Word RAM model IV 1 / 32

  2. We consider a fundamental data structure question: how to represent a tree? (Compacted) trie: a trie is simply a tree with edges labeled by single characters. A compacted trie is created by replacing maximal chains of unary vertices with single edges labeled by (possibly long) words. Navigation queries: given a pattern p, we want to traverse the edges of a compacted trie to find the node corresponding to p. If there is no such node, we would like to compute the longest prefix of p for which the corresponding node does exist.
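
The two definitions and the navigation query can be sketched in a few lines of Python. This is only an illustration under assumptions that are not on the slides: every word is terminated by a `$` sentinel, children are kept in a plain `dict` keyed by the first letter of the edge label, a position in the middle of an edge counts as an (implicit) node, and the names `build_trie`, `compact` and `navigate` are made up for this example.

```python
def build_trie(words):
    """Plain trie as nested dicts: one letter per edge, '$' ends every word."""
    root = {}
    for word in words:
        node = root
        for letter in word + "$":
            node = node.setdefault(letter, {})
    return root


def compact(node):
    """Replace each maximal chain of unary vertices by one edge with a long label.

    The result maps the first letter of an edge to (label, compacted child).
    """
    out = {}
    for letter, child in node.items():
        label = letter
        while len(child) == 1:  # unary vertex: absorb its only outgoing edge
            (next_letter, grandchild), = child.items()
            label += next_letter
            child = grandchild
        out[letter] = (label, compact(child))
    return out


def navigate(root, p):
    """Match p from the root.

    Returns (matched, node, rest): how many letters of p were matched, the last
    explicit node passed, and the unread part of the edge we stopped on
    ('' when we stopped exactly at a node).
    """
    node, i = root, 0
    while i < len(p) and p[i] in node:
        label, child = node[p[i]]
        k = 0
        while k < len(label) and i + k < len(p) and label[k] == p[i + k]:
            k += 1
        i += k
        if k < len(label):  # mismatch, or p ended, in the middle of the edge
            return i, node, label[k:]
        node = child
    return i, node, ""


trie = compact(build_trie(["banana", "bandana", "band", "apple"]))
print(navigate(trie, "bandit")[0])  # number of matched letters
```

In this toy trie the pattern `bandit` is matched up to its prefix `band`, while `banal` stops inside an edge after `bana`. The `dict` lookup at each node is exactly the step that the rest of the talk replaces with more careful dictionary structures.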

  3. Consider p = wewpxcwrehyzrt and the following compacted trie. [Figure: a compacted trie whose edge labels include qoidkbasdk, wewpxc, qtkjdknewnbog, povmnxd, tovndfed, hyugfecvbx and khjkdjd, together with several single-letter edges.]

  6. Navigating with p = wewpxcwrehyzrt in the same compacted trie. [Figure, last animation frame: the edge labeled hyugfecvbx is drawn split into hy and ugfecvbx, apparently marking where the traversal stops, since the next letter of p is z.]

  8. Splitting an edge: given an edge, we want to split it into two parts by (possibly) creating a node, and add a new edge outgoing from this middle node. [Figure: an edge labeled abrakadabra, split at a middle node that receives a new outgoing edge; the extracted labels are z, y, x.] Notice that this covers adding a new edge outgoing from an existing node.
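
A minimal sketch of the split operation, assuming a pointer-based representation that is not taken from the slides: each node keeps a `dict` from the first letter of an edge label to a `(label, child)` pair. The class `Node`, the function `split_edge` and its parameters are hypothetical names chosen for the example.

```python
class Node:
    def __init__(self):
        self.children = {}  # first letter of the edge label -> (label, child Node)


def split_edge(parent, first, offset, new_label):
    """Split the edge of `parent` that starts with `first` after `offset` letters
    and attach a new edge labeled `new_label` at the split point.

    offset == len(label) means the split point is the existing lower endpoint,
    so no node is created. Returns the node at the split point.
    """
    label, child = parent.children[first]
    if not 0 < offset <= len(label):
        raise ValueError("offset must lie inside the edge or at its lower end")
    if offset < len(label):
        mid = Node()  # the new middle node
        mid.children[label[offset]] = (label[offset:], child)
        parent.children[first] = (label[:offset], mid)
    else:
        mid = child
    if new_label[0] in mid.children:
        raise ValueError("an edge starting with this letter already exists")
    mid.children[new_label[0]] = (new_label, Node())
    return mid


root = Node()
root.children["a"] = ("abrakadabra", Node())
mid = split_edge(root, "a", 4, "xyz")   # creates a middle node after "abra"
same = split_edge(root, "a", 4, "y")    # "abra" now ends at that node: no new node
```

The second call illustrates the remark on the slide: once the split point coincides with an existing node, the same operation simply adds a new outgoing edge.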

  9. Static case (yesterday): given a compacted trie, can we quickly construct a small structure which allows us to execute navigation queries efficiently? Dynamic case: can we maintain a compacted trie so that (1) the resulting structure is small, (2) we can execute navigation queries efficiently, and (3) we can split any edge efficiently? There are clearly three parameters: the number of nodes in the compacted trie n, the size of the alphabet σ, and the length of the pattern m. We aim to achieve good bounds in terms of n, σ, and m.

  12. It seems reasonable to consider the scenario where σ is non-constant, yet (significantly) smaller than n. Hence we get the following question: what are the best possible time bounds in terms of σ? Gawrychowski and Fischer: there exists a deterministic linear-size structure supporting navigation in $O(m + \frac{\log^2 \log \sigma}{\log \log \log \sigma})$ time and splitting edges in $O(\frac{\log^2 \log \sigma}{\log \log \log \sigma})$ time. To make the above result useful, we develop a suffix tree oracle which can be used to locate the edge which should be split after prepending a letter to the current text in $O(\log \log n + \frac{\log^2 \log \sigma}{\log \log \log \sigma})$ time.

  13. Let us consider the dynamic case, and assume that n = O(σ). Here, instead of the simple two-level scheme used in the static case, we need to partition the nodes into more groups. Levels of nodes: let $f(\ell) = 2^{(3/2)^{\ell}}$. We say that a node v is of level ℓ when the number of leaves in its subtree belongs to [f(ℓ), 2f(ℓ+1)]. We will maintain the invariant that the level of v doesn't exceed the level of its parent. A fragment is a part of the tree consisting of nodes at the same level.
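
To get a feeling for the level intervals, the following sketch lists every level that the definition allows for a given number of leaves. It reads the garbled formula as f(ℓ) = 2^((3/2)^ℓ) and works with base-2 logarithms to avoid huge numbers; the function name `admissible_levels` and the `max_level` cutoff are assumptions made for the example.

```python
import math


def admissible_levels(leaves, max_level=64):
    """All levels l with f(l) <= leaves <= 2 * f(l + 1), where f(l) = 2 ** (1.5 ** l).

    Compared in the log domain: 1.5**l <= log2(leaves) <= 1 + 1.5**(l + 1).
    """
    lg = math.log2(leaves)
    levels = []
    for level in range(max_level):
        if 1.5 ** level > lg:  # f(level) already exceeds the leaf count
            break
        if lg <= 1 + 1.5 ** (level + 1):
            levels.append(level)
    return levels


for leaves in (5, 50, 100, 65536):
    print(leaves, admissible_levels(leaves))
```

Consecutive intervals overlap, so a node with 50 leaves may sit at level 3 or 4. That slack is presumably what keeps level changes rare under updates. The top level for a subtree with σ leaves is about log_{3/2} log_2 σ, which is where the Θ(log log σ) number of levels comes from.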

  17. Now, we classify the edges into two types: (1) from a node to a node of the same level, and (2) from a node to a node of a smaller level. Edges of type (1) are stored in a static dictionary with constant access time. We already know that such a dictionary can be constructed in close-to-linear time, and this turns out to be enough because of the way we defined the levels. More precisely, it cannot happen too often that the level of a node increases.

  18. Edges of type (2), from a node to a node of a smaller level, are stored in a dynamic dictionary structure. For this we develop a weighted variant of the exponential search trees of Andersson and Thorup, which we call wexponential search trees. Andersson and Thorup 2002: an exponential search tree is a dynamic predecessor structure storing a subset of [1, U] with $O(\frac{\log^2 \log U}{\log \log \log U})$ time for insertions and predecessor queries.
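
For readers unfamiliar with the interface, here is what a dynamic predecessor structure offers, shown with a deliberately naive stand-in: a sorted Python list searched with `bisect`. This is not an exponential search tree and has none of its time bounds (insertion here is linear in the worst case); the class name `SortedPredecessor` is made up for the example.

```python
import bisect


class SortedPredecessor:
    """Naive dynamic predecessor structure over integer keys (sorted list + bisect)."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)  # keep the list sorted, no duplicates

    def predecessor(self, x):
        """Largest stored key that is <= x, or None if there is none."""
        i = bisect.bisect_right(self._keys, x)
        return self._keys[i - 1] if i else None


s = SortedPredecessor()
for key in (42, 5, 17):
    s.insert(key)
print(s.predecessor(16), s.predecessor(17), s.predecessor(4))
```

A dictionary lookup for a letter c can be answered with such a structure by asking for the predecessor of c and checking whether the answer equals c.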

  19. Even without the modification, the query complexity is fairly decent, namely $O(m + \frac{\log^3 \log \sigma}{\log \log \log \sigma})$. This is because there are at most t = Θ(log log σ) edges of type (2) on any path descending from the root. [Figure: a path descending from the root through subtrees of weights w_t, w_{t-1}, w_{t-2}, w_{t-3}, where w_i ∈ [f(i), 2f(i+1)].]
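
One way to read this bound, assuming that every edge of type (2) costs one query to an unmodified exponential search tree over the universe [1, σ], and that the edges of type (1) together with the label comparisons add up to O(m):

```latex
\[
  \underbrace{O(m)}_{\text{type (1) edges, label comparisons}}
  \;+\;
  \underbrace{t}_{\Theta(\log\log\sigma)}
  \cdot
  O\!\left(\frac{\log^{2}\log\sigma}{\log\log\log\sigma}\right)
  \;=\;
  O\!\left(m + \frac{\log^{3}\log\sigma}{\log\log\log\sigma}\right).
\]
```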

  20. We want to be faster though. The subsequent accesses to the dynamic dictionary structures are not completely independent, so there is hope! Wexponential search trees: there exists a linear-size dynamic structure storing a collection of n weighted elements from [1, U] with the following bounds: (1) predecessor search takes $O(\log \frac{\log W}{\log w} \cdot \frac{\log \log U}{\log \log \log U})$, where W is the current total weight and w is the weight of the predecessor, (2) inserting a new element of weight 1 takes $O(\log \log W)$, and (3) increasing the weight of an element of weight w by 1 takes $O(\log \frac{\log W}{\log w})$.
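
A possible reading of why weights help, given as a sketch rather than the argument from the talk. Assume the predecessor bound has the form O(log(log W / log w) · log log U / log log log U), that U = σ, that the j-th edge of type (2) on the query path goes from a node of level ℓ_j to a node of level ℓ_{j+1} < ℓ_j, and that weights are numbers of leaves, so W_j ≤ 2f(ℓ_j + 1) and w_j ≥ f(ℓ_{j+1}). The indices j and the symbols W_j, w_j are introduced here only for the derivation.

```latex
\[
  \log\frac{\log W_j}{\log w_j}
  \;\le\;
  \log\frac{1 + (3/2)^{\ell_j + 1}}{(3/2)^{\ell_{j+1}}}
  \;=\; O(\ell_j - \ell_{j+1} + 1),
\]
\[
  \sum_{j} O(\ell_j - \ell_{j+1} + 1) \;=\; O(t) \;=\; O(\log\log\sigma),
\]
\[
  \text{total query time}
  \;=\;
  O\!\left(m + \log\log\sigma \cdot \frac{\log\log\sigma}{\log\log\log\sigma}\right)
  \;=\;
  O\!\left(m + \frac{\log^{2}\log\sigma}{\log\log\log\sigma}\right).
\]
```

The level differences telescope along the path, so the t searches together cost about as much as one unweighted search, which saves one log log σ factor compared with the previous slide.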
