Succinct Trie Indexes Made Practical
Huanchen Zhang, David G. Andersen, Michael Kaminsky, Andrew Pavlo, Kimberly Keeton
DRAM price won’t fall forever
[Chart: DRAM price over time (axes: Year, Price)]
Memory-efficient data structures are helpful
Smaller data structures → more data resident in faster memory → better performance + lower costs
The limit: information-theoretic lower bound (ITLB)
The minimum # of bits required to distinguish any object in a class
$|S| = n$: $\log_2 n$ bits
$|n\text{-node trie of degree } k| = \frac{1}{kn+1}\binom{kn+1}{n}$: $\approx n\left(k\log_2 k - (k-1)\log_2(k-1)\right)$ bits
$k = 256$: ITLB $\approx 9.44n$ bits; FST $= 10n$ bits
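A quick numeric check of the 9.44n figure: the Python sketch below (function names are illustrative, not taken from FST) evaluates the asymptotic per-node bound $k\log_2 k - (k-1)\log_2(k-1)$ and compares it with the exact $\log_2\big(\frac{1}{kn+1}\binom{kn+1}{n}\big)/n$, computed with `math.lgamma` so that no huge integers are formed.

```python
import math


def itlb_bits_per_node(k: int) -> float:
    """Asymptotic ITLB per node for n-node tries of degree k (k >= 2)."""
    return k * math.log2(k) - (k - 1) * math.log2(k - 1)


def exact_itlb_bits(n: int, k: int) -> float:
    """log2( C(kn+1, n) / (kn+1) ), the exact ITLB, via log-gamma to avoid huge ints."""
    m = k * n + 1
    # ln C(m, n) = ln m! - ln n! - ln (m-n)!
    log_binom = math.lgamma(m + 1) - math.lgamma(n + 1) - math.lgamma(m - n + 1)
    return (log_binom - math.log(m)) / math.log(2)


if __name__ == "__main__":
    k = 256  # one possible child per byte value
    print(f"asymptotic: {itlb_bits_per_node(k):.2f} bits/node")  # 9.44
    n = 100_000
    print(f"exact (n={n}): {exact_itlb_bits(n, k) / n:.3f} bits/node")
```

The exact value differs from the asymptotic one only by lower-order terms that vanish per node as n grows. FST's 10n is consistent with LOUDS-Sparse spending roughly one 8-bit label, one HC bit and one S bit per node.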
Succinct Data Structures
Use # of bits close to the ITLB. Suppose ITLB = L bits:
- Implicit: L + O(1)
- Succinct: L + o(L)
- Compact: O(L)
FST
Why aren’t succinct data structures popular?
- Read-only → log-structured design
- Slow
- Complex
Existing succinct tries are slow
50M 64-bit integer keys
[Chart: memory (GB, including key suffixes) and lookup latency (µs) for ART, tx-trie, and PDT]
Fast Succinct Trie (FST) is fast and small
50M 64-bit integer keys
[Chart: memory (GB, including key suffixes) and lookup latency (µs) for ART, tx-trie, PDT, and FST]
Encoding Mechanism
3 ways to succinctly encode ordinal trees
Ordinal tree: a rooted tree where each node can have an arbitrary # of children, in order
[Figure: example ordinal tree with 15 nodes: 0 → 1, 2; 1 → 3; 2 → 4, 5; 3 → 6, 7, 8; 4 → 9, A; 5 → B, C; 7 → D; B → E]
$|n\text{-node ordinal tree}| = C_{n-1} = \frac{1}{n}\binom{2n-2}{n-1}$ (a Catalan number), so the ITLB is $\log_2 C_{n-1} \approx 2n$ bits
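To make the roughly 2n-bit bound concrete, this small Python sketch counts ordinal trees by recursion (a root above an ordered forest of subtrees), checks the counts against the Catalan numbers $C_{n-1}$, and shows $\log_2 C_{n-1}/n$ approaching 2. The helper names `trees`, `forests` and `catalan` are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def trees(n: int) -> int:
    """Number of ordinal trees with n >= 1 nodes: a root above an ordered forest."""
    return forests(n - 1)


@lru_cache(maxsize=None)
def forests(m: int) -> int:
    """Number of ordered forests with m nodes in total."""
    if m == 0:
        return 1
    # The first tree has k nodes; the remaining m - k nodes form an ordered forest.
    return sum(trees(k) * forests(m - k) for k in range(1, m + 1))


def catalan(m: int) -> int:
    """m-th Catalan number C_m = binom(2m, m) / (m + 1)."""
    return math.comb(2 * m, m) // (m + 1)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, trees(n), catalan(n - 1))  # the two counts agree
    n = 10_000
    print(f"{math.log2(catalan(n - 1)) / n:.3f} bits/node")  # close to 2
```

`math.log2` accepts arbitrarily large Python integers, so the exact Catalan number can be used directly.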
LOUDS: level-ordered unary degree sequence
Visit the nodes in level order (0, 1, 2, ..., E); write each node's degree in unary (one 1 per child), terminated by a 0:
110 10 110 1110 110 110 0 10 0 0 0 10 0 0 0
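A minimal Python sketch of LOUDS on the example tree. The `CHILDREN` dictionary is reconstructed from the slide's encodings and `louds` is a hypothetical helper; note that some formulations of LOUDS also prepend "10" for a virtual super-root, which the slide omits.

```python
from collections import deque

# Example tree, reconstructed from the slide's encodings (children listed in order).
CHILDREN = {
    "0": ["1", "2"], "1": ["3"], "2": ["4", "5"], "3": ["6", "7", "8"],
    "4": ["9", "A"], "5": ["B", "C"], "7": ["D"], "B": ["E"],
}


def louds(root, children):
    """For each node in BFS (level) order, emit one '1' per child and then a '0'."""
    groups = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        groups.append("1" * len(kids) + "0")
        queue.extend(kids)
    return groups


if __name__ == "__main__":
    print(" ".join(louds("0", CHILDREN)))  # 110 10 110 1110 110 110 0 10 0 0 0 10 0 0 0
```

Each node contributes one 0 and each edge one 1, so n nodes take 2n - 1 bits here, matching the 2n-bit ITLB up to a constant.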
BP: balanced parentheses
Traverse depth-first; write "(" on entering a node and ")" on leaving it:
( ( ( ( ) ( ( ) ) ( ) ) ) ( ( ( ) ( ) ) ( ( ( ) ) ( ) ) ) )
Nodes in order of their "(": 0 1 3 6 7 D 8 2 4 9 A 5 B E C
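The BP encoding of the same example tree, as a short recursive Python sketch. `CHILDREN` is reconstructed from the slide and `bp` is a hypothetical helper that also returns the preorder in which the "(" symbols appear.

```python
# Example tree, reconstructed from the slide's encodings (children listed in order).
CHILDREN = {
    "0": ["1", "2"], "1": ["3"], "2": ["4", "5"], "3": ["6", "7", "8"],
    "4": ["9", "A"], "5": ["B", "C"], "7": ["D"], "B": ["E"],
}


def bp(node, children):
    """Balanced parentheses: "(" when DFS enters a node, ")" when it leaves."""
    seq, order = ["("], [node]
    for child in children.get(node, []):
        sub_seq, sub_order = bp(child, children)
        seq += sub_seq
        order += sub_order
    seq.append(")")
    return seq, order


if __name__ == "__main__":
    seq, order = bp("0", CHILDREN)
    print(" ".join(seq))
    print(" ".join(order))  # 0 1 3 6 7 D 8 2 4 9 A 5 B E C
```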
DFUDS: depth-first unary degree sequence
Traverse depth-first; for each node, write its degree in unary as one "(" per child followed by a ")":
( ( ) ( ) ( ( ( ) ) ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ( ) ) )
Nodes in depth-first order: 0 1 3 6 7 D 8 2 4 9 A 5 B E C
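A Python sketch of DFUDS on the example tree (the `CHILDREN` dictionary is reconstructed from the slide; `dfuds` is a hypothetical helper). The sequence as written on the slide has 29 parentheses; the canonical DFUDS definition prepends one extra "(" so that the sequence is balanced, so the sketch makes that optional.

```python
# Example tree, reconstructed from the slide's encodings (children listed in order).
CHILDREN = {
    "0": ["1", "2"], "1": ["3"], "2": ["4", "5"], "3": ["6", "7", "8"],
    "4": ["9", "A"], "5": ["B", "C"], "7": ["D"], "B": ["E"],
}


def dfuds(root, children, leading_paren=False):
    """Depth-first unary degree sequence.

    Visit nodes in preorder; for a node with d children write d "(" then one ")".
    With leading_paren=True, prepend the extra "(" of the canonical definition."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(children.get(node, [])))  # leftmost child is popped first
    seq = ["("] if leading_paren else []
    for node in order:
        seq += ["("] * len(children.get(node, [])) + [")"]
    return seq, order


if __name__ == "__main__":
    seq, order = dfuds("0", CHILDREN)
    print(" ".join(seq))
    print(" ".join(order))  # 0 1 3 6 7 D 8 2 4 9 A 5 B E C
```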
LOUDS-Sparse: succinctly encode tries
[Figure: example trie storing 11 keys (f, far, fas, fast, fat, s, top, toy, trie, trip, try) with values v1 to v11; "$" marks a key that ends at an internal node]
L (labels, level order; | separates nodes):
L:  f s t | $ a | o r | r s t | p y | i y | $ t | e p
HC (1 = label has a child node):
HC: 1 0 1 | 0 1 | 1 1 | 0 1 0 | 0 0 | 1 0 | 0 0 | 0 0
S (1 = first label of a node):
S:  1 0 0 | 1 0 | 1 0 | 1 0 0 | 1 0 | 1 0 | 1 0 | 1 0
V:  v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
Why LOUDS?
1. Fast tree navigation
2. Good label locality
3. Easy implementation
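A minimal Python sketch that builds the L, HC and S sequences from a sorted key set, under stated assumptions: keys are strings, the trie is a plain nested dict, and the terminator label "$" sorts before every key character (true for ASCII letters). The key list is inferred from the example's labels; `build_trie` and `louds_sparse` are hypothetical names, not FST's implementation.

```python
from collections import deque

END = "$"  # terminator label; assumes '$' sorts before every key character


def build_trie(keys):
    """Plain nested-dict trie; node[END] = {} marks the end of a key."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def louds_sparse(keys):
    """Return (L, HC, S) for the trie of `keys`, visiting nodes in level order."""
    labels, has_child, louds = [], [], []
    queue = deque([build_trie(keys)])
    while queue:
        node = queue.popleft()
        for j, label in enumerate(sorted(node)):  # labels in sorted order
            child = node[label]
            # A child holding only END is a leaf: the key ends here, no node is created.
            internal = label != END and set(child) != {END}
            labels.append(label)
            has_child.append(int(internal))
            louds.append(int(j == 0))  # S bit: 1 on the first label of each node
            if internal:
                queue.append(child)
    return labels, has_child, louds


if __name__ == "__main__":
    keys = ["f", "far", "fas", "fast", "fat", "s", "top", "toy", "trie", "trip", "try"]
    L, HC, S = louds_sparse(keys)
    print("L: ", " ".join(L))
    print("HC:", "".join(map(str, HC)))
    print("S: ", "".join(map(str, S)))
```

Every label with HC = 0 terminates exactly one key, which is why the 18 labels above carry 11 values.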
Rank & select on bit-vectors
bv (positions 0 to 17): 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0
rank(bv, i) = # of 1’s in bv up to and including position i
select(bv, i) = position of the i-th 1 in bv
Examples: rank(bv, 7) = 4; select(bv, 7) = 14
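The definitions can be pinned down with a naive Python sketch. The conventions (inclusive rank, 1-based select) are inferred from the slide's examples, and returning -1 for a missing 1 is an arbitrary choice; both functions scan in O(n), unlike the constant-time structures discussed next.

```python
def rank(bv, i):
    """Number of 1s in bv[0..i], inclusive (matches rank(bv, 7) = 4 on the slide)."""
    return sum(bv[: i + 1])


def select(bv, i):
    """Position of the i-th 1 in bv, counting from i = 1; -1 if bv has fewer 1s."""
    count = 0
    for pos, bit in enumerate(bv):
        count += bit
        if bit and count == i:
            return pos
    return -1


if __name__ == "__main__":
    bv = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
    print(rank(bv, 7), select(bv, 7))  # 4 14
```

With these conventions the two operations are inverses on set bits: select(bv, rank(bv, i)) == i whenever bv[i] == 1.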
Compute rank & select in constant time
The classic algorithm for computing rank on a bit-vector bv of n bits:
- Super block = $\log^2 n$ bits; per super block, store the cumulative rank: $O(n / \log n)$ bits
- Basic block = $\frac{1}{2}\log n$ bits; per basic block, store the rank within its super block: $O(n \log\log n / \log n)$ bits
- Within a basic block, precompute all possible queries in a lookup table: $2^{\frac{1}{2}\log n} \cdot \frac{1}{2}\log n$ entries of $O(\log\log n)$ bits, i.e. $O(\sqrt{n}\,\log n\,\log\log n)$ bits
- rank(bv, i) = cumulative rank + rank in super block + table lookup on the remaining bits: $O(1)$ time
- Space: $o(n)$ bits on top of bv
Select is similar but trickier, often based on rank structures
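A Python sketch of the two-level rank directory with a lookup table for in-block queries. For illustration it fixes the block sizes (8 and 64 bits) instead of deriving them from $\frac{1}{2}\log n$ and $\log^2 n$; these constants and all names are assumptions. `select` is done by binary search over `rank`, which is O(log n) and is **not** the constant-time select from the literature.

```python
BLOCK = 8    # basic-block size in bits (fixed stand-in for (1/2) log n)
SUPER = 64   # super-block size in bits (fixed stand-in for log^2 n); a multiple of BLOCK

# "All possible queries" within a basic block:
# TABLE[p][j] = number of 1s among bits 0..j of the BLOCK-bit pattern p.
TABLE = [[bin(p & ((1 << (j + 1)) - 1)).count("1") for j in range(BLOCK)]
         for p in range(1 << BLOCK)]


class RankBitVector:
    """Bit-vector with a two-level rank directory plus a lookup table."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.super_ranks = []  # per super block: cumulative rank before it
        self.block_ranks = []  # per basic block: rank within its super block, before it
        self.patterns = []     # per basic block: its bits packed into an int
        total = in_super = 0
        for start in range(0, len(self.bits), BLOCK):
            if start % SUPER == 0:
                self.super_ranks.append(total)
                in_super = 0
            self.block_ranks.append(in_super)
            chunk = self.bits[start:start + BLOCK]
            self.patterns.append(sum(bit << j for j, bit in enumerate(chunk)))
            total += sum(chunk)
            in_super += sum(chunk)

    def rank(self, i):
        """Number of 1s in bits[0..i], inclusive: three lookups, O(1) time."""
        b = i // BLOCK
        return (self.super_ranks[i // SUPER]
                + self.block_ranks[b]
                + TABLE[self.patterns[b]][i % BLOCK])

    def select(self, k):
        """Position of the k-th 1 (1-based), or -1 if there is none.

        Binary search over rank: O(log n), NOT a constant-time select."""
        if not self.bits or k < 1 or k > self.rank(len(self.bits) - 1):
            return -1
        lo, hi = 0, len(self.bits) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(mid) < k:
                lo = mid + 1
            else:
                hi = mid
        return lo


if __name__ == "__main__":
    bv = RankBitVector([1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0])
    print(bv.rank(7), bv.select(7))  # 4 14
```

Real implementations pack bits into machine words and typically replace the table with a hardware popcount, but the three-part sum in `rank` mirrors the structure on the slide.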
Tree navigation relies on rank & select
Positions 0 to 17, left to right:
L:  f s t $ a o r r s t p y i y $ t e p
HC: 1 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 0 0
S:  1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0
V:  v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 (one value per label with HC = 0)
child(i) = select(S, rank(HC, i) + 1)
parent(i) = select(HC, rank(S, i) - 1)
value(i) = i - rank(HC, i)
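A Python sketch of navigation on the example trie using child(i) = select(S, rank(HC, i) + 1), parent(i) = select(HC, rank(S, i) - 1) and value(i) = i - rank(HC, i), with naive rank/select (inclusive rank, 1-based select). Note that parent selects on HC, not S: selecting on S would return the start of the preceding node rather than the parent label. The `lookup` and `key_at` helpers are hypothetical, assume full keys are stored in the trie (no separate suffixes), and treat value indexes as 0-based.

```python
# LOUDS-Sparse arrays for the example trie (0-based positions).
L = list("fst$aorrstpyiy$tep")
HC = [1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
S = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]


def rank(bv, i):
    """Number of 1s in bv[0..i], inclusive (naive O(n) scan)."""
    return sum(bv[: i + 1])


def select(bv, k):
    """Position of the k-th 1 in bv, 1-based (naive O(n) scan)."""
    count = 0
    for pos, bit in enumerate(bv):
        count += bit
        if bit and count == k:
            return pos
    raise ValueError(f"bit-vector has fewer than {k} ones")


def child(i):
    return select(S, rank(HC, i) + 1)   # first label of the node that label i points to


def parent(i):
    return select(HC, rank(S, i) - 1)   # label (with HC = 1) that points to i's node


def value(i):
    return i - rank(HC, i)              # 0-based index into V for a label with HC = 0


def node_end(start):
    """Exclusive end of the node whose first label is at `start`."""
    end = start + 1
    while end < len(S) and S[end] == 0:
        end += 1
    return end


def lookup(key):
    """0-based value index for `key`, or None if the key is absent."""
    start = 0  # the root node starts at position 0
    for depth, ch in enumerate(key):
        pos = next((p for p in range(start, node_end(start)) if L[p] == ch), None)
        if pos is None:
            return None
        if HC[pos] == 0:  # leaf label: matches only if ch is the key's last character
            return value(pos) if depth == len(key) - 1 else None
        start = child(pos)
    # Key consumed at an internal node: it is stored only if that node has a "$" label.
    return value(start) if L[start] == "$" else None


def key_at(pos):
    """Rebuild the key ending at leaf label `pos` by following parent() to the root."""
    chars = []
    while True:
        chars.append(L[pos])
        if rank(S, pos) == 1:  # pos lies in the root node
            break
        pos = parent(pos)
    key = "".join(reversed(chars))
    return key[:-1] if key.endswith("$") else key


if __name__ == "__main__":
    print(lookup("fast"), lookup("fa"), key_at(15))  # 8 None fast
```

Scanning a node's labels linearly keeps the sketch short; because a node's labels are contiguous in L, this is also where LOUDS's label locality pays off.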