Succinct Trie Indexes Made Practical
Huanchen Zhang
David G. Andersen, Michael Kaminsky, Andrew Pavlo, Kimberly Keeton
DRAM price won't fall forever
[Figure: DRAM price by year]
Memory-efficient data structures are helpful
Smaller data structures, more ...
Information-theoretic lower bound: |S| = n requires $\log_2 n$ bits
|n-node trie of degree k| = $\frac{1}{kn+1}\binom{kn+1}{n}$, which requires about $n\,(k\log_2 k - (k-1)\log_2(k-1))$ bits
k = 256: 9.44n bits
FST = 10n bits
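The 9.44n figure for k = 256 can be checked numerically. The following is a minimal sketch, not taken from the slides: the function names are made up for illustration, and the second function simply evaluates the exact count $\frac{1}{kn+1}\binom{kn+1}{n}$ for a finite n to show that it approaches the asymptotic bound.

```python
import math


def trie_lower_bound_bits_per_node(k: int) -> float:
    """Asymptotic bound per node: k*log2(k) - (k-1)*log2(k-1)."""
    return k * math.log2(k) - (k - 1) * math.log2(k - 1)


def exact_bits_per_node(n: int, k: int) -> float:
    """log2 of the exact count C(kn+1, n) / (kn+1), divided by n."""
    count = math.comb(k * n + 1, n) // (k * n + 1)  # division is exact
    return math.log2(count) / n


print(round(trie_lower_bound_bits_per_node(256), 2))  # 9.44
print(exact_bits_per_node(1000, 256))  # slightly below 9.44 for finite n
```

At roughly 10 bits per node, FST therefore sits close to this bound for byte-labelled tries.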
[Figure: ART, tx-trie, PDT, and FST compared on 50M 64-bit integer keys, including key suffixes]
Ordinal tree: a rooted tree where each node can have an arbitrary number of children in order
[Figure: example ordinal tree with nodes labeled 1-9, A-E]
|n-node ordinal tree| = $C_n = \frac{1}{n+1}\binom{2n}{n}$, which requires about $2n$ bits
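As a quick sanity check of the "about 2n bits" claim, the Catalan number can be evaluated directly. This is an illustrative sketch with hypothetical function names, not part of the original deck.

```python
import math


def catalan(n: int) -> int:
    """C_n = binom(2n, n) / (n + 1); the division is always exact."""
    return math.comb(2 * n, n) // (n + 1)


def ordinal_tree_bits_per_node(n: int) -> float:
    """Bits needed to distinguish all C_n shapes, per node."""
    return math.log2(catalan(n)) / n


for n in (10, 100, 1000):
    print(n, ordinal_tree_bits_per_node(n))  # creeps up toward 2 as n grows
```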
LOUDS: level-ordered unary degree sequence
[Figure: the example tree with each internal node annotated with its unary degree: 110 10 110 1110 110 110 10 10]
LOUDS: 110 10 110 1110 110 110 0 10 0 0 0 10 0 0 0
BP: balanced parentheses
BP: ( ( ( ( ) ( ( ) ) ( ) ) ) ( ( ( ) ( ) ) ( ( ( ) ) ( ) ) ) )
[Figure: the example tree; node labels shown under the sequence: D 6 8 7 3 1 9 A 4 E B C 5 2]
DFUDS: depth-first unary degree sequence
DFUDS: ( ( ) ( ) ( ( ( ) ) ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ( ) ) )
[Figure: the example tree; node labels shown under the sequence: 1 3 6 7 D 8 2 4 9 A 5 B E C]
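The three encodings can be reproduced with a few lines of code. The sketch below rests on an assumption: the sequences above describe 15 nodes while the figure labels list only 1-9 and A-E, so the tree is taken to have an extra root, called "0" here, with the remaining nodes numbered in level order. Under that assumption the output matches the LOUDS, BP, and DFUDS sequences shown above. Textbook DFUDS usually prepends one extra "(" to balance the sequence; the version here follows the sequence as it appears above, without that prefix.

```python
from collections import deque

# Assumed tree shape (root "0" is a hypothetical label; see the note above).
CHILDREN = {
    "0": ["1", "2"],
    "1": ["3"],
    "2": ["4", "5"],
    "3": ["6", "7", "8"],
    "4": ["9", "A"],
    "5": ["B", "C"],
    "7": ["D"],
    "B": ["E"],
}


def kids(node):
    return CHILDREN.get(node, [])


def louds(root):
    """Level order: each node emits one '1' per child, then a '0'."""
    out, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        out.append("1" * len(kids(node)) + "0")
        queue.extend(kids(node))
    return " ".join(out)


def bp(root):
    """Depth-first: '(' when entering a node, ')' when leaving it."""
    return "(" + "".join(bp(c) for c in kids(root)) + ")"


def dfuds(root):
    """Preorder: each node emits one '(' per child, then a ')'."""
    return "(" * len(kids(root)) + ")" + "".join(dfuds(c) for c in kids(root))


print(louds("0"))
print(" ".join(bp("0")))
print(" ".join(dfuds("0")))
```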
[Figure: example trie with values v1-v11, shown next to its encoding as labels (L), has-child bits (HC), LOUDS bits (S), and values (V)]
L:  f s t | $ a | o r | r s t | p y | i y | $ t | e p
HC: 1 0 1 | 0 1 | 1 1 | 0 1 0 | 0 0 | 1 0 | 0 0 | 0 0
S:  1 0 0 | 1 0 | 1 0 | 1 0 0 | 1 0 | 1 0 | 1 0 | 1 0
V:  v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
Why LOUDS?
bv: 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0 (positions 0-17)
rank(bv, i) = # of 1's in bv up to position i
select(bv, i) = position of the ith 1 in bv
Examples: rank(bv, 7) = 4, select(bv, 7) = 14
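The two definitions can be pinned down with a naive reference implementation. This sketch assumes the convention that makes both examples above come out right: positions are 0-based, rank is inclusive of position i, and select counts 1s starting from 1.

```python
BV = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]


def rank(bv, i):
    """Number of 1s in bv[0..i], inclusive (0-based position)."""
    return sum(bv[: i + 1])


def select(bv, i):
    """0-based position of the i-th 1, counting 1s from 1."""
    seen = 0
    for pos, bit in enumerate(bv):
        seen += bit
        if bit and seen == i:
            return pos
    raise ValueError("bv contains fewer than i ones")


print(rank(BV, 7))    # 4
print(select(BV, 7))  # 14
```

Both functions take linear time here; the point of the structures that follow is to answer the same queries in constant time.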
The classic algorithm for computing rank
bv is split into super blocks of $\log^2 n$ bits, each split into basic blocks of $\frac{1}{2}\log n$ bits
per super block: cumulative rank; $n / \log^2 n$ entries, space $O(\frac{n}{\log n})$
per basic block: rank within its super block; $n / (\frac{1}{2}\log n)$ entries, space $O(\frac{n}{\log n}\log\log n)$
remaining bits: lookup table of all possible queries, space $O(\sqrt{n}\,\log n\,\log\log n)$
O(1) time
Select is similar but trickier, often based on rank structures
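The three-table scheme can be sketched as follows. This is an illustration under simplifying assumptions rather than the textbook construction: block sizes are fixed small constants (4-bit basic blocks, 16-bit super blocks) instead of being derived from log n, and the class and attribute names are invented for the example.

```python
class ClassicRank:
    def __init__(self, bits, basic=4, factor=4):
        self.basic = basic              # basic block size in bits
        self.super = basic * factor     # super block size in bits
        # Lookup table: answer for every (block pattern, offset) pair.
        self.table = [
            [bin(p >> (basic - 1 - off)).count("1") for off in range(basic)]
            for p in range(1 << basic)
        ]
        self.super_rank = []   # 1s before each super block
        self.basic_rank = []   # 1s before each basic block, within its super block
        self.patterns = []     # each basic block packed into an int
        total = in_super = 0
        for start in range(0, len(bits), basic):
            if start % self.super == 0:
                self.super_rank.append(total)
                in_super = 0
            self.basic_rank.append(in_super)
            chunk = list(bits[start:start + basic])
            chunk += [0] * (basic - len(chunk))  # pad the final block
            self.patterns.append(int("".join(map(str, chunk)), 2))
            ones = sum(chunk)
            total += ones
            in_super += ones

    def rank(self, i):
        """Number of 1s in positions 0..i, using three table reads."""
        b = i // self.basic
        return (self.super_rank[i // self.super]
                + self.basic_rank[b]
                + self.table[self.patterns[b]][i % self.basic])


bv = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(ClassicRank(bv).rank(7))  # 4
```

Each query touches one entry per table regardless of the length of bv, which is where the O(1) time comes from.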
[Figure: the example trie and its encoding, positions 0-17]
L:  f s t | $ a | o r | r s t | p y | i y | $ t | e p
HC: 1 0 1 | 0 1 | 1 1 | 0 1 0 | 0 0 | 1 0 | 0 0 | 0 0
S:  1 0 0 | 1 0 | 1 0 | 1 0 0 | 1 0 | 1 0 | 1 0 | 1 0
V:  v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
child(i) = select(S, rank(HC, i)+1)
parent(i) = select(HC, rank(S, i)-1)
value(i) = i - rank(HC, i)
Example: following the labels t, r, i, e from the root
i = 2 (t): child(2) = select(S, rank(HC, 2)+1) = select(S, 3) = 5
i = 6 (r): child(6) = select(S, rank(HC, 6)+1) = select(S, 6) = 12
i = 12 (i): child(12) = select(S, rank(HC, 12)+1) = select(S, 8) = 16
i = 16 (e): value(16) = 16 - rank(HC, 16) = 16 - 7 = 9
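The navigation formulas can be turned into a small point lookup over the example arrays. This is a sketch under stated assumptions, not the authors' implementation: rank and select are naive linear scans, "$" is treated as the marker for a key that ends at an internal node, parent is taken as select(HC, rank(S, i)-1) because that is what the example arrays require, and the key set (f, far, fas, fast, fat, s, top, toy, trie, trip, try) is inferred from the labels.

```python
LABELS = list("fst" "$a" "or" "rst" "py" "iy" "$t" "ep")
HC = [int(b) for b in "101" "01" "11" "010" "00" "10" "00" "00"]
S = [int(b) for b in "100" "10" "10" "100" "10" "10" "10" "10"]


def rank(bv, i):
    return sum(bv[: i + 1])


def select(bv, i):
    seen = 0
    for pos, bit in enumerate(bv):
        seen += bit
        if bit and seen == i:
            return pos
    raise ValueError("not enough ones")


def child(i):
    return select(S, rank(HC, i) + 1)


def parent(i):
    return select(HC, rank(S, i) - 1)


def value(i):
    return i - rank(HC, i)   # 0-based slot in V


def node_end(start):
    """One past the last label of the node that starts at `start`."""
    end = start + 1
    while end < len(S) and S[end] == 0:
        end += 1
    return end


def lookup(key):
    """Return the 0-based value slot for key, or None if it is absent."""
    pos = 0
    for depth, ch in enumerate(key):
        for i in range(pos, node_end(pos)):
            if LABELS[i] == ch:
                break
        else:
            return None
        if not HC[i]:  # leaf label: the key must end exactly here
            return value(i) if depth == len(key) - 1 else None
        pos = child(i)
    # Key ended at an internal node: stored only if the node starts with '$'.
    return value(pos) if LABELS[pos] == "$" else None


print(lookup("trie"))  # 9, the slot reached in the walk-through above
```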
[Figure: the example trie split by level: the upper levels are frequently visited, the lower levels hold the majority of nodes]
Upper levels: frequently visited, so use the fast encoding (LOUDS-Dense)
Lower levels: majority of nodes, so use the space-efficient encoding (LOUDS-Sparse)
Divided by size ratio
Fast (upper levels): L, HC, IsPrefixKey, V: v1 v2
Space-efficient (lower levels):
L:  r s t | p y | i y | $ t | e p
HC: 0 1 0 | 0 0 | 1 0 | 0 0 | 0 0
S:  1 0 0 | 1 0 | 1 0 | 1 0 | 1 0
V:  v3 v4 v5 v6 v7 v8 v9 v10 v11
Rank: basic block size = 512 bits for LOUDS-Sparse, 64 bits for LOUDS-Dense
within basic block: use popcount instruction
Select: sampling (every x 1's)
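A single-level rank table with popcount, plus a sampled select, might look like the sketch below. It is an approximation for illustration only: Python integers stand in for machine words, `bin(x).count("1")` stands in for the hardware popcount instruction, and the parameter names `block_bits` and `sample_every` (the "x" above) are invented here.

```python
class BitVector:
    def __init__(self, bits, block_bits=64, sample_every=3):
        self.block_bits = block_bits
        self.sample_every = sample_every
        self.words = []     # each basic block packed into one integer
        self.lut = []       # number of 1s before each basic block
        self.samples = []   # positions of the 1st, (x+1)-th, (2x+1)-th, ... 1
        ones = 0
        for start in range(0, len(bits), block_bits):
            self.lut.append(ones)
            word = 0
            for off, bit in enumerate(bits[start:start + block_bits]):
                if bit:
                    if ones % sample_every == 0:
                        self.samples.append(start + off)
                    ones += 1
                    word |= 1 << (block_bits - 1 - off)
            self.words.append(word)
        self.ones = ones

    def _bit(self, pos):
        block, off = divmod(pos, self.block_bits)
        return (self.words[block] >> (self.block_bits - 1 - off)) & 1

    def rank(self, i):
        """1s in positions 0..i: one table read plus one popcount."""
        block, off = divmod(i, self.block_bits)
        prefix = self.words[block] >> (self.block_bits - 1 - off)
        return self.lut[block] + bin(prefix).count("1")

    def select(self, k):
        """Position of the k-th 1 (k >= 1): jump to a sample, then scan."""
        if not 1 <= k <= self.ones:
            raise ValueError("k out of range")
        pos = self.samples[(k - 1) // self.sample_every]
        remaining = (k - 1) % self.sample_every  # 1s still to pass
        while remaining:
            pos += 1
            remaining -= self._bit(pos)
        return pos


bv = BitVector([1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0], block_bits=8)
print(bv.rank(7), bv.select(7))  # 4 14
```

A larger basic block means fewer table entries at the cost of a longer popcount, and a larger x means fewer samples at the cost of a longer scan.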
Label search within a node: 128-bit SIMD
[Figure: label array L (f s t ..., and a node with labels a-z) above S (1 0 0 0 ... 1 0 0 ...), where each 1 in S marks a node boundary]
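The idea of searching a node's labels one SIMD register at a time can be emulated in plain code. The sketch below is only an emulation: Python has no SIMD intrinsics, so `bytes.find` on a 16-byte slice stands in for one 128-bit comparison, and the function name and `width` parameter are hypothetical.

```python
def find_label(labels: bytes, s: list, start: int, target: int, width: int = 16) -> int:
    """Return the position of byte `target` in the node starting at `start`, or -1."""
    # The node ends just before the next 1 in S (the next node boundary).
    end = start + 1
    while end < len(s) and s[end] == 0:
        end += 1
    # A 128-bit register holds 16 one-byte labels; compare one chunk per step.
    for chunk_start in range(start, end, width):
        chunk = labels[chunk_start:min(chunk_start + width, end)]
        hit = chunk.find(bytes([target]))
        if hit != -1:
            return chunk_start + hit
    return -1


labels = b"fst$aorrstpyiy$tep"
s = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(find_label(labels, s, 7, ord("t")))  # 9
```

Small nodes finish in a single comparison; only nodes with more than 16 labels need a second step.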
[Figure: performance on 50M 64-bit integer keys as optimizations are added to the baseline: LOUDS-Dense, rank, select, SIMD search, prefetch; axis ticks at 400, 800, 1200]
[Figure: latency (ns) vs. memory (MB) for B+tree, ART, C-ART, and FST on 50M 64-bit integer keys, with low-cost to high-cost regions drawn for r = 1 and the "better" direction marked]
Cost function: $C = P^r S$
r > 1 favors performance
r < 1 favors space
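The effect of the exponent r can be seen with a toy calculation. In this sketch P is taken to be latency and S to be memory, matching the axes of the figure, and the two candidate indexes and their numbers are invented purely for illustration; they are not measurements from the talk.

```python
def cost(latency_ns: float, memory_mb: float, r: float = 1.0) -> float:
    """C = P^r * S, with P = latency and S = memory (assumed meanings)."""
    return latency_ns ** r * memory_mb


# Hypothetical (latency ns, memory MB) pairs, made up to show the trade-off.
fast_but_large = (100.0, 800.0)
slow_but_small = (400.0, 100.0)


def preferred(r: float) -> str:
    """Name of the candidate with the lower cost for a given r."""
    a = cost(*fast_but_large, r)
    b = cost(*slow_but_small, r)
    return "fast_but_large" if a < b else "slow_but_small"


for r in (0.5, 1.0, 2.0):
    print(r, preferred(r))
```

With these made-up numbers the smaller index wins at r = 0.5 and r = 1, and the faster one wins at r = 2, which is the sense in which r > 1 favors performance and r < 1 favors space.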