Learned Index Structures
Bigtable Research Review Meeting. Presented by Deniz Altinbuken, January 29, 2018.
Paper by Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis.
go/learned-index-structures-presentation
Objectives
1. Show that existing index structures can be replaced with other types of learning models: learned indexes.
2. Discuss the main challenges in designing learned index structures.
3. Show that replacing core components of a data management system through learned models can be very powerful.
Motivation
○ Traditional index structures assume nothing about the data distribution so that they can be general purpose.
○ They do not take advantage of patterns.
○ Learned models can exploit the patterns and correlations in the data and thus enable the automatic synthesis of specialized index structures: learned indexes.
Recognizing handwriting is hard to express algorithmically; e.g. a 9 has "a loop at the top, and a vertical stroke in the bottom right". ○ Too many exceptions, special cases.
Neural networks approach the problem in a different way.
A perceptron takes several binary inputs x1, x2, … and produces a single binary output. The output is computed as a function of the inputs, where weights w1, w2, … express the importance of each input to the output.
[Figure: a perceptron with inputs x1, x2, x3 weighted by w1, w2, w3.]
The output is determined by whether the weighted sum ∑j wjxj is less than or greater than some threshold value. Just like the weights, the threshold is a number which is a parameter of the neuron. If the threshold is exceeded, the neuron fires:

output = 0 if ∑j wjxj ≤ threshold
output = 1 if ∑j wjxj > threshold
A more common way to describe a perceptron moves the threshold to the other side of the inequality and calls it the bias, with w⋅x ≡ ∑j wjxj and bias ≡ −threshold:

output = 0 if w⋅x + bias ≤ 0
output = 1 if w⋅x + bias > 0

The bias describes how easy it is to get the neuron to fire.
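A minimal Python sketch of this decision rule; the weights, bias, and inputs are made-up values for illustration:

```python
def perceptron(x, w, bias):
    """Binary output: fires (1) iff the weighted sum plus bias is positive."""
    weighted_sum = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if weighted_sum + bias > 0 else 0

# Example: three binary inputs with hand-picked weights.
print(perceptron(x=[1, 0, 1], w=[0.6, 0.2, 0.3], bias=-0.5))  # 1, since 0.9 - 0.5 > 0
print(perceptron(x=[0, 1, 0], w=[0.6, 0.2, 0.3], bias=-0.5))  # 0, since 0.2 - 0.5 <= 0
```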
Perceptrons are simple models of decision-making. Layering them into a network, where each layer weighs up the outputs of the layer before it, lets the network make quite subtle decisions.
[Figure: inputs feeding an input layer of perceptrons, followed by hidden layers; the 2nd layer weighs up the decisions of the 1st.]
Earlier layers make simpler, more local decisions; later layers can make more complex and abstract decisions.
Learning: we want to find weights and biases so that the output from the network correctly classifies a given digit. If a small change in a weight or bias of the network causes a small corresponding change in the output from the network, the network can learn.
In other words, learning is trying to create the right input-to-output mapping for all cases.
Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only small changes in their output: replacing any weight w by w + Δw shifts the output by only a small Δ. A small Δ in any weight or bias causes a small Δ in the output!
A sigmoid neuron takes inputs that can be any real number between 0 and 1 (e.g. 0.256) and produces a single output, which can also be any real number between 0 and 1.
The output is computed with the sigmoid function applied to the weighted input z = w⋅x + bias:

σ(z) = 1 / (1 + e^(-z))
Great for representing probabilities!
ReLU (rectified linear unit) neurons carry information from earlier layers better as it goes through the hidden layers:

f(x) = 0 if x ≤ 0
f(x) = x if x > 0
To get an intuition about the neurons, it helps to see the shape of the activation function.
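A quick sketch printing both activation functions at a few sample points to show their shapes (the sample points are chosen arbitrarily):

```python
import math

def sigmoid(z):
    """Smooth squashing function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: zero for non-positive z, identity otherwise."""
    return max(0.0, z)

for z in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  relu={relu(z):.1f}")
```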
Key idea: indexes are models that can be implemented with neural networks. ○ A B-tree is a model that takes a key as an input and predicts the position of a data record. ○ A bloom filter is a binary classifier, which given a key predicts if the key exists in a set or not.
The B-tree provides a mapping from a lookup key into a position inside the sorted array of records.
○ For efficiency, the B-tree indexes to page granularity rather than to individual records. ○ In effect, it maps a key to a position with a guaranteed min and max error (min error 0, max error the page size).
A learned model can provide similar strong guarantees about the min and max error. ○ The model needs to guarantee the error only over the training data, not for all possible data. ○ The min and max error is the maximum error of the model over the training data. ○ Execute the model for every key and remember the worst over- and under-prediction of a position (see the sketch below).
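A minimal sketch of computing those bounds, assuming `model` maps a key to a predicted position (all names here are placeholders):

```python
def error_bounds(model, keys, true_positions):
    """Worst under- and over-prediction of the model over the training data."""
    min_err = max_err = 0
    for key, pos in zip(keys, true_positions):
        err = pos - model(key)  # positive: model predicted too low
        min_err = min(min_err, err)
        max_err = max(max_err, err)
    return min_err, max_err

# At lookup time the record for `key` is guaranteed to lie within
# [predicted + min_err, predicted + max_err], which can then be
# searched with e.g. binary search.
```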
Challenges: ○ B-trees are good at taking advantage of the cache. ○ With B-trees, keys are continuously mapped to memory or disk. ○ Other models might return positions outside the min/max error range if they are not monotonically increasing models.
Opportunities: ○ Replace the O(log n) tree traversal with a model lookup (O(1) in the best case). ○ Models can learn distributions, mixtures and other data peculiarities and patterns and make use of these. ○ Have to balance the complexity of the model with its accuracy.
First attempt: learn an index over the timestamps using Tensorflow. ○ Two-layer fully-connected NN with 32 neurons per layer using ReLU activation functions; the timestamps are the inputs and the positions are the outputs. ○ Lookup time ≈ 80,000 ns (model execution only).
The model is effective at narrowing down the position of an item from the entire data set to a region of thousands, but inefficient for the "last mile".
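A hedged sketch of roughly that naive setup with the Keras API; the layer sizes come from the slide, while the toy data, optimizer, and training schedule are assumptions:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the sorted timestamps: key -> position in the sorted array.
keys = np.sort(np.random.uniform(0.0, 1.0, size=100_000)).reshape(-1, 1)
positions = np.arange(len(keys), dtype=np.float32).reshape(-1, 1)

# Two-layer fully-connected NN, 32 ReLU neurons per layer, as on the slide.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # predicted position
])
model.compile(optimizer="adam", loss="mse")
model.fit(keys, positions, epochs=1, batch_size=1024, verbose=0)

# Prediction narrows the search to a region; a local search does the rest.
print(model.predict(keys[:1], verbose=0))
```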
The Learned Index Framework (LIF): ○ Given an index specification, LIF generates different index configurations, optimizes them, and tests them automatically. ○ It extracts all weights from the trained model and generates efficient index structures in C++ based on the model specification.
○ Reducing the min/max error to 100 over 100M records using a single model is very hard. ○ Reducing the error to 10k from 100M is much easier to achieve, even with simple models. ○ Reducing the error from 10k to 100 is likewise simple, as the model can focus on only a subset of the data. 💢 Use a hierarchical approach where we can have models focus on smaller subsets of data.
Take a layered approach and have each model focus on a limited subset of the keys:
[Diagram: stage 1 reduces the search from 100M to 1M keys, stage 2 from 1M to 10k, stage 3 from 10k to 100.]
Check out the math in the paper if you’re interested in the details! :)
With a layered approach we can build mixtures of models!
[Diagram: stage 1 (100M → 1M) is a small ReLU NN; stage 2 (1M → 10k) is a layer of linear regression models; stage 3 (10k → 100) can fall back to B-trees where needed.]
Hybrid training (line numbers refer to Algorithm 1 in the paper): starting from the entire dataset (line 3), it first trains the top-node model. Based on the prediction of this model, it then picks the model from the next stage (lines 9 and 10) and adds all keys which fall into that model (line 10). Finally, in the case of hybrid indexes, the index is optimized by replacing NN models with B-trees if the absolute min/max error is above a predefined threshold (lines 11-14).
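A condensed Python sketch of that staged training loop, in the spirit of the paper's Algorithm 1; `train_model`, `build_btree`, and `max_abs_error` are placeholders for whatever model types are used per stage:

```python
def train_rmi(keys, positions, stage_sizes, threshold):
    """Train a staged (recursive) model index over sorted keys."""
    n = len(keys)
    # records[i][j]: the (key, pos) pairs routed to model j of stage i.
    records = [[[] for _ in range(size)] for size in stage_sizes]
    records[0][0] = list(zip(keys, positions))
    index = [[None] * size for size in stage_sizes]

    for i, size in enumerate(stage_sizes):
        for j in range(size):
            index[i][j] = train_model(records[i][j])  # placeholder trainer
            if i + 1 < len(stage_sizes):
                # Route each key to a next-stage model by scaled prediction.
                for key, pos in records[i][j]:
                    nxt = int(index[i][j].predict(key) * stage_sizes[i + 1] / n)
                    nxt = max(0, min(stage_sizes[i + 1] - 1, nxt))
                    records[i + 1][nxt].append((key, pos))

    # Hybrid step: swap in a B-tree wherever a last-stage model is too bad.
    for j, model in enumerate(index[-1]):
        if max_abs_error(model, records[-1][j]) > threshold:  # placeholder
            index[-1][j] = build_btree(records[-1][j])        # placeholder
    return index
```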
To find the record, either binary search or scanning is used, but the models can generate more information than just a page location:
○ Model-biased search: set the first middle point to the pos predicted by the model.
○ Biased quaternary search: use the standard deviation σ of the last-stage model to set the middle points, picking three middle points as pos − σ, pos, pos + σ (see the sketch below).
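A sketch of model-biased binary search under the error-bound convention above; the first probe is the model's prediction instead of the array midpoint:

```python
def model_biased_search(keys, key, pred, min_err, max_err):
    """Binary search in [pred+min_err, pred+max_err], probing pred first."""
    lo = max(0, pred + min_err)
    hi = min(len(keys) - 1, pred + max_err)
    mid = max(lo, min(hi, pred))  # first middle point: the predicted position
    while lo <= hi:
        if keys[mid] == key:
            return mid
        if keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
        mid = (lo + hi) // 2  # subsequent probes: standard binary search
    return -1

# Example with a perfect prediction and error bounds of +/- 2 positions:
print(model_biased_search([2, 3, 5, 7, 11, 13], key=7, pred=3,
                          min_err=-2, max_err=2))  # 3
```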
Handling strings: ○ Represent the string as a vector, where each element is the decimal ASCII value of a character. ○ Limit the size of the vector to N to have equally-sized inputs (see the sketch below).
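A minimal sketch of that encoding; the length N = 8 and zero-padding for short keys are assumptions:

```python
def string_to_vector(key, n=8):
    """Encode a string key as a fixed-length vector of ASCII code points."""
    codes = [float(ord(c)) for c in key[:n]]  # truncate to length n
    codes += [0.0] * (n - len(codes))         # pad short keys with zeros
    return codes

print(string_to_vector("tim"))  # [116.0, 105.0, 109.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```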
Inserts and updates: ○ No need to retrain if the model can learn the key trend for the new items. ○ If inserts follow roughly a similar pattern as the learned CDF, retraining is not needed, since the index “generalizes” over the new items and inserts become an O(1) operation.
Hashmaps use a hash function to deterministically map keys to random positions inside an array.
The main challenge is to reduce conflicts. Hashmaps typically over-allocate the array of records and combine it with additional data structures to handle conflicts. ○ Dense hashmap: typical memory overhead of 78%. ○ Sparse hashmap: only 4 bits of overhead, but up to 3-7 times slower because of its search and data placement strategy.
If we could learn a model that maps every key into a unique position inside the array, we could avoid conflicts. ○ Learn the CDF F of the key distribution, scale it by the target size M of the hashmap, and use h(K) = F(K) ∗ M as the hash function. ○ If the model F perfectly learned the CDF, no conflicts would exist (see the sketch below).
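A toy sketch of the idea, with an empirical CDF over a sorted sample standing in for the learned model F:

```python
import bisect

def make_learned_hash(sorted_sample, m):
    """h(K) = F(K) * M, where F approximates the CDF of the key distribution."""
    def h(key):
        rank = bisect.bisect_left(sorted_sample, key)  # empirical CDF * len
        return min(m - 1, rank * m // len(sorted_sample))
    return h

h = make_learned_hash(sorted_sample=list(range(0, 1000, 10)), m=50)
print(h(0), h(500), h(990))  # 0 25 49: keys spread across slots by their CDF
```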
Bloom filters are probabilistic data structures used to test whether an element is a member of a set.
[Figure: bloom filter insertion vs. learned bloom filter insertion.]
A bloom filter can be seen as a classifier that separates keys from everything else. ○ A good hash function for a bloom filter should have lots of collisions among keys and lots of collisions among non-keys, but few collisions of keys and non-keys. ○ Alternatively, learn a model that can predict if an input x is a key or non-key.
Train a binary classifier NN to predict if an input x is a key or non-key. ○ Use sigmoid neurons to output a probability between 0 and 1. ○ The output of the NN is the probability that input x is a key in our database. ○ Choose a threshold t above which we will assume the key exists in our database. ○ Tune the threshold t to achieve the desired false positive rate. ○ To prevent false negatives, use an overflow bloom filter that holds the true keys the model misses (see the sketch below).
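A minimal sketch of the resulting lookup path; `model` and `overflow_filter` are placeholders, and the overflow filter is assumed to be built from exactly the true keys that score below the threshold:

```python
def learned_bloom_contains(key, model, threshold, overflow_filter):
    """Membership test with no false negatives."""
    if model.predict_probability(key) >= threshold:
        return True  # possibly a false positive; rate tuned via threshold
    # Every true key that scores below the threshold was inserted into the
    # overflow bloom filter when it was built, so real keys are never missed.
    return key in overflow_filter
```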
Evaluation: compare learned index structures with B-trees. ○ Compare lookup time (model execution time + local search time). ○ Compare index structure size. ○ Compare model error and error variance. Training (loading) and insertion time are not included. ○ A model without hidden layers can be trained on over 200M records in just a few seconds.
Weblog dataset: 200M log entries for requests to a major university website. Index over all unique timestamps.
The model error is the averaged standard error over all models on the last stage, whereas the error variance indicates how much this standard error varies between the models.
The learned index is up to 3× faster than the B-tree and up to an order of magnitude smaller.
Quaternary search only helps a little bit.
The error is high, which influences the search time.
Maps dataset: index of the longitude of ≈ 200M user-maintained features across the world. Relatively linear distribution.
The learned index is again up to 3× faster and up to an order of magnitude smaller. Quaternary search does not help here.
Synthetic dataset of 190M unique values to test how the index works on heavy-tail distributions. Highly non-linear, making the distribution more difficult to learn.
The error is high, which influences the search time. ○ For the synthetic dataset and the weblog data the error is much higher than for the maps data. ○ The size of the second stage has a significant impact on index size and lookup performance. This is not surprising, as the second stage determines how many models have to be stored. Worth noting is that the second stage uses 10,000 or more models.
The web-document dataset consists of the 10M non-continuous document-ids of a large web index used as part of a real product at a large internet company.
Speedups for learned indexes are not as prominent here, so hybrid indexes, which replace badly performing models with B-trees, actually help to improve performance.
Because the cost of searching is higher, the different search strategies make a bigger difference. The reason biased search and quaternary search perform better is that they can take the standard error into account.
The learned hash function improves performance and utilizes the memory better. With more slots, the improvement disappears.
Learned bloom filter dataset: 1.7M unique URLs. The more accurate the model is, the better the savings in bloom filter size.
A normal bloom filter with a desired 1% false positive rate requires 2.04MB. The learned filter uses a character-level RNN with a 32-dim embedding for each character; the model is 0.0259MB, and with the spillover bloom filter it is 1.07MB in total.
Conclusion: ○ The idea extends to multi-dimensional index structures; models, especially neural nets, are extremely good at capturing complex high-dimensional relationships. ○ Learned models could replace other core algorithms, e.g. sorting and joins, not just indexes. ○ Hardware trends (GPUs/TPUs) will make learned indexes even more viable.