Learned Index Structures: paper by Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis

SLIDE 1

Learned Index Structures

paper by Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis

Bigtable Research Review Meeting
Presented by Deniz Altinbuken
January 29, 2018
go/learned-index-structures-presentation

SLIDE 2

Objectives

  • 1. Show that all index structures can be replaced with deep learning models: learned indexes.
  • 2. Analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures.
  • 3. Show that the idea of replacing core components of a data management system through learned models can be very powerful.

SLIDE 3

Claims

  • Traditional indexes assume a worst case data distribution so that they can be general purpose.
    ○ They do not take advantage of patterns.
  • Knowing the exact data distribution enables highly optimizing any index the database system uses.
  • ML opens up the opportunity to learn a model that reflects the patterns and correlations in the data and thus enables the automatic synthesis of specialized index structures: learned indexes.

SLIDE 4

Main Idea

A model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records.

SLIDE 5

Background Learned Index Structures Results Conclusion

SLIDE 6

Background

SLIDE 7

Neural Networks: An Example

Recognizing handwriting

  • Very difficult to express our intuitions such as "9 has a loop at the top, and a vertical stroke in the bottom right".
  • Very difficult to create precise rules and solve this algorithmically.
    ○ Too many exceptions, special cases.

SLIDE 8

Neural Networks: An Example

Neural networks approach the problem in a different way.

  • Take a large number of handwritten digits: training data.
  • Develop a system which can learn from the training data.

SLIDE 9

Neural Networks: An Example

Neural networks approach the problem in a different way.

Automatically infer rules for recognizing handwritten digits by going through examples!

SLIDE 10

Neural Networks: An Example

Neural networks approach the problem in a different way.

Create a network of neurons that can learn! :)

SLIDE 11

Neurons: Perceptron

A perceptron takes several binary inputs, x1, x2, … and produces a single binary output. The output is computed as a function of the inputs, where weights w1, w2, … express the importance of inputs to the output.

[diagram: inputs x1, x2, x3 with weights w1, w2, w3 feeding a single output]

SLIDE 12

Neurons: Perceptron

The output is determined by whether the weighted sum ∑j wjxj is less than or greater than some threshold value. Just like the weights, the threshold is a number which is a parameter of the neuron. If the threshold is reached, the neuron fires.

[diagram: inputs x1, x2, x3 with weights w1, w2, w3 compared against threshold t to produce the output]

SLIDE 13

Neurons: Perceptron

The output is determined by whether the weighted sum ∑jwjxj is less than or greater than some threshold value. Just like the weights, the threshold is a number which is a parameter of the neuron. If the threshold is reached, the neuron fires.

output = 0 if ∑j wjxj ≤ threshold
         1 if ∑j wjxj > threshold
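The threshold rule above can be sketched directly in code (a minimal illustration, not from the paper; `perceptron` is a hypothetical helper name):

```python
def perceptron(x, w, threshold):
    # Fire (output 1) only if the weighted sum of inputs exceeds the threshold.
    weighted_sum = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if weighted_sum > threshold else 0
```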

SLIDE 14

Neurons: Perceptron

A more common way to describe a perceptron is with the dot product w⋅x = ∑j wjxj and a bias = -threshold:

output = 0 if w⋅x + bias ≤ 0
         1 if w⋅x + bias > 0

(equivalent to: output = 0 if ∑j wjxj ≤ threshold, 1 if ∑j wjxj > threshold)

Bias describes how easy it is to get the neuron to fire.
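The bias form is the same rule with bias = -threshold moved to the other side of the comparison; a minimal sketch (hypothetical helper name, not from the paper):

```python
def perceptron_bias(x, w, bias):
    # Equivalent formulation of the perceptron: bias = -threshold,
    # so firing happens when the dot product plus bias is positive.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + bias > 0 else 0
```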

SLIDE 15

Neurons: Perceptron

  • By varying the weights and the threshold, we get different models of decision-making.
  • A complex network of perceptrons that uses layers can make quite subtle decisions.

[diagram: network of perceptrons mapping inputs to an output]

SLIDE 16

Neurons: Perceptron

  • By varying the weights and the threshold, we get different models of decision-making.
  • A complex network of perceptrons that uses layers can make quite subtle decisions.

[diagram: inputs feeding a 1st layer, then a 2nd layer, then the output]

SLIDE 17

Neurons: Perceptron

  • By varying the weights and the threshold, we get different models of decision-making.
  • A complex network of perceptrons that uses layers can make quite subtle decisions.

[diagram: input layer, hidden layers, output layer]

SLIDE 18

Neurons: Perceptron

Perceptrons are great for decision making.

SLIDE 19

Neurons: Perceptron

How about learning?

SLIDE 20

Neurons: Perceptron

Earlier

Automatically infer rules for recognizing handwritten digits by going through examples!

SLIDE 21

Learning

  • A neural network goes through examples to learn weights and biases so that the output from the network correctly classifies a given digit.
  • If a small change in some weight or bias causes only a small corresponding change in the output from the network, the network can learn.

SLIDE 22

Learning

  • A neural network goes through examples to learn weights and biases so that the output from the network correctly classifies a given digit.
  • If a small change in some weight or bias causes only a small corresponding change in the output from the network, the network can learn. The goal is to create the right mapping for all cases.

SLIDE 23

Learning

The neural network is “trained” by adjusting weights and biases to find the perfect model that would generate the expected output for the “training data”.

SLIDE 24

Learning

Through training you minimize the prediction error. (But having perfect output is difficult.)

SLIDE 25

Neurons: Sigmoid

  • Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output.

[diagram: a small Δ in any weight or bias (w + Δw) causes a small Δ in the output (output + Δoutput)]

SLIDE 26

Neurons: Sigmoid

  • A sigmoid takes several inputs, x1, x2, … which can be any real number between 0 and 1 (e.g. 0.256) and produces a single output, which can also be any real number between 0 and 1.

output = σ(w⋅x + bias)

SLIDE 27

Neurons: Sigmoid

  • A sigmoid takes several inputs, x1, x2, … which can be any real number between 0 and 1 (e.g. 0.256) and produces a single output, which can also be any real number between 0 and 1.

output = σ(w⋅x + bias)

σ(z) = 1 / (1 + e^-z)   (the sigmoid function)
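The sigmoid neuron above can be sketched in a few lines (illustrative helper names, not from the paper):

```python
import math

def sigmoid(z):
    # Squashing function: maps any real z into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, bias):
    # Same weighted sum as a perceptron, but squashed instead of thresholded.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + bias)
```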

SLIDE 28

Neurons: Sigmoid

  • A sigmoid takes several inputs, x1, x2, … which can be any real number between 0 and 1 (e.g. 0.256) and produces a single output, which can also be any real number between 0 and 1.

output = σ(w⋅x + bias)

Great for representing probabilities!

SLIDE 29

Neurons: ReLU (Rectified Linear Unit)

  • Better for deep learning because it preserves the information from earlier layers better as it goes through hidden layers.

SLIDE 30

Neurons: ReLU (Rectified Linear Unit)

  • Better for deep learning because it preserves the information from earlier layers better as it goes through hidden layers.

output = 0 if x ≤ 0
         x if x > 0
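The ReLU rule is a one-liner (illustrative helper name):

```python
def relu(x):
    # Pass positive inputs through unchanged; clamp everything else to 0.
    return x if x > 0 else 0
```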

SLIDE 31

Activation Functions (Transfer Functions)

To get an intuition about the neurons, it helps to see the shape of the activation function.

SLIDE 32

Learned Index Structures

SLIDE 33
Index Structures as Neural Network Models

  • Indexes are already to a large extent learned models like neural networks.
  • Indexes predict the location of a value given a key.
    ○ A B-tree is a model that takes a key as an input and predicts the position of a data record.
    ○ A bloom filter is a binary classifier, which given a key predicts if a key exists in a set or not.

SLIDE 34

B-tree

The B-tree provides a mapping from a lookup key into a position inside the sorted array of records.

SLIDE 35

B-tree

The B-tree provides a mapping from a lookup key into a position inside the sorted array of records. For efficiency, index to page granularity.

SLIDE 36

B-tree

The B-tree provides a mapping from a lookup key into a position inside the sorted array of records. Map a key to a position with a min and max error.

SLIDE 37

Replace B-trees with ML Models!

  • We can replace the index with ML models that provide similar strong guarantees about the min and max error.
  • The B-tree only provides this guarantee over the stored data, not for all possible data.
    ○ The min and max error is the maximum error of the model over the training data.
    ○ Execute the model for every key and remember the worst over- and under-prediction of a position.
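The error-bound step above can be sketched as follows, assuming `model` is any callable mapping a key to a predicted position over the sorted stored keys (`min_max_error` is a hypothetical helper, not the paper's code):

```python
def min_max_error(model, keys):
    # Run the model over every stored key (sorted) and record the worst
    # under-prediction (min) and over-prediction (max) of its position.
    errors = [true_pos - model(key) for true_pos, key in enumerate(keys)]
    return min(errors), max(errors)
```

At lookup time these two numbers bound the local search range around the model's prediction.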

SLIDE 38
  • B-trees have a bounded cost for inserts and lookups and are good at taking advantage of the cache.
  • B-trees can map keys to pages which are not continuously mapped to memory or disk.
  • If a lookup key does not exist in the set, certain models might return positions outside the min/max error range if they are not monotonically increasing models.

Challenges

SLIDE 39
  • Using ML models has the potential to transform the cost of a log n B-tree lookup into a constant operation (in the best case).
  • Neural networks are able to learn a wide variety of data distributions, mixtures and other data peculiarities and patterns and make use of these.
    ○ Have to balance the complexity of the model with its accuracy.

Advantages

SLIDE 40

A First, Naïve Learned Index

  • Use 200M web-server log records to build a secondary index over the timestamps using Tensorflow.
    ○ Two-layer fully-connected NN with 32 neurons per layer using ReLU activation functions; the timestamps are the inputs and the positions are the outputs.
    ○ Lookup time ≈ 80,000 ns (model execution only).
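The shape of that model can be sketched as a plain forward pass. This is only an illustration of the architecture (two hidden ReLU layers of 32 units, scalar timestamp in, scalar position out) with untrained random weights; the actual experiment used a trained TensorFlow model, and `make_layer`/`forward` are hypothetical names:

```python
import random

def make_layer(n_in, n_out, rng):
    # One fully-connected layer: a weight matrix and a bias vector.
    w = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def forward(x, layers):
    a = x
    for i, (w, b) in enumerate(layers):
        z = [sum(wj * aj for wj, aj in zip(row, a)) + bj
             for row, bj in zip(w, b)]
        # ReLU on hidden layers; linear output on the last layer.
        a = z if i == len(layers) - 1 else [max(0.0, v) for v in z]
    return a[0]

rng = random.Random(0)
net = [make_layer(1, 32, rng), make_layer(32, 32, rng), make_layer(32, 1, rng)]
position = forward([123456.0], net)  # predicted position for one timestamp
```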

SLIDE 41

A First, Naïve Learned Index

  • Use 200M web-server log records to build a secondary index over the timestamps using Tensorflow.
    ○ Two-layer fully-connected NN with 32 neurons per layer using ReLU activation functions; the timestamps are the inputs and the positions are the outputs.
    ○ Lookup time ≈ 80,000 ns (model execution only).
  • CPU and space efficient to narrow down the position for an item from the entire data set to a region of thousands, but inefficient for the “last mile”.

SLIDE 42

A First, Naïve Learned Index

We want to map each of 100M keys to a position in a sorted array. With a single model, that model has to be “complex enough” to figure out an accurate mapping for every key.

SLIDE 43

The Recursive Model Index

It is much easier to have a model that can say that a given key from 100M keys maps to the first 10k, second 10k, etc. positions!

SLIDE 44

The Learning Index Framework (LIF)

  • The LIF can be regarded as an index synthesis system; given an index specification, LIF generates different index configurations, optimizes them, and tests them automatically.
  • Given a trained Tensorflow model, LIF automatically extracts all weights from the model and generates efficient index structures in C++ based on the model specification.

SLIDE 45
The Recursive Model Index

  • Improve last-mile accuracy.
    ○ Reducing min/max error to 100 from 100M records using a single model is very hard.
    ○ Reducing the error to 10k from 100M is much easier to achieve even with simple models.
    ○ Reducing the error from 10k to 100 is simpler as the model can focus only on a subset of the data.

SLIDE 46
The Recursive Model Index

  • Improve last-mile accuracy.
    ○ Reducing min/max error to 100 from 100M records using a single model is very hard.
    ○ Reducing the error to 10k from 100M is much easier to achieve even with simple models.
    ○ Reducing the error from 10k to 100 is simpler as the model can focus only on a subset of the data.
  💢 Use a hierarchical approach where we can have models focus on smaller subsets of data.

SLIDE 47

The Recursive Model Index

Take a layered approach and have models focus on limited layers:

SLIDE 48

The Recursive Model Index

Take a layered approach and have models focus on limited layers:

Reduce from 100M to 1M → Reduce from 1M to 10k → Reduce from 10k to 100
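The staged reduction above can be sketched as a toy two-stage RMI. This is an illustrative reconstruction, not the paper's implementation: `LinearModel`, `TwoStageRMI`, and the fixed fanout are assumptions, and both stages are simple least-squares linear fits whose recorded min/max error bounds the final binary search:

```python
import bisect
import math

class LinearModel:
    """Least-squares fit of position ≈ a*key + b (closed form)."""
    def __init__(self, keys, positions):
        n = len(keys)
        mean_k = sum(keys) / n
        mean_p = sum(positions) / n
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (p - mean_p) for k, p in zip(keys, positions))
        self.a = cov / var if var else 0.0
        self.b = mean_p - self.a * mean_k

    def predict(self, key):
        return self.a * key + self.b

class TwoStageRMI:
    def __init__(self, keys, fanout=4):
        self.keys = sorted(keys)
        self.fanout = fanout
        positions = list(range(len(self.keys)))
        # Stage 1: one root model routes each key to a stage-2 model.
        self.root = LinearModel(self.keys, positions)
        buckets = [([], []) for _ in range(fanout)]
        for k, p in zip(self.keys, positions):
            ks, ps = buckets[self._route(k)]
            ks.append(k)
            ps.append(p)
        # Stage 2: fit one model per bucket, recording its worst
        # under-/over-prediction (the min/max error bound).
        self.leaves, self.bounds = [], []
        for ks, ps in buckets:
            if not ks:
                self.leaves.append(None)
                self.bounds.append((0.0, 0.0))
                continue
            m = LinearModel(ks, ps)
            errs = [p - m.predict(k) for k, p in zip(ks, ps)]
            self.leaves.append(m)
            self.bounds.append((min(errs), max(errs)))

    def _route(self, key):
        i = int(self.root.predict(key) * self.fanout / len(self.keys))
        return max(0, min(self.fanout - 1, i))

    def lookup(self, key):
        m = self.leaves[self._route(key)]
        if m is None:
            return None
        lo_e, hi_e = self.bounds[self._route(key)]
        pos = m.predict(key)
        lo = max(0, math.floor(pos + lo_e))                    # worst under-prediction
        hi = min(len(self.keys), math.floor(pos + hi_e) + 1)   # worst over-prediction
        j = bisect.bisect_left(self.keys, key, lo, max(lo, hi))
        return j if j < len(self.keys) and self.keys[j] == key else None
```

On smooth data the leaf models are accurate and the final search range is tiny; that is exactly the "last mile" argument made above.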

SLIDE 49

The Recursive Model Index

Take a layered approach and have models focus on limited layers:

Check out the math in the paper if you’re interested in the details! :)

Reduce from 100M to 1M → Reduce from 1M to 10k → Reduce from 10k to 100

SLIDE 50

Hybrid End-to-End Training

With a layered approach we can build mixtures of models!

[diagram: stage 1 (100M → 1M) is a small ReLU NN; stage 2 (1M → 10k) is linear regression models; stage 3 (10k → 100) mixes linear regression models and B-trees]

SLIDE 51

Hybrid End-to-End Training

Starting from the entire dataset (line 3), it trains first the top-node model. Based on the prediction of this model, it then picks the model from the next stage (lines 9 and 10) and adds all keys which fall into that model (line 10). Finally, in the case of hybrid indexes, the index is optimized by replacing NN models with B-trees if the absolute min-/max-error is above a predefined threshold (lines 11-14).

SLIDE 52

Hybrid End-to-End Training

Worst case is a B-tree!

SLIDE 53

To find the record, either binary search or scanning is used. Models might generate more information than the page location.

  • Model Binary Search
    ○ Set the first middle point to the pos predicted by the model.
  • Biased Search
    ○ Use the standard deviation σ of the last stage model to set the middle point.
  • Biased Quaternary Search
    ○ Pick three middle points as pos − σ, pos, pos + σ.

Search Strategies
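The first strategy, model binary search, can be sketched as follows (an illustration under the assumption that `pos` is the model's predicted index; `model_binary_search` is a hypothetical name):

```python
def model_binary_search(arr, key, pos, lo=0, hi=None):
    # Standard binary search, except the FIRST probe is at the model's
    # predicted position instead of the midpoint; if the prediction is
    # close, this converges in far fewer probes.
    hi = len(arr) if hi is None else hi
    mid = max(lo, min(hi - 1, pos))  # clamp the prediction into range
    while lo < hi:
        if arr[mid] == key:
            return mid
        if arr[mid] < key:
            lo = mid + 1
        else:
            hi = mid
        mid = (lo + hi) // 2          # fall back to normal halving
    return -1
```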

SLIDE 54
  • Turn strings into inputs the NN model can use.
    ○ Represent a string as a vector, where each element is the decimal ASCII value of a char.
    ○ Limit the size of the vector to N to have equally-sized inputs.
  • Vector inputs slow the model down significantly.
  • Further research is needed to speed this case up :)

Indexing strings
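The fixed-length encoding described above is a few lines (illustrative helper; the pad value of 0 is an assumption):

```python
def string_to_vector(s, n=8, pad=0):
    # Each char becomes its decimal ASCII value; truncate or zero-pad
    # to length n so every input vector has the same size.
    vec = [ord(c) for c in s[:n]]
    return vec + [pad] * (n - len(vec))
```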

SLIDE 55

Inserts and Updates

  • Appends
    ○ No need to relearn if the model can learn the key trend for the new items.
  • Inserts in the middle
    ○ If inserts follow roughly a similar pattern as the learned CDF, retraining is not needed since the index “generalizes” over the new items and inserts become an O(1) operation.

SLIDE 56

If we have a model that is more general, it is cheaper to insert new values, since they will follow the trend.

Inserts and Updates

SLIDE 57

Hashmap

Hashmaps use a hash function to deterministically map keys to random positions inside an array.

SLIDE 58

Hashmap

Main challenge is to reduce conflicts.

  • Use a linked-list to handle the “overflow”.
  • Use linear or quadratic probing.
  • Most solutions allocate significantly more memory than records and combine it with additional data structures.
    ○ Dense hashmap: typical overhead of 78% memory.
    ○ Sparse hashmap: only 4 bits overhead, but up to 3-7 times slower because of its search and data placement strategy.

SLIDE 59
  • If we could learn a model which uniquely maps every key into a unique position inside the array, we could avoid conflicts.
  • Learned models are capable of reaching higher utilization of the hashmap depending on the data distribution.
  • Scale the learned distribution F by the targeted size M of the hashmap and use h(K) = F(K) ∗ M as the hash function for key K.
  • If the model F perfectly learned the distribution, no conflicts would exist.

Hashmap
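The h(K) = F(K) ∗ M idea above can be sketched with the model F approximated by the empirical CDF of a key sample (a stand-in for a trained model; `make_learned_hash` is a hypothetical helper):

```python
import bisect

def make_learned_hash(sample_keys, m):
    # h(K) = F(K) * M, where F is approximated here by the empirical CDF
    # of a sorted key sample. A well-learned F spreads keys near-uniformly
    # over the m array slots, which is what reduces conflicts.
    sorted_keys = sorted(sample_keys)
    n = len(sorted_keys)

    def h(key):
        cdf = bisect.bisect_right(sorted_keys, key) / n  # empirical F(key)
        return min(m - 1, int(cdf * m))

    return h
```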

SLIDE 60

Bloom filter

Bloom filters are probabilistic data structures used to test whether an element is a member of a set.

[diagram: Bloom filter insertion vs. learned Bloom filter insertion]

SLIDE 61

Bloom filter

  • A bloom filter index needs to learn a function that separates keys from everything else.
    ○ A good hash function for a bloom filter should have lots of collisions among keys and lots of collisions among non-keys, but few collisions of keys and non-keys.
  • As a classification problem: learn a model f that can predict if an input x is a key or non-key.

SLIDE 62
  • As a classification problem: learn a model f that can predict if an input x is a key or non-key.
    ○ Use sigmoid neurons to find a probability between 0 and 1.
    ○ The output of the NN is the probability that input x is a key in our database.
    ○ Choose a threshold t above which we will assume the key exists in our database.
    ○ Tune threshold t to achieve the desired false positive rate.
    ○ To prevent false negatives, use an overflow bloom filter.

Bloom filter
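The classifier-plus-overflow design above can be sketched as follows. This is an illustration, not the paper's code: the scoring `model` passed in is a stand-in for the trained NN, and the class/helper names are assumptions. Keys the model scores at or below the threshold go into a small conventional overflow Bloom filter, so the combined structure never returns a false negative:

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter: k deterministic hash positions per item."""
    def __init__(self, size, num_hashes):
        self.size, self.k = size, num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

class LearnedBloomFilter:
    def __init__(self, keys, model, threshold, overflow_size=1024):
        self.model, self.t = model, threshold
        self.overflow = BloomFilter(overflow_size, 3)
        for key in keys:
            # Any key the model would miss is caught by the overflow filter.
            if model(key) <= threshold:
                self.overflow.add(key)

    def __contains__(self, key):
        return self.model(key) > self.t or key in self.overflow
```

Raising t shrinks the model's false positives but pushes more keys into the overflow filter; that trade-off is what the threshold tuning bullet above refers to.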

SLIDE 63

Results

SLIDE 64
  • 4 datasets to compare the performance of learned index structures with B-trees.
    ○ Compare lookup-time (model execution time + local search time).
    ○ Compare index structure size.
    ○ Compare model error and error variance.
  • These results focus on read performance only; loading and insertion time are not included.
    ○ A model without hidden layers can be trained on over 200M records in just a few seconds.

B-tree Results

SLIDE 65

200M log entries for requests to a major university website. Index over all unique timestamps.

Web Log Dataset

SLIDE 66

200M log entries for requests to a major university website. Index over all unique timestamps.

Web Log Dataset

The model error is the averaged standard error over all models on the last stage, whereas the error variance indicates how much this standard error varies between the models.

SLIDE 67

Model is 3× faster and up to an order-of-magnitude smaller.

Web Log Dataset

SLIDE 68

Quaternary search only helps a little bit.

Web Log Dataset

SLIDE 69

The error is high, which influences the search time.

Web Log Dataset

SLIDE 70

Maps Dataset

Index of the longitude of ≈ 200M user-maintained features across the world. Relatively linear.

SLIDE 71

Maps Dataset


Model is 3× faster and up to an order-of-magnitude smaller.

SLIDE 72

Maps Dataset

Quaternary search does not help.

SLIDE 73

Lognormal Dataset

Synthetic dataset of 190M unique values to test how the index works on heavy-tail distributions. Highly non-linear, making the distribution more difficult to learn.

SLIDE 74

Lognormal Dataset


The error is high, which influences the search time.

SLIDE 75

Important Observations

  • Learned indexes are 3× faster and up to an order-of-magnitude smaller.
  • Quaternary search only helps for some datasets.
  • The model accuracy varies widely. It is most noticeable for the synthetic dataset and the weblog data, where the error is much higher.
  • Second stage size has a significant impact on the index size and lookup performance.
    ○ This is not surprising as the second stage determines how many models have to be stored. Worth noting is that our second stage uses 10,000 or more models.

SLIDE 76

Web Document Dataset

The web-document dataset consists of the 10M non-continuous document-ids of a large web index used as part of a real product at a large internet company.

SLIDE 77

Web Document Dataset

Speedups for learned indexes are not as prominent, so hybrid indexes, which replace badly performing models with B-trees, actually help to improve performance.

SLIDE 78

Web Document Dataset

Because the cost of searching is higher, the different search strategies make a bigger difference. The reason why biased search and quaternary search perform better is that they can take the standard error into account.

SLIDE 79
  • Use 3 int datasets.
  • Model hash has similar performance and utilizes the memory better.
  • When there are extra slots, the improvement disappears.

Hashmap Results

SLIDE 80
  • Blacklisted phishing URLs dataset: 1.7M unique URLs.
  • The more accurate the model is, the better the savings in bloom filter size.

Bloom filter Results

SLIDE 81
  • A normal Bloom filter with a desired 1% false positive rate requires 2.04MB.
  • For a 16-dim GRU with a 32-dim embedding for each character, the model is 0.0259MB; with the spillover it is 1.07MB.

Bloom filter Results

SLIDE 82

Conclusion

SLIDE 83
  • Multi-Dimensional Indexes: Extend learned indexes to multi-dimensional index structures. Models, especially neural nets, are extremely good at capturing complex high-dimensional relationships.
  • Learned Algorithms: A model can also speed up sorting and joins, not just indexes.
  • GPU/TPUs: GPU/TPUs will make the idea of learned indexes even more viable.

Conclusion and Future Work

SLIDE 84

Next time

  • Is this a good idea?
  • Related work
  • Some Notes on "Learned Bloom Filters"
  • Don't Throw Out Your Algorithms Book Just Yet
