Flavors Library for Fast Parallel Lookup Using Custom Radix Trees

SLIDE 1

Flavors Library for Fast Parallel Lookup Using Custom Radix Trees

Presentation ID 23269

Albert Wolant, Warsaw University of Technology
GTC Europe, 11 October 2017

SLIDE 2

Acknowledgements

  • Krzysztof Kaczmarski, Ph.D., my Ph.D. thesis advisor
  • Faculty of Mathematics and Information Science

SLIDE 3

What will I talk about?

  • What is Flavors and how did it come to be?
  • Overview of the algorithms.
  • Experimental results and benchmarks.
  • Customization using machine learning.

SLIDE 4

Where did it come from?

Presentation at GTC 2016, available on the GTC On-Demand platform:

SLIDE 5

What can it do?

  • Flavors provides algorithms to build and search a radix tree with configurable bit strides on the GPU.
  • Values can be of constant length (keys) or vary in length (masks).
  • Search can either find values exactly or perform longest-prefix matching.

SLIDE 6

Tree for keys

Example of a tree for the bit stride sequence {3, 2, 1}.
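With strides {3, 2, 1}, a 6-bit key is cut into a 3-bit chunk for the first level, a 2-bit chunk for the second and a 1-bit chunk for the third; each chunk selects one of 2^stride cells in its node. The small Python sketch below shows that split. The name `split_key` and the bit-string input are hypothetical, chosen only for illustration, and are not part of the Flavors API.

```python
def split_key(bits, strides):
    """Cut a key given as a bit string into one cell index per tree level.

    Hypothetical helper (not Flavors code); assumes len(bits) == sum(strides).
    """
    if len(bits) != sum(strides):
        raise ValueError("key length must equal the sum of the strides")
    chunks, offset = [], 0
    for stride in strides:
        # A chunk of `stride` bits picks one of 2**stride cells in a node.
        chunks.append(int(bits[offset:offset + stride], 2))
        offset += stride
    return chunks


# The key from the search example, 010 - 01 - 1, with strides {3, 2, 1}:
print(split_key("010011", [3, 2, 1]))  # [2, 1, 1]
```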

SLIDE 7

Tree for keys - searching example

Example search for key 010 − 01 − 1

SLIDE 17

Tree for keys - node structure

In practice, cells hold indexes of nodes on the next level instead of pointers. The last level keeps the original indexes of the keys.
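A minimal sketch of a search over this layout, assuming each level is stored as one flat array with its nodes back to back, so that node i of a level with stride s occupies cells i·2^s to (i+1)·2^s − 1. The function name `find`, the 0-based indexes and the use of -1 for an empty cell are illustrative assumptions; the slides index from 1.

```python
EMPTY = -1  # assumption: marks a cell with no child node and no key


def find(levels, strides, chunks):
    """Return the original index of a key, or None if the key is absent.

    Assumed layout: levels[l] holds all nodes of level l back to back;
    node i occupies cells i * 2**strides[l] .. (i + 1) * 2**strides[l] - 1.
    Inner cells hold a node index on level l + 1, last-level cells hold
    the original index of the key.
    """
    node = 0  # the first level has a single root node
    for level, (stride, value) in enumerate(zip(strides, chunks)):
        cell = levels[level][node * 2**stride + value]
        if cell == EMPTY:
            return None
        node = cell  # after the last level this is the original index
    return node


# Hand-built tree for two keys with strides {3, 2, 1}:
# 010-01-1 (original index 0) and 010-11-0 (original index 1).
strides = [3, 2, 1]
levels = [
    [EMPTY, EMPTY, 0, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY],  # root, 8 cells
    [EMPTY, 0, EMPTY, 1],                                  # node for 010
    [EMPTY, 0, 1, EMPTY],                                  # 010-01, 010-11
]
print(find(levels, strides, [2, 1, 1]))  # 0
print(find(levels, strides, [2, 1, 0]))  # None
```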

SLIDE 19

Tree construction - input data

SLIDE 20

Tree construction - data sorting

SLIDE 21

Tree construction - values

SLIDE 22

Tree construction - nodes borders

SLIDE 27

Tree construction - nodes indexes

SLIDE 32

Tree construction - nodes allocation

The last row of the nodesIndexes array holds the node count for each level. Since the size of a node on each level is known (it is determined by the bit stride), memory for all nodes can be allocated at once. The values in the arrays above can also be used to link nodes between levels.
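The slides do not spell out how the values, nodesBorders and nodesIndexes arrays are filled, so the sequential Python sketch below rests on one reading of them, labelled here as an assumption. Keys are sorted; V[key][level] is the key's chunk on that level; B[key][level] is 1 when the key's bits up to and including that level differ from the previous key's (the key opens a new node on the next level); and N[key][level] is the 1-based node the key falls into, obtained as an inclusive scan of the previous column of B. Under this reading, the last row of N holds the node count per level, as the slide states. Levels are 0-based in the code; `build_arrays` and `allocate` are hypothetical names, and on the GPU the per-cell comparisons and the column scans would run in parallel.

```python
from itertools import accumulate


def build_arrays(sorted_keys, strides):
    """Return V, B and N (rows = keys, columns = levels) for sorted keys.

    Assumption: B and N follow the reading described above, not Flavors code.
    """
    levels = len(strides)
    V = []
    for bits in sorted_keys:
        row, offset = [], 0
        for s in strides:
            row.append(int(bits[offset:offset + s], 2))  # chunk on this level
            offset += s
        V.append(row)
    # B[k][l] = 1 when key k differs from key k - 1 somewhere in levels 0..l,
    # i.e. key k opens a new node on level l + 1.
    B = [[int(k == 0 or V[k][:l + 1] != V[k - 1][:l + 1]) for l in range(levels)]
         for k in range(len(V))]
    # N[k][0] = 1 (single root); column l is an inclusive scan of B column l - 1.
    N = [[1] * levels for _ in V]
    for l in range(1, levels):
        for k, n in enumerate(accumulate(row[l - 1] for row in B)):
            N[k][l] = n
    return V, B, N


def allocate(N, strides):
    """One flat, zero-filled array per level: node count times 2**stride cells."""
    counts = N[-1]  # last row of nodesIndexes = number of nodes on each level
    return [[0] * (count * 2**s) for count, s in zip(counts, strides)]


V, B, N = build_arrays(sorted(["110000", "010011", "010110"]), [3, 2, 1])
print(N[-1])                                              # [1, 2, 3]
print([len(level) for level in allocate(N, [3, 2, 1])])  # [8, 8, 6]
```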

SLIDE 37

Tree construction - linking nodes

Let V be the values array, B be nodesBorders, and N be nodesIndexes. Consider one cell of these arrays, in row 'key' and column 'level'. Let v = V[key][level], n = N[key][level] and b = B[key][level]. Let C be a 2D array pointing to all of the nodes (for example, C[1][2] points to the beginning of the second node on the first level, since indexing starts from 1). Then:

C[level][n] + v ← N[key][level + 1]

To avoid multiple writes to the same cell, this is done only if b is equal to 1.

SLIDE 39

Tree construction - linking nodes

On the last level, we do:

C[level][n] + v ← P[key]

where P is the permutation containing the original indexes of the keys. This operation is done for every key.
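The linking rule can be sketched sequentially in Python under an assumed reading of the arrays: keys are sorted, B[key][level] = 1 marks the first key whose bits up to and including that level are new, N holds 1-based node indexes whose last row gives the node count per level, each level is one flat array, and a cell left at 0 is empty. Levels are 0-based in the code. The V, B and N arrays in the usage example are worked out by hand for the sorted keys 010−01−1, 010−11−0 and 110−00−0 with strides {3, 2, 1}; `link` is a hypothetical name, and the loop that the GPU build would run in parallel is written sequentially.

```python
def link(V, B, N, P, strides):
    """Write child node indexes (inner levels) and original indexes (last level).

    Assumption: node indexes are 1-based as on the slides, and node n of a
    level with stride s starts at cell (n - 1) * 2**s of that level's array.
    """
    levels = len(strides)
    C = [[0] * (count * 2**s) for count, s in zip(N[-1], strides)]
    for key in range(len(V)):        # sequential here; parallel on the GPU
        for level in range(levels):
            v, n, b = V[key][level], N[key][level], B[key][level]
            cell = (n - 1) * 2**strides[level] + v   # C[level][n] + v
            if level < levels - 1:
                if b == 1:           # only one key writes each cell
                    C[level][cell] = N[key][level + 1]
            else:
                C[level][cell] = P[key]  # last level: done for every key
    return C


# Sorted keys 010-01-1, 010-11-0, 110-00-0. If the input order was
# 110-00-0, 010-01-1, 010-11-0, the 1-based permutation P is [2, 3, 1].
V = [[2, 1, 1], [2, 3, 0], [6, 0, 0]]
B = [[1, 1, 1], [0, 1, 1], [1, 1, 1]]
N = [[1, 1, 1], [1, 1, 2], [1, 2, 3]]
print(link(V, B, N, [2, 3, 1], [3, 2, 1]))
# [[0, 0, 1, 0, 0, 0, 2, 0], [0, 1, 0, 2, 3, 0, 0, 0], [0, 2, 3, 0, 1, 0]]
```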

SLIDE 44

Tree construction - linking nodes example

level = 2, key = 8, v = 0, n = 4, b = 1

C[level][n] + v = C[2][4]

SLIDE 45

Tree construction - what about masks?

SLIDE 48

Tree construction - removing empty nodes

Some of the nodes are no longer needed, because the masks that would occupy them are shorter. How should memory be allocated and nodes linked together?

SLIDE 49

Tree construction - removing empty nodes

SLIDE 50

Tree construction - removing empty nodes

After the nodesIndexes array is calculated, its cells are cleared (set to 0) wherever the mask does not reach that level.

SLIDE 51

Tree construction - removing empty nodes

Then the 1s in nodesBorders that represent nodes no longer needed are cleared.

SLIDE 52

Tree construction - removing empty nodes

After that, nodesIndexes can be recalculated and nodes allocated and linked exactly as before.
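A possible sequential reading of these three steps, sketched in Python under stated assumptions: masks are padded with zeros to the full key width and sorted; B is computed on the padded bits, with B[mask][level] = 1 meaning the mask's bits up to and including that level differ from the previous mask's, so it opens a new node on level + 1 (levels 0-based in the code, node indexes 1-based); a mask reaches a level when its length exceeds the number of bits taken by the levels above; and a node is kept when at least one mask inside it reaches its level. The helper names are hypothetical and the conventions are not taken from the Flavors source.

```python
from itertools import accumulate


def node_indexes(B, levels):
    """N[k][0] = 1 (single root); column l is an inclusive scan of B column l - 1."""
    N = [[1] * levels for _ in B]
    for l in range(1, levels):
        for k, n in enumerate(accumulate(row[l - 1] for row in B)):
            N[k][l] = n
    return N


def prune_empty_nodes(B, lengths, strides):
    """Clear borders of nodes no mask reaches; return the pruned B and new N.

    Assumption: mask k reaches level l when lengths[k] > bits above level l.
    """
    levels, rows = len(strides), len(B)
    offsets = list(accumulate([0] + strides[:-1]))  # first bit of each level
    N = node_indexes(B, levels)
    # Clear cells (set to 0) where the mask does not reach the level.
    cleared = [[N[k][l] if lengths[k] > offsets[l] else 0 for l in range(levels)]
               for k in range(rows)]
    # Clear the 1s in B that open nodes no remaining cell refers to.
    pruned = [row[:] for row in B]
    for l in range(1, levels):
        needed = {cleared[k][l] for k in range(rows)} - {0}
        for k in range(rows):
            if B[k][l - 1] == 1 and N[k][l] not in needed:
                pruned[k][l - 1] = 0
    # Recalculate nodesIndexes from the pruned borders.
    return pruned, node_indexes(pruned, levels)


# Masks padded with zeros and sorted: 010-0X-X, 010-00-X, 010-00-1, 110-XX-X.
B = [[1, 1, 1], [0, 0, 0], [0, 0, 1], [1, 1, 1]]
pruned, N = prune_empty_nodes(B, [4, 5, 6, 3], [3, 2, 1])
print(node_indexes(B, 3)[-1], N[-1])  # [1, 2, 2] [1, 1, 1]
```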

SLIDE 55

Tree construction - containers

For bit strides {3, 2, 1}, the masks

010 − 0X − X
010 − 00 − X

land in the same place in the tree. The solution is to attach a container for masks to each node. Since masks are kept in these containers, the last tree level, which holds original indexes, is no longer needed; on this level we only need containers.
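The level a mask lands on can be worked out from its length alone, assuming a mask belongs to the deepest level that still contains at least one of its significant bits. With strides {3, 2, 1} the levels end after bits 3, 5 and 6, so the 4-bit mask 010 − 0X − X and the 5-bit mask 010 − 00 − X both land on level 2, in the node for prefix 010. The helper `landing_level` below is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def landing_level(length, strides):
    """1-based level whose node container holds a mask of `length` bits.

    Assumption: a mask lands on the deepest level that still contains at
    least one of its significant bits, i.e. the first level whose
    cumulative stride is >= length.
    """
    ends = list(accumulate(strides))  # bit position where each level ends
    if not 0 < length <= ends[-1]:
        raise ValueError("mask length must be between 1 and the key width")
    return bisect_left(ends, length) + 1


strides = [3, 2, 1]
print(landing_level(4, strides), landing_level(5, strides))  # 2 2
print(landing_level(3, strides), landing_level(6, strides))  # 1 3
```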

SLIDE 58

Tree construction - containers

The current implementation uses simple lists kept in a single array, each sorted by mask length.
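A sketch of longest-prefix matching inside one such list, assuming each list is ordered from the longest mask to the shortest, so the first mask that is a prefix of the key is the longest match in that node. Masks are written as strings of their significant bits; the flat (bits, original index) layout and the name `match_in_list` are illustrative assumptions.

```python
def match_in_list(key_bits, masks, start, length):
    """Original index of the longest mask in one list that prefixes the key.

    Assumption: `masks` is the single flat array of (significant bits,
    original index) pairs, and every list in it is ordered longest first.
    """
    for bits, original_index in masks[start:start + length]:
        if key_bits.startswith(bits):
            return original_index  # first hit is the longest in this list
    return None


# List of the node for prefix 010 (start 0, length 2), then the root list.
masks = [("01000", 1), ("0100", 0), ("110", 2)]
print(match_in_list("010001", masks, 0, 2))  # 1 (010-00-X)
print(match_in_list("010011", masks, 0, 2))  # 0 (010-0X-X)
```

A full lookup could walk down the tree as for keys, check the container of each node it passes, and keep the match found deepest, since masks stored deeper are longer.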

SLIDE 65

Tree construction - building lists

Lists are built in a few steps:

  • 1. Masks are marked with the index of the node to which they belong (nodesIndexes on the mask level, 0 otherwise).
  • 2. List lengths are calculated using a reduce-by-key operation.
  • 3. Every mask belongs to some list, so memory for all lists is allocated at once. Only the list start and list length are kept with the node (in the yellow rectangle).
  • 4. List starts are calculated by performing an exclusive scan.
  • 5. For every mask, a special code is generated based on its level, node and length.
  • 6. The masks' original indexes are sorted by these codes. This places each mask in the right spot on the right list.
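Steps 2 to 6 can be mirrored sequentially in Python. In this sketch, which is not the Flavors code, a Counter stands in for the GPU reduce-by-key, itertools.accumulate for the exclusive scan and sorted() for the GPU sort. The code from step 5 is assumed to be the tuple (level, node, −length); ordering longer masks first inside each list is also an assumption. Step 1 is taken as given: the caller passes each mask's landing level and 1-based node index. `build_lists` is a hypothetical name.

```python
from collections import Counter
from itertools import accumulate


def build_lists(levels, nodes, lengths):
    """Return {(level, node): (start, length)} and the flat array of mask indexes."""
    # Step 2: list lengths, one per (level, node) key (reduce by key).
    sizes = Counter(zip(levels, nodes))
    keys = sorted(sizes)
    # Step 4: list starts by an exclusive scan of the list lengths.
    starts = accumulate([0] + [sizes[key] for key in keys[:-1]])
    lists = {key: (start, sizes[key]) for key, start in zip(keys, starts)}
    # Step 5: one code per mask; assumption: -length puts longer masks first.
    codes = [(levels[i], nodes[i], -lengths[i]) for i in range(len(nodes))]
    # Steps 3 and 6: every mask is in exactly one list, so one array of
    # len(nodes) slots holds all lists; sorting original indexes by code
    # drops each mask into its list at the right position.
    flat = sorted(range(len(nodes)), key=lambda i: codes[i])
    return lists, flat


# Masks 010-0X-X, 010-00-X, 110-XX-X, 010-00-1 (original indexes 0..3).
lists, flat = build_lists([2, 2, 1, 3], [1, 1, 1, 1], [4, 5, 3, 6])
print(lists)  # {(1, 1): (0, 1), (2, 1): (1, 2), (3, 1): (3, 1)}
print(flat)   # [2, 1, 0, 3]
```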

SLIDE 66

Tree construction - containers

SLIDE 67

Experimental results - test platform

All presented tests were performed on a regular PC workstation:

  • GTX 1080
  • Intel 4790K CPU
  • 16 GB of memory
  • Windows 10
  • CUDA 8

SLIDE 68

Experimental results - tree build throughput for keys

Figure 1: Tree build throughput in keys per second depending on key length

SLIDE 69

Experimental results - tree find keys throughput

Figure 2: Tree find keys throughput in keys per second depending on key length

SLIDE 70

Experimental results - tree build from keys throughput

Figure 3: Build throughput in keys per second (left avg, right max) for different bit strides and batch sizes.

SLIDE 71

Experimental results - tree find keys throughput

Figure 4: Find throughput in keys per second (left avg, right max) for different bit strides and batch sizes.

SLIDE 72

Experimental results - tree build throughput for masks

Figure 5: Tree build throughput in masks per second depending on mask length

SLIDE 73

Experimental results - tree find masks throughput

Figure 6: Tree find masks throughput in masks per second depending on mask length

SLIDE 74

Experimental results - tree build from masks throughput

Figure 7: Build throughput in masks per second (left avg, right max) for different bit strides and batch sizes.

SLIDE 75

Experimental results - tree find masks throughput

Figure 8: Find throughput in masks per second (left avg, right max) for different bit strides and batch sizes.

SLIDE 76

Experimental results - tree build from IP masks throughput

Figure 9: Tree build throughput in masks per second depending on bit stride for IP masks.

SLIDE 77

Experimental results - tree match IP throughput

Figure 10: Tree IP matching throughput depending on batch size for different bit strides.

SLIDE 78

Experimental results - STS benchmark

A simple benchmark for databases:

  • inserting records into an empty database and then reading them all
  • in the presented instance, 1 million records
  • 19-digit keys, random values
  • a few other fields (2 ints, 2 floats, 2 dates)

It can be found at: https://github.com/STSSoft/DatabaseBenchmark

All results were taken from the STS benchmark website.

SLIDE 79

Experimental results - STS benchmark

Figure 11: Insert throughput in keys per second for STS benchmark

SLIDE 80

Experimental results - STS benchmark

Figure 12: Find throughput in keys per second for STS benchmark

SLIDE 81

Experimental results - STS benchmark

Figure 13: Benchmark times in milliseconds for different databases and Flavors

SLIDE 85

Why would you use it?

  • For certain cases it is fast
  • Beyond some point it scales reasonably for long keys
  • Unlike hash tables, trees preserve neighborhood: keys that share a prefix share a path in the tree

SLIDE 88

Ongoing work

An obvious problem is how to pick the bit strides. There is an algorithm to calculate bit strides, but:

  • it is slow (dynamic programming)
  • it aims to build the tree with the smallest possible number of nodes

The problem is important for two reasons:

  • performance
  • ease of use
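To illustrate why choosing strides is naturally a dynamic program, here is a small sketch under an assumed cost model: a level starting at bit i holds one node per distinct i-bit prefix of the keys, each node has 2^stride cells, and the goal is to minimise the total number of cells for a fixed number of levels. The algorithm mentioned on the slide targets the number of nodes and may be formulated differently; `best_strides` is a hypothetical name. Each state (first uncovered bit, levels left) tries every possible next stride, so the work grows with levels × width² once the prefixes are counted.

```python
from functools import lru_cache


def best_strides(keys, levels):
    """Pick `levels` strides minimising total cells; returns (cells, strides).

    Assumed cost model: a level starting at bit i has one node per distinct
    i-bit prefix, and each node has 2**stride cells.
    """
    width = len(keys[0])
    distinct = [len({k[:i] for k in keys}) for i in range(width + 1)]

    @lru_cache(maxsize=None)
    def cost(start, remaining):
        # Cheapest way to cover bits start..width with `remaining` levels.
        if start == width:
            return (0, ()) if remaining == 0 else (float("inf"), ())
        if remaining == 0:
            return (float("inf"), ())
        best = (float("inf"), ())
        for end in range(start + 1, width + 1):
            rest, strides = cost(end, remaining - 1)
            total = distinct[start] * 2 ** (end - start) + rest
            if total < best[0]:
                best = (total, (end - start,) + strides)
        return best

    return cost(0, levels)


print(best_strides(["010011", "010110", "110000"], 3))  # (22, (3, 2, 1))
```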

SLIDE 91

Ongoing work

Possible solution: machine learning. The aim would be to build a model that could predict the best possible configuration based on the data.

So far, I have reached about 80% accuracy using a simple MLP network for a predefined set of configurations. I am working on a model that would generate the configuration itself, based on the data.

SLIDE 92

You can find Flavors here: https://github.com/wazka/flavors

You can reach me at: albertwolant@gmail.com, @wazka3133

SLIDE 93

Thank you! Q & A
