Brie: A Specialized Trie for Concurrent Datalog




SLIDE 1

Herbert Jordan¹, Pavle Subotić³, David Zhao², and Bernhard Scholz²

PMAM 2019, 17 February 2019, Washington, DC

Brie: A Specialized Trie for Concurrent Datalog

1) University of Innsbruck
2) University of Sydney
3) Amazon

SLIDE 2

Datalog (by Example)

[Figure: directed graph over the nodes a, b, c, d, e, f, g]

edge relation:

  from  to
  a     b
  a     c
  b     f
  c     e
  d     a
  d     c
  …     …

Are there cycles?

SLIDE 3

Datalog (by Example)

[Figure: the graph and edge relation from slide 2]

Is the graph connected?

SLIDE 4

Datalog (by Example)

[Figure: the graph and edge relation from slide 2]

Which nodes are connected?

SLIDE 5

Datalog (by Example)

Datalog query:

  path(X,Y) :- edge(X,Y).
  path(X,Z) :- path(X,Y), edge(Y,Z).

[Figure: the graph and edge relation from slide 2]

SLIDE 6

Datalog

› Benefits:
  – a concise formalism for powerful data analysis
  – recent major performance improvements and tool support

› Applications:
  – database queries
  – program analysis
  – security vulnerability analysis
  – network analysis

100s of relations and rules, billions of tuples, all in-memory

SLIDE 7

Query Processing

› relations: sets of integer tuples
› rules: sequences of relational algebra operations on sets

SLIDE 8

Example

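path(X,Z) :- path(X,Y), edge(Y,Z).

Semi-naive evaluation (path initially holds the result of path(X,Y) :- edge(X,Y)):

  delta ← path
  while (delta ≠ ∅) {
    new   ← π(delta ⋈ edge) ∖ path
    path  ← path ∪ new
    delta ← new
  }

Computationally expensive and dominating part of query processing.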

SLIDE 9

Needed

› efficient data structure for relations
  – maintain a set of n-dimensional tuples
  – efficient support for the following operations (well supported by B-trees):
    › insertion
    › scans
    › range queries
    › membership tests
    › emptiness checks
  – efficient synchronization of concurrent inserts (challenging)

SLIDE 10

B-tree Issues

› Concurrent inserts:
  – require a sophisticated locking scheme
  – costly operations are performed while holding locks
    › binary search operations and inserts into sorted arrays

[Figure: B-tree storing the tuples (1,1), (1,2), (3,2), (4,7), (5,3), (6,9), (7,4), (8,2), (8,7), (9,2), (9,4)]

SLIDE 11

Brie


SLIDE 12

Brie – Inner Node


SLIDE 13

Brie – Leaf Node


SLIDE 14

Synchronizing Inserts

› Insertion:
  1. navigate down the tree
     › insert sub-trees on demand using CAS
  2. if the inner node tree needs to grow
     › introduce a new root node using CAS
  3. add a 1-bit to the leaf-level mask
     › using an atomic bitwise or

SLIDE 15

Data Density

Performance is density-dependent:

[Figure: low-density example (0,3), (3,1), (7,2); high-density example (3,1), (3,3), (3,4)]

Density: ratio of included points to the size of the spanned interval

SLIDE 16

Memory Usage

[Figure: memory [GB] vs. elements (in millions, 10 to 100); series: btree, brie 100%, brie 10%, brie 5%, brie 2%, brie 1%, brie 0.5%, brie 0.1%]

SLIDE 17

Sequential Performance

[Figure, two panels: ordered insertion and random order insertion; million insertions/s vs. total elements inserted (1000² to 10000²); series: std::set, std::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%]

SLIDE 18

Sequential Performance (2)

[Figure, two panels: membership test (random order), million queries/s vs. elements in set and number of queries (1000² to 10000²); full range scan, million entries/s vs. elements in set (1000² to 10000²); series: std::set, std::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%]

SLIDE 19

Parallel Performance

[Figure, two panels: ordered insertion and random order insertion; million insertions/s (log scale) vs. number of threads (1 to 31); series: tbb::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%]

Platform: 4x8 core Intel Xeon E5-4650

SLIDE 20

Parallel Performance

[Figure, two panels: ordered insertion and random order insertion; million insertions/s vs. number of threads (1 to 31); series: tbb::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%]

Platform: 4x8 core Intel Xeon E5-4650

Annotations: up to 11x faster than B-trees; up to 15x faster than B-trees

SLIDE 21

Datalog Query Processing

[Figure, two panels: total query time [s] vs. number of threads (1 to 31); memory usage [GB] vs. number of threads (1 to 31); series: btree, brie, mixed]

Context-sensitive var-points-to analysis: ~4x faster, 50% less memory

SLIDE 22

Conclusion

› Developed a concurrent set for Datalog relations:
  – trie-derived structure + blocked nodes
    › enables fast relational operations
  – low-overhead synchronization
    › atomic-operation-based synchronization is sufficient

› Results:
  – up to 5-17x faster for sequential insert and query operations
  – up to 15x faster for parallel insert operations
  – up to 4x faster and 50% less memory for real-world query processing

› Future work:
  – investigate other data structures for specialized use cases

SLIDE 23

Thank you!

Visit us at https://souffle-lang.github.io
Sources: https://github.com/souffle-lang/souffle/blob/master/src/Brie.h

SLIDE 24

Parallel Performance

[Figure, two panels: ordered insertion and random order insertion; million insertions/s (log scale) vs. number of threads (1 to 31); series: tbb::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%, reduction btree]

Platform: 4x8 core Intel Xeon E5-4650

SLIDE 25

Parallel Performance

[Figure, two panels: ordered insertion and random order insertion; million insertions/s vs. number of threads (1 to 31); series: tbb::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%, reduction btree]

Platform: 4x8 core Intel Xeon E5-4650

Annotations: up to 11x faster than B-trees; up to 15x faster than B-trees
