Graph Databases for Polyglot Persistence with NotaQL, 2017-03-08



SLIDE 1

Transformations on Graph Databases for Polyglot Persistence with NotaQL

Johannes Schildgen, Yannick Krück, Stefan Deßloch

2017-03-08 schildgen@cs.uni-kl.de

SLIDE 2

Polyglot Persistence

SLIDE 3

Polyglot Persistence

db.product.insert({…})
db.category.find()

SLIDE 4

Polyglot Persistence

db.product.insert({…})
db.category.find()
INCR dvd_174_cnt

Data Transformation

OUT._id <- IN._k.split('_')[0],
OUT.clicks <- SUM(IN._v)
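
The transformation above can be read as a group-by over key prefixes: each counter key such as dvd_174_cnt is split at '_', the first part becomes the output _id, and the values of all keys sharing that prefix are summed into clicks. A minimal Python sketch of that reading follows. Only the key dvd_174_cnt comes from the slide; the other keys, all counter values, and the use of plain dicts in place of a key-value store and a document collection are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical counters as a key-value store might hold them after INCR calls.
# Only the key "dvd_174_cnt" appears on the slide; everything else is sample data.
key_value_store = {
    "dvd_174_cnt": 3,
    "dvd_201_cnt": 5,
    "book_17_cnt": 2,
}


def aggregate_clicks(store):
    """Group keys by the part before the first '_' and sum their values."""
    clicks = defaultdict(int)
    for key, value in store.items():
        out_id = key.split("_")[0]   # OUT._id <- IN._k.split('_')[0]
        clicks[out_id] += value      # OUT.clicks <- SUM(IN._v)
    return [{"_id": out_id, "clicks": total} for out_id, total in clicks.items()]


documents = aggregate_clicks(key_value_store)
print(documents)
```

Each resulting dict corresponds to one output document; grouping happens implicitly because several input keys map to the same output _id.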

SLIDE 5

SLIDE 6

Graph

SLIDE 7

$G = (V, E)$

SLIDE 8

v1 v2

SLIDE 9

v1 v2

SLIDE 10

v1 v2

SLIDE 11

v1 v2

0.2 0.6

SLIDE 12

Property Graphs

v1 v2

firstname: Kai, lastname: Li
firstname: Ute, lastname: Li

Friend

since: 2015-11-11
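
A property graph like the one on this slide can be sketched with two small record types, one for vertices and one for edges, each carrying a free-form property map. The class and field names below are hypothetical. The sketch also assumes that the first property set belongs to v1, the second to v2, and that the Friend edge points from v1 to v2, none of which the extracted slide text states explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str          # id of the start vertex
    target: str          # id of the end vertex
    label: str
    properties: dict = field(default_factory=dict)


# Assumption: Kai's properties belong to v1 and Ute's to v2.
v1 = Vertex("v1", {"firstname": "Kai", "lastname": "Li"})
v2 = Vertex("v2", {"firstname": "Ute", "lastname": "Li"})

# Assumption: the Friend edge is directed from v1 to v2.
friend = Edge(v1.vid, v2.vid, "Friend", {"since": "2015-11-11"})

print(friend.label, friend.properties["since"])
```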

SLIDE 13

[Figure: adjacency matrix with eight entries set to 1]

SLIDE 14

v1: v2, v5
v2: v1, v2, v3
v3: v3, v4
v4: v1
v5: ∅
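
The adjacency list above can be turned into an adjacency matrix mechanically. The Python sketch below assumes that each line lists the vertices reachable over one outgoing edge, so rows stand for start vertices and columns for end vertices; the function name is hypothetical.

```python
adjacency_list = {
    "v1": ["v2", "v5"],
    "v2": ["v1", "v2", "v3"],
    "v3": ["v3", "v4"],
    "v4": ["v1"],
    "v5": [],
}


def to_adjacency_matrix(adj):
    """Return the vertex order and a 0/1 matrix with matrix[i][j] = 1 for an edge i -> j."""
    order = list(adj)
    index = {vertex: i for i, vertex in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for source, targets in adj.items():
        for target in targets:
            matrix[index[source]][index[target]] = 1
    return order, matrix


order, matrix = to_adjacency_matrix(adjacency_list)
for vertex, row in zip(order, matrix):
    print(vertex, row)
```

The matrix needs a cell for every vertex pair, whereas the list only stores the edges that exist, which is why sparse graphs are usually kept as adjacency lists.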

SLIDE 15

vid  attribute  value
v1   vorname    Kai
v1   nachname   Li
v2   vorname    Ute
v2   nachname   Li
v2   geboren    1985-01-01
…

eid  start  ziel  label
e1   v1     v3    folgt
e2   v2     v3    Bruder
e3   v2     v4    folgt
e4   v3     v4    folgt

vid  label
v1   person
v2   person
v2   student
v3   person
v4   person

SELECT xV.value, xN.value
FROM knoten kai, kanten, knoten xV, knoten xN, knotenlabels
WHERE kai.attribute = 'vorname' AND kai.value = 'Kai'
  AND kanten.start = kai.vid AND kanten.ziel = xV.vid
  AND xV.vid = knotenlabels.vid AND xN.vid = xV.vid
  AND xV.attribute = 'vorname' AND xN.attribute = 'nachname'
  AND knotenlabels.label = 'student'

eid  attribute  value
e1   seit       2015
e3   seit       2014
e4   seit       2015
e4   priorität  5

SLIDE 16

vid  properties
v1   {vorname: "Kai", nachname: "Li"}
v2   {vorname: "Ute", nachname: "Li", geboren: Date(1985-01-01)}
v3   …
v4   …

eid  start  ziel  label   properties
e1   v1     v3    folgt   { seit: 2015 }
e2   v2     v3    Bruder  { }
e3   v2     v4    folgt   { seit: 2014 }
e4   v3     v4    folgt   { seit: 2015, Priorität: 5 }

vid  label
v1   person
v2   person
v2   student
v3   person
v4   person

SLIDE 17

Row-id  graph          properties                                       edges
v1      label: person  vorname: Kai, nachname: Li                       folgt_v3: 2015
v2      label: person  vorname: Ute, nachname: Li, geboren: 1985-01-01  Bruder_v3: -, folgt_v4: 2015
v3      label: person  …                                                …
v4      label: person  …                                                …

SLIDE 18

{ _id: "v1", label: "person", vorname: "Kai", nachname: "Li",
  folgt: [{_id: "v2", seit: 2015}] }
{ _id: "v2", label: ["person", "student"], vorname: "Ute", nachname: "Li", geboren: 1985,
  folgt: [{_id: "v4", seit: 2014, prioritaet: 5}], Bruder: ["v3"] }
...

ergebnis = [];
// Fetch every person named Kai, keeping only the folgt edges.
kai = db.personen.find({vorname: "Kai"}, {folgt: 1});
while (kai.hasNext()) {
  p = kai.next();
  for (i in p.folgt) {
    id = p.folgt[i]._id;
    // findOne returns a single document, or null if the target is not a student.
    s = db.personen.findOne({_id: id, label: "student"}, {vorname: 1, nachname: 1});
    if (s != null) { ergebnis.push(s); }
  }
}

SLIDE 19

subject  predicate  object

http://dbpedia.org/resource/Krefeld_Hauptbahnhof  rdf:type      http://schema.org/Place
http://dbpedia.org/resource/Krefeld_Hauptbahnhof  foaf:name     Krefeld Hauptbahnhof
http://dbpedia.org/resource/Krefeld_Hauptbahnhof  georss:point  51.325833333333335 6.569444444444445
http://dbpedia.org/resource/Krefeld_Hauptbahnhof  rdf:comment   Krefeld Hauptbahnhof ist der größte Bahnhof der Stadt Krefeld. Dort …
http://dbpedia.org/resource/Krefeld_Hauptbahnhof  country       http://dbpedia.org/resource/Germany
http://dbpedia.org/resource/Germany               foaf:name     Germany

SLIDE 20

  • Storage
  • Index Support (+Apache Lucene)
  • Graph Query Languages
  • ACID
  • REST API
  • Replication

SLIDE 21

{ _id: 77, firstname: "Kate", age: 38, city: "Rome" }
{ _id: 19, firstname: "Jane", age: 36, city: "Bern" }

OUT._id <- IN._id,
OUT.type <- 'person',
OUT.firstname <- IN.firstname,
OUT.age <- IN.age,
OUT.city <- IN.city

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern

SLIDE 22

{ _id: 77, firstname: "Kate", age: 38, city: "Rome" }
{ _id: 19, firstname: "Jane", age: 36, city: "Bern" }

OUT._id <- IN._id,
OUT.type <- 'person',
OUT.$(IN.*.name()) <- IN.@

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern
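
The generic rule OUT.$(IN.*.name()) <- IN.@ replaces the per-attribute assignments of the previous slide. Reading IN.* as "every attribute of the input document", .name() as that attribute's name and IN.@ as its value (an interpretation of the slide, not a statement of the NotaQL specification), the rule behaves roughly like the following Python sketch. The function name to_vertex is hypothetical, and a dict stands in for a graph vertex.

```python
def to_vertex(document):
    """Copy a JSON-like document into a vertex property map."""
    vertex = {
        "_id": document["_id"],   # OUT._id <- IN._id
        "type": "person",         # OUT.type <- 'person'
    }
    for name, value in document.items():   # IN.* : every input attribute
        vertex[name] = value               # OUT.$(IN.*.name()) <- IN.@
    return vertex


kate = {"_id": 77, "firstname": "Kate", "age": 38, "city": "Rome"}
print(to_vertex(kate))
```

The benefit of the generic form is that documents with additional or missing attributes are handled without changing the transformation.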

SLIDE 23

{ _id: 77, firstname: "Kate", age: 38, city: "Rome" }
{ _id: 19, firstname: "Jane", age: 36, city: "Bern" }

IN-FILTER: type='person',
OUT._id <- IN._id,
OUT.$(IN.name()) <- IN.@

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern

SLIDE 24

Accessing & Traversing Edges

SLIDE 25

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern
friend (since: 2016-01-01)

IN.age

37 35

SLIDE 26

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern
friend (since: 2016-01-01)

IN._>e

SLIDE 27

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern
friend (since: 2016-01-01)

IN._>e.since

2016-01-01

SLIDE 28

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern
friend (since: 2016-01-01)

IN._<e.since

2016-01-01

SLIDE 29

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern
friend (since: 2016-01-01)

IN._e.since

2016-01-01 2016-01-01

SLIDE 30

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern
friend (since: 2016-01-01)

IN._e_

type : person, _id : 19, name : Jane, age : 35, city : Bern
type : person, _id : 77, name : Kate, age : 37, city : Rome

SLIDE 31

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern
friend (since: 2016-01-01)

IN._e_.name

Jane Kate

SLIDE 32

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 19, name : Jane, age : 35, city : Bern
friend (since: 2016-01-01)

IN._e?('friend')_.name

Jane Kate
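
The traversal steps shown on the preceding slides can be mimicked over a tiny in-memory graph. The Python sketch below is an illustration under stated assumptions: the helper names are hypothetical, the friend edge is assumed to point from Kate (77) to Jane (19), and the mapping of _>e to outgoing edges, _<e to incoming edges, _e to edges in either direction and _e_ to the vertices at the other end is a reading of the slides rather than a reference implementation of NotaQL.

```python
vertices = {
    77: {"type": "person", "name": "Kate", "age": 37, "city": "Rome"},
    19: {"type": "person", "name": "Jane", "age": 35, "city": "Bern"},
}

# Assumption: the friend edge is directed from Kate (77) to Jane (19).
edges = [{"source": 77, "target": 19, "label": "friend", "since": "2016-01-01"}]


def outgoing(vid, label=None):
    """IN._>e : edges starting at the vertex, optionally filtered by label."""
    return [e for e in edges if e["source"] == vid and label in (None, e["label"])]


def incoming(vid, label=None):
    """IN._<e : edges ending at the vertex, optionally filtered by label."""
    return [e for e in edges if e["target"] == vid and label in (None, e["label"])]


def incident(vid, label=None):
    """IN._e : edges in either direction."""
    return outgoing(vid, label) + incoming(vid, label)


def neighbors(vid, label=None):
    """IN._e_ : the vertices at the other end of every incident edge."""
    result = []
    for e in incident(vid, label):
        other = e["target"] if e["source"] == vid else e["source"]
        result.append(vertices[other])
    return result


for vid in vertices:
    print(
        vid,
        [e["since"] for e in outgoing(vid)],               # IN._>e.since
        [e["since"] for e in incident(vid)],               # IN._e.since
        [n["name"] for n in neighbors(vid, "friend")],     # IN._e?('friend')_.name
    )
```

Because a transformation is evaluated once per input vertex, an expression such as IN._e.since produces one value for Kate and one for Jane, which matches the two results listed on the slide.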

SLIDE 33

SLIDE 34

Creating Edges

SLIDE 35

Create an edge to every person's grandmother…

type : person, _id : 77, name : Kate, age : 37, city : Rome
type : person, _id : 25, name : Carl, age : 57, city : Rome
type : person, _id : 26, name : Carla, age : 77, city : Rome
father
mother

OUT._>e?(_id = IN._>e?('mother'||'father')_._>e?('mother')_._id)
    <- EDGE('grandmother', via <- IN._>e[@]._l)

grandmother (via: 'father')
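
The edge-creating rule above follows a two-hop pattern: take a mother or father edge, then a mother edge, and connect the start vertex directly to the vertex reached, remembering the label of the first hop in the via property. A minimal Python sketch of that pattern follows. It assumes that the father edge runs from Kate to Carl and the mother edge from Carl to Carla, which the extracted slide text does not state, and the function name is hypothetical.

```python
# Assumption: Kate -father-> Carl and Carl -mother-> Carla.
edges = [
    {"source": 77, "target": 25, "label": "father", "properties": {}},
    {"source": 25, "target": 26, "label": "mother", "properties": {}},
]


def grandmother_edges(existing_edges):
    """Derive one 'grandmother' edge for every parent edge followed by a mother edge."""
    new_edges = []
    for first in existing_edges:
        if first["label"] not in ("mother", "father"):   # _>e?('mother'||'father')
            continue
        for second in existing_edges:
            # _>e?('mother') starting at the parent reached by the first hop
            if second["source"] == first["target"] and second["label"] == "mother":
                new_edges.append({
                    "source": first["source"],
                    "target": second["target"],
                    "label": "grandmother",
                    "properties": {"via": first["label"]},   # via <- IN._>e[@]._l
                })
    return new_edges


print(grandmother_edges(edges))
```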

SLIDE 36

Iterative Computations

PageRank: $PR(q) = \sum_{p \in \mathrm{in}(q)} \frac{PR(p)}{|\mathrm{out}(p)|}$

OUT._id <- IN._>e_._id,
OUT.pr <- SUM(IN.pr/COUNT(IN._>e._id))
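
A single pass of this transformation can be sketched in Python as follows. Every vertex divides its current rank by its number of outgoing edges and sends that share to each neighbor, and the shares arriving at the same target are summed. The three-vertex graph and its starting ranks are hypothetical sample data, and the function name is an assumption.

```python
from collections import defaultdict


def pagerank_step(pr, out_edges):
    """One pass: each vertex distributes pr / out-degree along its outgoing edges."""
    new_pr = defaultdict(float)
    for vertex, targets in out_edges.items():
        if not targets:
            continue                          # no outgoing edges, nothing to distribute
        share = pr[vertex] / len(targets)     # IN.pr / COUNT(IN._>e._id)
        for target in targets:                # OUT._id <- IN._>e_._id
            new_pr[target] += share           # SUM(...) groups by OUT._id
    return dict(new_pr)


# Hypothetical sample graph: a -> b, a -> c, b -> c, c -> a
out_edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
pr = {vertex: 1 / 3 for vertex in out_edges}

print(pagerank_step(pr, out_edges))
```

Only vertices with at least one incoming edge appear in the result of a pass, since output items are created per edge target.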

SLIDE 37

Iterative Computations

PageRank: $PR(q) = \sum_{p \in \mathrm{in}(q)} \frac{PR(p)}{|\mathrm{out}(p)|}$

REPEAT: 10,
OUT._id <- IN._>e_._id,
OUT.pr <- SUM(IN.pr/COUNT(IN._>e._id))

SLIDE 38

Iterative Computations

PageRank: $PR(q) = \sum_{p \in \mathrm{in}(q)} \frac{PR(p)}{|\mathrm{out}(p)|}$

REPEAT: 99999999,
OUT._id <- IN._>e_._id,
OUT.pr <- SUM(IN.pr/COUNT(IN._>e._id))

SLIDE 39

Iterative Computations

PageRank: $PR(q) = \sum_{p \in \mathrm{in}(q)} \frac{PR(p)}{|\mathrm{out}(p)|}$

REPEAT: -1,
OUT._id <- IN._>e_._id,
OUT.pr <- SUM(IN.pr/COUNT(IN._>e._id))

SLIDE 40

Iterative Computations

PageRank: $PR(q) = \sum_{p \in \mathrm{in}(q)} \frac{PR(p)}{|\mathrm{out}(p)|}$

REPEAT: pr(0.0005%),
OUT._id <- IN._>e_._id,
OUT.pr <- SUM(IN.pr/COUNT(IN._>e._id))
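
The REPEAT clause turns the single pass into a loop. The Python sketch below assumes that REPEAT: pr(0.0005%) means "repeat until no pr value changes by more than 0.0005 % between two passes"; that reading, the function names, the iteration cap and the sample graph are all assumptions made for illustration.

```python
def pagerank_step(pr, out_edges):
    """One pass; assumes every edge target is itself a key of out_edges."""
    new_pr = {vertex: 0.0 for vertex in out_edges}
    for vertex, targets in out_edges.items():
        for target in targets:
            new_pr[target] += pr[vertex] / len(targets)
    return new_pr


def max_change_percent(old, new):
    """Largest relative change of any rank between two passes, in percent."""
    changes = []
    for vertex in old:
        if old[vertex] == 0.0:
            changes.append(0.0 if new[vertex] == 0.0 else float("inf"))
        else:
            changes.append(abs(new[vertex] - old[vertex]) / old[vertex] * 100)
    return max(changes)


def pagerank_until_stable(out_edges, threshold_percent=0.0005, max_iterations=10_000):
    """Repeat passes until the ranks change by less than threshold_percent."""
    pr = {vertex: 1 / len(out_edges) for vertex in out_edges}
    for iteration in range(1, max_iterations + 1):
        new_pr = pagerank_step(pr, out_edges)
        change = max_change_percent(pr, new_pr)
        pr = new_pr
        if change < threshold_percent:
            return pr, iteration
    return pr, max_iterations


# Hypothetical sample graph: a -> b, a -> c, b -> c, c -> a
out_edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks, iterations = pagerank_until_stable(out_edges)
print(iterations, ranks)
```

A fixed count such as REPEAT: 10 corresponds to calling pagerank_step a fixed number of times, while a change-based criterion lets the loop stop as soon as the ranks have settled.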

SLIDE 41

Implementation Details

SLIDE 42

SLIDE 43

TinkerPop

Blueprints: Generic Graph API

SLIDE 44

Blueprints API

import com.tinkerpop.blueprints.Direction;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph;

// Open a Neo4j graph through the generic Blueprints interface.
Graph graph = new Neo4jGraph("/tmp/my_graph");
for (Vertex v : graph.getVertices()) {
    System.out.println(v.getId());
    System.out.println((Object) v.getProperty("vorname"));
    // Iterate over the outgoing edges of the vertex.
    for (Edge e : v.getEdges(Direction.OUT)) { ... }
}

SLIDE 45

JSON

SLIDE 46

Applications

SLIDE 47

Java & Cypher 207 min 95 s (15,000 vertices and 200,000 edges)

SLIDE 48

Graph Transformations in MongoDB

SLIDE 49

Graph Transformations in MongoDB

2 min 23 min

SLIDE 50

Conclusions

  • NotaQL language extension for graph transformations
  • access / create properties and edges
  • iterative algorithms
  • cross-system graph transformations
  • prototype based on Blueprints and Spark