Transformations on Graph Databases for Polyglot Persistence with NotaQL
Johannes Schildgen, Yannick Krück, Stefan Deßloch
2017-03-08 schildgen@cs.uni-kl.de
Polyglot Persistence
db.product.insert({…})
db.category.find()
INCR dvd_174_cnt

Data Transformation
OUT._id <- IN._k.split('_')[0],
OUT.clicks <- SUM(IN._v)
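The NotaQL script above can be read as a group-by: the output `_id` is the key prefix before the first underscore, and `clicks` is the sum of all counter values that share this prefix. The following Python sketch imitates that reading on an in-memory dictionary. The function name `aggregate_clicks` and the sample counters are hypothetical; only the key pattern `dvd_174_cnt` appears on the slide.

```python
from collections import defaultdict


def aggregate_clicks(kv_pairs):
    """Group counters by the key prefix before the first '_' and sum their values."""
    clicks = defaultdict(int)
    for key, value in kv_pairs.items():
        out_id = key.split("_")[0]    # OUT._id <- IN._k.split('_')[0]
        clicks[out_id] += int(value)  # OUT.clicks <- SUM(IN._v)
    return [{"_id": out_id, "clicks": total}
            for out_id, total in sorted(clicks.items())]


# Hypothetical key-value store contents (assumption, not from the slides).
counters = {"dvd_174_cnt": "3", "dvd_175_cnt": "4", "book_12_cnt": "5"}
result = aggregate_clicks(counters)
print(result)
```

The result is a list of documents, one per key prefix, which is the shape a document store such as MongoDB could take as input.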
Property Graphs
Vertex: firstname: Kai, lastname: Li
Vertex: firstname: Ute, lastname: Li
Edge: Friend, since: 2015-11-11
knoten (vertex attributes)
vid  attribute  value
v1   vorname    Kai
v1   nachname   Li
v2   vorname    Ute
v2   nachname   Li
v2   geboren    1985-01-01
…

kanten (edges)
eid  start  ziel  label
e1   v1     v3    folgt
e2   v2     v3    Bruder
e3   v2     v4    folgt
e4   v3     v4    folgt

knotenlabels (vertex labels)
vid  label
v1   person
v2   person
v2   student
v3   person
v4   person
SELECT xV.value, xN.value
FROM knoten kai, kanten, knoten xV, knoten xN, knotenlabels
WHERE kai.attribute = 'vorname' AND kai.value = 'Kai'
  AND kanten.start = kai.vid AND kanten.ziel = xV.vid
  AND xV.vid = knotenlabels.vid AND xN.vid = xV.vid
  AND xV.attribute = 'vorname' AND xN.attribute = 'nachname'
  AND knotenlabels.label = 'student'
Edge attributes
eid  attribute  value
e1   seit       2015
e3   seit       2014
e4   seit       2015
e4   priorität  5
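To see how many self-joins the relational encoding needs for a single hop, the query can be tried against an in-memory SQLite database. This is a minimal sketch: the table and column names follow the slide, the column `value` is used throughout, and the edge `e5` from v1 to v2 is a hypothetical addition so that Kai is connected to a student and the query returns a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knoten (vid TEXT, attribute TEXT, value TEXT);
CREATE TABLE kanten (eid TEXT, start TEXT, ziel TEXT, label TEXT);
CREATE TABLE knotenlabels (vid TEXT, label TEXT);
""")
conn.executemany("INSERT INTO knoten VALUES (?, ?, ?)", [
    ("v1", "vorname", "Kai"), ("v1", "nachname", "Li"),
    ("v2", "vorname", "Ute"), ("v2", "nachname", "Li"),
    ("v2", "geboren", "1985-01-01"),
])
conn.executemany("INSERT INTO kanten VALUES (?, ?, ?, ?)", [
    ("e1", "v1", "v3", "folgt"), ("e2", "v2", "v3", "Bruder"),
    ("e3", "v2", "v4", "folgt"), ("e4", "v3", "v4", "folgt"),
    ("e5", "v1", "v2", "folgt"),  # hypothetical edge, not on the slide
])
conn.executemany("INSERT INTO knotenlabels VALUES (?, ?)", [
    ("v1", "person"), ("v2", "person"), ("v2", "student"),
    ("v3", "person"), ("v4", "person"),
])

# First and last names of all students that Kai has an edge to.
QUERY = """
SELECT xV.value, xN.value
FROM knoten kai, kanten, knoten xV, knoten xN, knotenlabels
WHERE kai.attribute = 'vorname' AND kai.value = 'Kai'
  AND kanten.start = kai.vid AND kanten.ziel = xV.vid
  AND xV.vid = knotenlabels.vid AND xN.vid = xV.vid
  AND xV.attribute = 'vorname' AND xN.attribute = 'nachname'
  AND knotenlabels.label = 'student'
"""
students = conn.execute(QUERY).fetchall()
print(students)
```

One traversal step already joins five table instances; every additional hop or attribute adds further joins.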
vid  properties
v1   {vorname: "Kai", nachname: "Li"}
v2   {vorname: "Ute", nachname: "Li", geboren: Date(1985-01-01)}
v3   …
v4   …

eid  start  ziel  label   properties
e1   v1     v3    folgt   {seit: 2015}
e2   v2     v3    Bruder  {}
e3   v2     v4    folgt   {seit: 2014}
e4   v3     v4    folgt   {seit: 2015, Priorität: 5}

vid  label
v1   person
v2   person
v2   student
v3   person
v4   person
Row-id  graph          properties                                       edges
v1      label: person  vorname: Kai, nachname: Li                       folgt_v3: 2015
v2      label: person  vorname: Ute, nachname: Li, geboren: 1985-01-01  Bruder_v3, folgt_v4
v3      label: person  …                                                …
v4      label: person  …                                                …
{ _id: "v1", label: "person", vorname: "Kai", nachname: "Li",
  folgt: [{_id: "v2", seit: 2015}] }
{ _id: "v2", label: ["person", "student"], vorname: "Ute", nachname: "Li", geboren: 1985,
  folgt: [{_id: "v4", seit: 2014, prioritaet: 5}], Bruder: ["v3"] }
...
// Find the first and last names of all students that Kai follows.
var ergebnis = [];
var kai = db.personen.find({vorname: "Kai"}, {folgt: 1});
while (kai.hasNext()) {
  var p = kai.next();
  for (var i in p.folgt) {
    var id = p.folgt[i]._id;
    // findOne returns a single document or null
    var s = db.personen.findOne({_id: id, label: "student"},
                                {vorname: 1, nachname: 1});
    if (s != null) { ergebnis.push(s); }
  }
}
subject  predicate  object
http://dbpedia.org/resource/Krefeld_Hauptbahnhof  rdf:type      http://schema.org/Place
http://dbpedia.org/resource/Krefeld_Hauptbahnhof  foaf:name     Krefeld Hauptbahnhof
http://dbpedia.org/resource/Krefeld_Hauptbahnhof  georss:point  51.325833333333335 6.569444444444445
http://dbpedia.org/resource/Krefeld_Hauptbahnhof  rdf:comment   Krefeld Hauptbahnhof ist der größte Bahnhof der Stadt Krefeld. Dort …
http://dbpedia.org/resource/Krefeld_Hauptbahnhof  country       http://dbpedia.org/resource/Germany
http://dbpedia.org/resource/Germany               foaf:name     Germany
{ _id: 77, firstname: "Kate", age: 38, city: "Rome" }
{ _id: 19, firstname: "Jane", age: 36, city: "Bern" }

OUT._id <- IN._id,
OUT.type <- 'person',
OUT.firstname <- IN.firstname,
OUT.age <- IN.age,
OUT.city <- IN.city

Vertex: type: person, _id: 77, name: Kate, age: 37, city: Rome
Vertex: type: person, _id: 19, name: Jane, age: 35, city: Bern
OUT._id <- IN._id,
OUT.type <- 'person',
OUT.$(IN.*.name()) <- IN.@
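The script with `OUT.$(IN.*.name()) <- IN.@` appears to be a generic form of the explicit attribute-by-attribute mapping: every input attribute is copied to an output attribute of the same name. The Python sketch below contrasts the two readings on the sample documents. The function names are hypothetical, and treating `IN.*` as "all attributes of the input document" is an assumption based on the slides.

```python
def explicit_mapping(doc):
    """One assignment per attribute, as in the explicit script."""
    return {
        "_id": doc["_id"],              # OUT._id <- IN._id
        "type": "person",               # OUT.type <- 'person'
        "firstname": doc["firstname"],  # OUT.firstname <- IN.firstname
        "age": doc["age"],              # OUT.age <- IN.age
        "city": doc["city"],            # OUT.city <- IN.city
    }


def generic_mapping(doc):
    """Copy every attribute under its own name: OUT.$(IN.*.name()) <- IN.@"""
    vertex = {"_id": doc["_id"], "type": "person"}
    for name, value in doc.items():
        vertex[name] = value
    return vertex


documents = [
    {"_id": 77, "firstname": "Kate", "age": 38, "city": "Rome"},
    {"_id": 19, "firstname": "Jane", "age": 36, "city": "Bern"},
]
vertices = [generic_mapping(d) for d in documents]
print(vertices)
```

The generic form keeps working when documents carry attributes that the script author did not know in advance, which matters for schema-flexible stores.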
IN-FILTER: type = 'person',
OUT._id <- IN._id,
OUT.$(IN.name()) <- IN.@
Accessing & Traversing Edges
Example graph:
Vertex: type: person, _id: 77, name: Kate, age: 37, city: Rome
Vertex: type: person, _id: 19, name: Jane, age: 35, city: Bern
Edge: friend, since: 2016-01-01

Expression              Result
IN.age                  37, 35
IN._>e                  the friend edge
IN._>e.since            2016-01-01
IN._<e.since            2016-01-01
IN._e.since             2016-01-01, 2016-01-01
IN._e_                  vertex 19 (Jane), vertex 77 (Kate)
IN._e_.name             Jane, Kate
IN._e?('friend')_.name  Jane, Kate
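The results on these slides suggest that each expression is evaluated once per vertex: `_>e` follows outgoing edges, `_<e` incoming edges, `_e` edges in either direction, a trailing `_` continues to the neighbouring vertex, and `?('friend')` filters by edge label. The Python sketch below models this reading on the two-vertex example. It is an illustration, not NotaQL's implementation; the helper names are hypothetical, and the edge direction Kate to Jane is an assumption because the transcript does not preserve it.

```python
vertices = {
    77: {"type": "person", "name": "Kate", "age": 37, "city": "Rome"},
    19: {"type": "person", "name": "Jane", "age": 35, "city": "Bern"},
}
# Assumption: the friend edge points from Kate (77) to Jane (19).
edges = [{"source": 77, "target": 19, "label": "friend", "since": "2016-01-01"}]


def out_edges(vid, label=None):
    """Edges leaving vid, optionally restricted to one label (_>e)."""
    return [e for e in edges if e["source"] == vid and label in (None, e["label"])]


def in_edges(vid, label=None):
    """Edges arriving at vid, optionally restricted to one label (_<e)."""
    return [e for e in edges if e["target"] == vid and label in (None, e["label"])]


def any_edges(vid, label=None):
    """Edges in either direction (_e)."""
    return out_edges(vid, label) + in_edges(vid, label)


def neighbours(vid, label=None):
    """Vertices at the other end of any edge (_e_)."""
    return ([vertices[e["target"]] for e in out_edges(vid, label)]
            + [vertices[e["source"]] for e in in_edges(vid, label)])


def evaluate(step):
    """Apply a step to every vertex and flatten the per-vertex results."""
    return [item for vid in vertices for item in step(vid)]


ages = evaluate(lambda v: [vertices[v]["age"]])                          # IN.age
out_since = evaluate(lambda v: [e["since"] for e in out_edges(v)])       # IN._>e.since
in_since = evaluate(lambda v: [e["since"] for e in in_edges(v)])         # IN._<e.since
any_since = evaluate(lambda v: [e["since"] for e in any_edges(v)])       # IN._e.since
friend_names = evaluate(lambda v: [n["name"] for n in neighbours(v, "friend")])  # IN._e?('friend')_.name
print(ages, out_since, in_since, any_since, friend_names)
```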
Creating Edges
Vertex: type: person, _id: 77, name: Kate, age: 37, city: Rome
Vertex: type: person, _id: 25, name: Carl, age: 57, city: Rome
Vertex: type: person, _id: 26, name: Carla, age: 77, city: Rome
Edge: father (Kate to Carl)
Edge: mother (Carl to Carla)

Create an edge to every person's grandmother:
OUT._>e?(_id = IN._>e?('mother'||'father')_._>e?('mother')_._id) <- EDGE('grandmother', via <- IN._>e[@]._l)

New edge: grandmother, via: 'father'
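The edge-creation rule can be read as a two-hop pattern: follow a `mother` or `father` edge, then a `mother` edge, and connect the start vertex to the vertex reached, recording the label of the first hop in `via`. The Python sketch below applies that reading to the three-person example. The function name and the edge representation are hypothetical, and the edge directions (Kate to Carl, Carl to Carla) are inferred from the slide layout.

```python
# Existing parent edges; directions are an assumption inferred from the slide.
edges = [
    {"source": 77, "target": 25, "label": "father"},  # Kate -> Carl
    {"source": 25, "target": 26, "label": "mother"},  # Carl -> Carla
]


def grandmother_edges(edges):
    """Derive one 'grandmother' edge per parent-then-mother path."""
    new_edges = []
    for first in edges:
        if first["label"] not in ("mother", "father"):
            continue
        for second in edges:
            # The second hop must start where the first ended and be a mother edge.
            if second["source"] == first["target"] and second["label"] == "mother":
                new_edges.append({
                    "source": first["source"],
                    "target": second["target"],
                    "label": "grandmother",
                    "via": first["label"],  # via <- IN._>e[@]._l
                })
    return new_edges


created = grandmother_edges(edges)
print(created)
```

For the example this yields a single edge from Kate (77) to Carla (26) with `via` set to `father`, matching the result shown on the slide.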
Iterative Computations
PageRank: $\mathrm{PR}(q) = \sum_{p \in \mathrm{in}(q)} \frac{\mathrm{PR}(p)}{|\mathrm{out}(p)|}$

OUT._id <- IN._>e_._id,
OUT.pr <- SUM(IN.pr/COUNT(IN._>e._id))

Variants of the REPEAT clause, placed in front of the script:
REPEAT: 10,
REPEAT: 99999999,
REPEAT: -1,
REPEAT: pr(0.0005%),
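The PageRank script distributes each vertex's `pr` value evenly over its outgoing edges and sums the shares per target vertex; the REPEAT clause decides how often this step runs. The Python sketch below implements that step with two stopping rules: a fixed iteration count, and a stop once no `pr` value changes by more than a given relative threshold. Reading `REPEAT: 10` as the former and `REPEAT: pr(0.0005%)` as the latter is an assumption based on the slide text; the function names and the sample graph are hypothetical.

```python
def pagerank_step(pr, out_neighbours):
    """One iteration: PR(q) = sum over p in in(q) of PR(p) / |out(p)|."""
    new_pr = {v: 0.0 for v in pr}
    for p, targets in out_neighbours.items():
        if not targets:
            continue  # a vertex without outgoing edges passes nothing on
        share = pr[p] / len(targets)  # IN.pr / COUNT(IN._>e._id)
        for q in targets:
            new_pr[q] += share        # SUM(...) grouped by the target _id
    return new_pr


def relative_change(old, new):
    """Largest relative change of any pr value between two iterations."""
    return max((abs(new[v] - old[v]) / old[v] for v in old if old[v] > 0),
               default=0.0)


def pagerank(out_neighbours, max_iterations, threshold=None):
    """Iterate up to max_iterations times, or until the change drops below threshold."""
    vertices = set(out_neighbours)
    for targets in out_neighbours.values():
        vertices.update(targets)
    pr = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(max_iterations):
        new_pr = pagerank_step(pr, out_neighbours)
        change = relative_change(pr, new_pr)
        pr = new_pr
        if threshold is not None and change < threshold:
            break
    return pr


# Hypothetical sample graph: a -> b, a -> c, b -> c, c -> a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
fixed = pagerank(graph, max_iterations=10)                        # like REPEAT: 10
converged = pagerank(graph, max_iterations=1000, threshold=5e-6)  # like REPEAT: pr(0.0005%)
print(fixed, converged)
```

A threshold-based stop avoids guessing an iteration count in advance, while `max_iterations` still serves as a safety bound for graphs on which the values do not settle.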
Implementation Details
TinkerPop Blueprints: Generic Graph API
Blueprints API
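// Package names as in TinkerPop Blueprints 2.x
import com.tinkerpop.blueprints.Direction;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph;

// Open an embedded Neo4j database through the generic Graph interface
Graph graph = new Neo4jGraph("/tmp/my_graph");
for (Vertex v : graph.getVertices()) {
    System.out.println(v.getId());
    // Assign to Object first so that println(Object) is chosen
    Object vorname = v.getProperty("vorname");
    System.out.println(vorname);
    for (Edge e : v.getEdges(Direction.OUT)) { ... }
}
graph.shutdown();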
JSON
Applications
Java & Cypher 207 min. 95 sec. (15,000 vertices and 200,000 edges)
Graph Transformations in MongoDB
2 min. 23 min.
Conclusions