Distributed Graph Storage
Veronika Molnár, UZH
Overview
- Graphs and Social Networks
- Criteria for Graph Processing Systems
- Current Systems
  - Storage
  - Computation
  - Large scale systems
- Comparison / Best systems
- Questions
Graph = collection of nodes + edges connecting nodes to each other
Social Network = collection of individuals and social relations
A Social Network is also a Graph! (node = person, edge = relation)
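The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the people, the relations, and the helper `build_graph` are made up for this example and do not come from the slides. Each person is a node, and each relation is stored as an undirected edge in an adjacency list.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency list from (person, person) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)   # a is related to b ...
        graph[b].add(a)   # ... and b is related to a
    return dict(graph)

# Hypothetical social relations, for illustration only.
relations = [("Ann", "Bob"), ("Bob", "Cem"), ("Ann", "Cem"), ("Cem", "Dana")]
network = build_graph(relations)
```

Looking up `network["Cem"]` then returns the set of people directly related to Cem.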
Social Network graph
(image source: thenextweb.com)
e.g. Facebook: max. 5000 friends per account
Shortest Path
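As a rough illustration of the shortest path problem on an unweighted social graph, the sketch below uses breadth-first search, which finds a path with the fewest edges. The function name `shortest_path` and the sample `friends` network are assumptions made for this example, not part of the original slides.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Return one shortest path (fewest edges) from source to target, or None."""
    parents = {source: None}          # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None                       # target is not reachable

# Hypothetical friendship network, for illustration only.
friends = {
    "Ann": ["Bob", "Cem"],
    "Bob": ["Ann", "Dana"],
    "Cem": ["Ann"],
    "Dana": ["Bob", "Eli"],
    "Eli": ["Dana"],
}
```

Calling `shortest_path(friends, "Ann", "Eli")` walks the network level by level, so the first time the target is reached the path is guaranteed to be a shortest one.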
Centrality
Betweenness, Closeness, PageRank, Degree
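Two of the centrality measures listed above can be sketched briefly. The code below is a simplified illustration, not a reference implementation: `degree_centrality` and `pagerank` are names chosen for this example, the damping factor of 0.85 is the commonly used default, and the PageRank loop assumes every node has at least one neighbour. Betweenness and closeness are left out because they need all-pairs shortest paths.

```python
def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes every node has at least one neighbour."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Every node receives a small base share ...
        new_rank = {node: (1.0 - damping) / n for node in graph}
        # ... plus an equal split of each neighbour's current rank.
        for node, nbrs in graph.items():
            share = damping * rank[node] / len(nbrs)
            for nbr in nbrs:
                new_rank[nbr] += share
        rank = new_rank
    return rank

# Hypothetical star-shaped network: one hub connected to three others.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
degrees = degree_centrality(star)
ranks = pagerank(star)
```

On this star network both measures single out `"hub"` as the most central node, which matches the intuition that it connects everyone else.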
information
E-mail connectivity graph
(image source: research.microsoft.com)
PageRank, Centrality, Shortest paths, ...
Storage:
Apache Hive (and Hadoop)
Hadoop: Map/Reduce architecture
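To give a feel for the Map/Reduce pattern on graph data, the sketch below counts node degrees from an edge list. It is a single-process imitation in plain Python and does not use any Hadoop or Hive API; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample edges are assumptions made for this example.

```python
from collections import defaultdict

def map_phase(edge):
    """Emit (node, 1) for both endpoints of an undirected edge."""
    a, b = edge
    return [(a, 1), (b, 1)]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts for one node to obtain its degree."""
    return key, sum(values)

# Hypothetical edge list, for illustration only.
edges = [("Ann", "Bob"), ("Bob", "Cem"), ("Ann", "Cem"), ("Cem", "Dana")]
emitted = [pair for edge in edges for pair in map_phase(edge)]
degrees = dict(reduce_phase(key, values) for key, values in shuffle(emitted).items())
```

Because each map call looks at one edge and each reduce call at one node, both phases could in principle be spread over many machines, which is the point of the architecture.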
Titan Graph Database
Neo4j
Computation:
igraph
Spark GraphX
GraphLab
Used by Facebook/Google:
Pregel/Pregelix
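The vertex-centric, bulk synchronous parallel (BSP) style associated with Pregel can be imitated in a few lines. The sketch below is a single-machine approximation and does not use the Pregel or Pregelix API: the function `bsp_hop_distances` and the sample `chain` graph are assumptions made for this example. In each superstep, every vertex with pending messages reads them, updates its own value, and sends messages to its neighbours for the next superstep; the computation ends when no messages remain.

```python
def bsp_hop_distances(graph, source):
    """Hop distances from source, computed in synchronous supersteps."""
    distance = {vertex: float("inf") for vertex in graph}
    inbox = {source: [0]}              # messages to deliver in the next superstep
    while inbox:                       # halt when no vertex has pending messages
        outbox = {}
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < distance[vertex]:
                # The vertex improved its value, so it notifies its neighbours.
                distance[vertex] = best
                for nbr in graph[vertex]:
                    outbox.setdefault(nbr, []).append(best + 1)
        inbox = outbox                 # barrier: messages become visible only now
    return distance

# Hypothetical chain-shaped network, for illustration only.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
hops = bsp_hop_distances(chain, "a")
```

Each vertex only ever touches its own value and its incoming messages, which is what lets real BSP systems partition the vertices across many workers.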
Apache Giraph
System      Focus                    Scalability  SNA  Extensibility    Used for
Hive        parallel computations    any size     no   Java             generic
Titan       storage                  ~100 B       no   Python, Java     graph queries
Neo4j       transactional DB         ~1 B         yes  Java, Python, R  recommender systems
igraph      efficiency, portability  ~1 M         yes  R, Python, C++   research
GraphX      parallel computations    ~1 B         yes  Java, Python, R  graph processing
GraphLab    processing, analytics    ~1 B         yes  C++              recommender systems
Giraph      large scale, BSP         any size     no   Java, Python     Facebook
Pregel(ix)  large scale, BSP         any size     yes  Java             Google
The best system depends on the network and the intended use.
Why do we analyse social data? What are the possible uses of analysing social data?
Can visualisation help to understand graphs? (connections can be viewed, subset of graph can be analysed, …)
Have you ever used such a system? Which one?
What are the advantages and disadvantages of distributed graph processing? What is the value of graph processing?
How can social metric calculations deal with fake accounts?