Apache Giraph
Large-scale Graph Processing on Hadoop Claudio Martella <claudio@apache.org> @claudiomartella
Graphs are simple
A computer network
A social network
A semantic network
A map
Predicting break ups
Graph
Aggregation approach Graph approach
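def compute(vertex, messages):
    # Smallest distance proposed by any incoming message.
    minValue = float('inf')
    for m in messages:
        minValue = min(minValue, m)
    # Only a strictly shorter distance updates the vertex and is propagated.
    if minValue < vertex.getValue():
        vertex.setValue(minValue)
        for edge in vertex.getEdges():
            message = minValue + edge.getValue()
            sendMessage(edge.getTargetId(), message)
    # Go idle; the vertex is woken up again only by a new message.
    vertex.voteToHalt()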
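The compute function above only makes sense inside a loop of synchronous supersteps, so here is a minimal single-machine sketch of that loop for shortest paths. It is not Giraph code: the names `sssp_supersteps`, `edges`, `inbox` and `outbox` are hypothetical, and seeding the source vertex with a message carrying distance 0 is an assumption, since the slide does not show how the source is initialized.

```python
import math
from collections import defaultdict


def sssp_supersteps(edges, source):
    """Run vertex-centric shortest paths in synchronous supersteps.

    edges: dict mapping a vertex id to a list of (target id, edge weight).
    Returns (distances, number of supersteps executed).
    """
    # Collect every vertex that appears as a source or as a target.
    vertices = {source} | set(edges)
    for targets in edges.values():
        for target, _ in targets:
            vertices.add(target)

    value = {v: math.inf for v in vertices}

    # Assumption: the run is seeded by delivering distance 0 to the source.
    inbox = {source: [0.0]}
    supersteps = 0

    # Stop when no messages are in flight, i.e. every vertex has halted.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            min_value = min(messages)
            if min_value < value[vertex]:
                value[vertex] = min_value
                for target, weight in edges.get(vertex, []):
                    outbox[target].append(min_value + weight)
            # Vote to halt is implicit: a vertex runs again only when
            # it appears in the next inbox.
        # Messages sent in this superstep are delivered in the next one.
        inbox = outbox
        supersteps += 1

    return value, supersteps
```

For example, calling `sssp_supersteps({"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}, "a")` first sets the distance of `c` to 4.0 through the direct edge and then lowers it to 3.0 one superstep later, once the message routed through `b` arrives. Only vertices with pending messages run in a given superstep, which is the effect of `voteToHalt()` in the slide's version.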
communication
parallelizable
sent
ref: https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920
;-)
fastutils)
recommenders: ALS, SGD, SVD++, etc.
partitioning, Community Detection, K-Core, etc.
<claudio@apache.org> @claudiomartella http://giraph.apache.org