S2Graph : A large-scale graph database
with HBase
daumkakao
Doyoung Yoon x Taejin Chin
2
A Mobile Lifestyle Platform
3
KakaoTalk Social Platform
A Mobile Lifestyle Platform
KakaoStory KakaoGroup Daum Cafe Contents Platform KakaoTopic KakaoPage KakaoGame Commerce Platform KakaoPick KakaoMusic Marketing Platform Yellow ID Media Daum Daum tvPot Local Platform Daum Map Daum Webtoon Personal Platform Sol calendar KakaoPlace Sol Mail Zap Plus Friend Gift Shop Digital Item Store KakaoStyle KakaoHome Sol Group Story Plus Daum Cloud
Biggest mobile SNS in Korea (96% of Korean smartphone users use the KakaoTalk messenger; 170 million users worldwide)
4
[Figure: a user connected to friends by affinity edges and to activities (Message, Write, Read, Coupon, Present, Group, Emoticon, Eat, View, Play, Pick, Advertise, Search, Listen, Like, Comment), each edge carrying properties such as length, price, size, rating, count, level, or keyword.]
5
[Figure: a user's graph with example values. Friend edges are labeled with affinity values (6, 9, 3, 3, 4, 1, 2, 2, 9, 3). Activity edges carry properties: Message length: 9, Write length: 3, Play level: 6, Pick withFriend: 3, Advertise ctr: 0.32, Search keyword: "HBase", Listen count: 6, Comment length: 15. Targets are identified by Message ID: 201, Ad ID: 603, Music ID: 603, Item ID: 13, Post ID: 97, Game ID: 1984.]
6
More than:
Social network: 10 billion edges, 200 million vertices, 50 million updates on existing edges.
User activities: 400 million new edges per day.
7
Peak graph-traversal queries per second: 20,000
Response time: 100 ms
8
[Figure: Persons A, B, C, and D linked by Post, Comment, Sharing, and Mention activities, with the links labeled "Fast".]
9
10
Each app server should know each DB’s sharding logic. Highly inter-connected architecture
Friend relationship SNS feeds Blog user activities Messaging
Messaging App SNS App Blog App
11
SNS App Blog App Messaging App
stateless app servers
12
1. Low-latency
2. Graph-traversable
3. Scalable
4. Eventually consistent
5. Asynchronous, non-blocking
13
1. High availability
2. Scalability
3. Low latency
4. High concurrency
5. Fault tolerant
6. Integration with HDFS
7. Distributed operation
14
[Figure: property graph model. A vertex has an id, properties (for example vertex 4: name = "josh", age = 32), in edges, and out edges. An edge has a source vertex, a target vertex, a label (for example "know", "created", "comment"), and properties (for example edge 5: date = 20150507).]
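The property graph model on this slide can be sketched in a few lines of Python. This is only an illustration, not S2Graph's internal representation: the class names, the `connect` helper, and the choice of endpoints and label for edge 5 are assumptions, since the figure does not show which vertices that edge connects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)
class Vertex:
    id: int
    props: Dict[str, Any] = field(default_factory=dict)
    out_edges: List["Edge"] = field(default_factory=list)
    in_edges: List["Edge"] = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    id: int
    source: Vertex
    target: Vertex
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


def connect(edge_id, source, target, label, **props):
    """Create an edge and register it on both of its endpoints."""
    edge = Edge(edge_id, source, target, label, dict(props))
    source.out_edges.append(edge)
    target.in_edges.append(edge)
    return edge


# Illustrative data: the endpoints and label of edge 5 are assumed.
v1 = Vertex(1)
v4 = Vertex(4, {"name": "josh", "age": 32})
e5 = connect(5, v1, v4, "know", date=20150507)
```

Each edge is reachable from both endpoints, which is what makes traversal in either direction possible.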
15
Logical View
row \ column     Tgt Vertex ID1   Tgt Vertex ID2   Tgt Vertex ID3
Src Vertex ID1   Properties       Properties       Properties
Src Vertex ID2   Properties       Properties       Properties
16
Logical View
row \ column     Index Values | Tgt Vertex ID1   Index Values | Tgt Vertex ID2
Src Vertex ID1   Non-index Properties            Non-index Properties
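HBase keeps the columns of a row sorted by qualifier, so placing the index values in front of the target vertex ID presumably lets a limited read return edges already ordered by those values. The sketch below imitates that with a sorted list. `IndexedEdgeRow` and its methods are hypothetical names, the ascending sort order is an assumption, and the pairing of songs with times is illustrative.

```python
import bisect


class IndexedEdgeRow:
    """One logical row: all edges of a single source vertex."""

    def __init__(self):
        self._columns = []  # kept sorted as (index_values, tgt_vertex_id)
        self._cells = {}    # column -> non-index properties

    def put(self, index_values, tgt_vertex_id, non_index_props):
        column = (tuple(index_values), tgt_vertex_id)
        if column not in self._cells:
            bisect.insort(self._columns, column)  # keep columns in sorted order
        self._cells[column] = non_index_props

    def scan(self, limit):
        """Return the first `limit` columns in index order, with their cells."""
        return [(col, self._cells[col]) for col in self._columns[:limit]]


row = IndexedEdgeRow()
row.put([20140712], "let it be", {"device": "mobile"})
row.put([20140502], "Don't let go", {"device": "web"})
row.put([20141116], "let it go", {"device": "mobile"})
first_two = row.scan(limit=2)
```

With this layout a "top N by index" read needs no sorting at query time; it is a prefix of the row.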
17
Physical View - table schema
Field            Size
Murmur Hash      16 bit
Src Vertex ID    variable length
Label ID         30 bit
Direction        2 bit
Index Sequence   7 bit
Is Inverted      1 bit
Vertex IDs can be encoded with an 8 bit header + byte array (long, integer, short, byte, string).
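The field layout on this slide can be packed into bytes roughly as follows. This is a sketch under several assumptions, not S2Graph's actual encoder: the type codes in the 8-bit header are hypothetical, CRC32 truncated to 16 bits stands in for the Murmur hash (which is not in the Python standard library), the hash is assumed to cover the encoded source vertex ID, and the bit order inside each packed integer is a guess.

```python
import struct
import zlib

# Hypothetical type codes for the 8-bit vertex ID header.
TYPE_LONG, TYPE_STRING = 0, 1


def encode_vertex_id(vertex_id):
    """8-bit type header followed by the ID as a byte array."""
    if isinstance(vertex_id, int):
        return bytes([TYPE_LONG]) + struct.pack(">q", vertex_id)
    return bytes([TYPE_STRING]) + str(vertex_id).encode("utf-8")


def hash16(data):
    """Stand-in for the 16-bit Murmur hash: CRC32 truncated to 16 bits."""
    return zlib.crc32(data) & 0xFFFF


def build_row_key(src_vertex_id, label_id, direction, index_seq, is_inverted):
    """Pack hash | src vertex ID | label ID + direction | index sequence + inverted flag."""
    if not 0 <= label_id < (1 << 30):
        raise ValueError("label_id must fit in 30 bits")
    if not 0 <= direction < (1 << 2):
        raise ValueError("direction must fit in 2 bits")
    if not 0 <= index_seq < (1 << 7):
        raise ValueError("index_seq must fit in 7 bits")
    vid = encode_vertex_id(src_vertex_id)
    label_and_direction = (label_id << 2) | direction   # 30 + 2 bits = 4 bytes
    seq_and_flag = (index_seq << 1) | int(is_inverted)  # 7 + 1 bits = 1 byte
    return (struct.pack(">H", hash16(vid)) + vid
            + struct.pack(">IB", label_and_direction, seq_and_flag))


# Direction 0 is used here as a hypothetical code for "out".
key = build_row_key(1, label_id=7, direction=0, index_seq=1, is_inverted=False)
```

A hash prefix like this is commonly used to spread consecutive IDs across HBase regions; whether that is the motivation here is not stated on the slide.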
18
Physical View - table schema
Field                          Size
Target Vertex ID               variable length
All Property Key Value Pairs   variable length
19
Physical View - table schema
Field            Size
Murmur Hash      16 bit
Src Vertex ID    variable length
Label ID         30 bit
Direction        2 bit
Index Sequence   7 bit
Is Inverted      1 bit
Vertex IDs can be encoded with an 8 bit header + byte array (long, integer, short, byte, string).
20
Physical View - table schema
Field                                Size
Index Property Values                variable length
Tgt Vertex ID                        variable length
Non-index Property Key Value Pairs   variable length
21
Logical View
row \ column     Property Key1   Property Key2
Src Vertex ID1   Value1          Value2
Vertex ID2       Value1          Value2
22
Physical View - table schema
Field            Size
Murmur Hash      16 bit
Column ID        integer (32 bit)
Vertex ID        variable length
Property Key     byte (8 bit)
Property Value   variable length
23
Using a custom query DSL on top of HTTP
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
  "steps": [
    [{"label": "friends", "direction": "out", "limit": 100}],
    [{"label": "hear", "direction": "out", "limit": 10}]
  ]
}
'
Steps = a list of Step
Step = the labels to traverse and how to rank them in the result
[Figure: Step 1 follows "Friends" edges from User 1 to friend 1 and friend 2. Step 2 follows "hear" edges (time: 20140502, 20140712, 20141116) to the songs "Don't let go", "let it be", and "let it go".]
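How a list of steps drives a traversal can be shown with a toy in-memory version. This is a sketch, not the server's algorithm: it assumes each step expands from the vertices returned by the previous step, it ignores `direction` and ranking, and the `traverse` function, the edge dictionary, and the assignment of songs to friends are all illustrative.

```python
def traverse(edges, src_ids, steps):
    """Expand the frontier one step at a time.

    edges: {(source vertex, label): [target vertex, ...]}
    steps: list of steps; each step is a list of label queries.
    """
    frontier = list(src_ids)
    for step in steps:
        next_frontier = []
        for vertex in frontier:
            for query in step:
                targets = edges.get((vertex, query["label"]), [])
                # "limit" caps how many edges are followed per vertex.
                next_frontier.extend(targets[: query["limit"]])
        frontier = next_frontier
    return frontier


# Illustrative data: which friend heard which song is assumed.
edges = {
    ("User 1", "friends"): ["friend 1", "friend 2"],
    ("friend 1", "hear"): ["Don't let go", "let it be"],
    ("friend 2", "hear"): ["let it go"],
}
steps = [
    [{"label": "friends", "direction": "out", "limit": 100}],
    [{"label": "hear", "direction": "out", "limit": 10}],
]
songs = traverse(edges, ["User 1"], steps)
```

The `steps` value has the same shape as the `"steps"` array in the request body, so a one-step query is simply a list with a single inner list.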
24
Friend list
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
  "steps": [
    [{"label": "friends", "direction": "out", "limit": 100}]
  ]
}
'
[Figure: User 1 connected to friend 1 and friend 2 by "Friends" edges.]
25
Songs my friends have listened to
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
  "steps": [
    [{"label": "friends", "direction": "out", "limit": 50, "scoring": {"score": 1.0}}],
    [{"label": "listen", "direction": "out", "limit": 10}]
  ]
}
'
[Figure: User 1 connected to friend 1 and friend 2 by "Friends" edges; the friends' "hear" edges (time: 20140502, 20140712, 20141116) lead to the songs "Don't let go", "let it be", and "let it go".]
Reference : https://github.com/daumkakao/s2graph#1-definition
26
Similar songs to songs that I have listened to.
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
  "steps": [
    [{"label": "listen", "direction": "out", "limit": 50}],
    [{"label": "similar_song", "direction": "out", "limit": 10, "scoring": {"score": 1.0}}]
  ]
}
'
[Figure: User 1's "hear" edges (time: 20140502, 20140712, 20141116) lead to "Don't let go", "let it be", and "let it go"; "similar_song" edges (similarity: 0.3, 0.4, 0.6) lead from those songs to "let it bleed", "Hey jude", and "Do you wanna build a snowman?".]
27
curl -XPOST localhost:9000/graphs/getVertices -H 'Content-Type: Application/json' -d '
[
  {"serviceName": "s2graph", "columnName": "account_id", "ids": [1, 2, 3]},
  {"serviceName": "kakaomusic", "columnName": "user_id", "ids": [1, 2, 3]}
]
'
User 1
{created_at:20070812, updated_at:20150507}
User 2
{created_at:201206132, updated_at:20140505}
28
curl -XPOST localhost:9000/graphs/edges/insert -H 'Content-Type: Application/json' -d '
[
  {"from": 1, "to": 2, "label": "graph_test", "props": {"time": -1, "weight": 10}, "timestamp": 1417616431}
]
'
User 1 User 2
29
curl -XPOST localhost:9000/graphs/edges/delete -H 'Content-Type: Application/json' -d '
[
  {"from": 1, "to": 2, "label": "graph_test", "timestamp": 1417616431},
  {"from": 1, "to": 3, "label": "graph_test", "timestamp": 1417616431}
]
'
User 1 User 2
30
curl -XPOST localhost:9000/graphs/edges/update -H 'Content-Type: Application/json' -d '
[
  {"from": 1, "to": 2, "label": "graph_test", "timestamp": 1417616431, "props": {"is_hidden": true, "status": 200}},
  {"from": 1, "to": 3, "label": "graph_test", "timestamp": 1417616431, "props": {"status": -500}}
]
'
User 1 User 2 friend
{is_hidden:true, status:200}
31
Read
Write
Management
32
33
34
35
Titan (v0.4.2)
36
Titan is less efficient for graph traversal
Vertex("userID: 1").out("friends").limit(10).out("friends").limit(10)
User 1 friends friends
37
Vertex("userID:1").out("friend").limit(10).out("friend").limit(10)

# of read requests
Titan:   112 = 1 (vertex lookup: a) + 1 (1st step edges: b) + 10 (2nd step edges: c) + 100 (destination vertices: d)
S2Graph:  11 = 1 (1st step edges: e) + 10 (2nd step edges: f)
[Figure: the Titan traversal with reads labeled a, b, c, d and the S2Graph traversal with reads labeled e, f.]
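The read counts on this slide follow from simple arithmetic, reproduced below for arbitrary per-step limits. The function names are illustrative, and the generalization beyond two steps is an assumption; it also assumes every vertex has at least `limit` edges, so each step fans out fully.

```python
def titan_reads(limits):
    """Vertex lookup + one edge read per expanded vertex + one read per destination vertex."""
    reads = 1      # look up the start vertex (a)
    frontier = 1   # number of vertices to expand in the current step
    for limit in limits:
        reads += frontier   # one edge read per frontier vertex (b, c)
        frontier *= limit   # each vertex yields up to `limit` neighbors
    return reads + frontier  # fetch every destination vertex (d)


def s2graph_reads(limits):
    """One edge read per expanded vertex, with no separate vertex reads."""
    reads, frontier = 0, 1
    for limit in limits:
        reads += frontier   # one edge read per frontier vertex (e, f)
        frontier *= limit
    return reads


titan = titan_reads([10, 10])      # 1 + 1 + 10 + 100
s2graph = s2graph_reads([10, 10])  # 1 + 10
```

For the two-step query with a limit of 10 on each step, this gives 112 reads for Titan and 11 for S2Graph, matching the slide.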
38
39
[Figure: QPS (queries per second) and latency (ms) vs. # of app servers (1, 2, 4, 8).]
[Figure: QPS vs. # of app servers (1 to 8).]
40
[Figure: QPS and latency (ms) vs. limit on the first step (20, 40, 80, 100).]
42
[Figure: QPS and latency (ms) by limits on path: 10 -> 100, 100 -> 10, 10 -> 10 -> 10, 2 -> 5 -> 10 -> 10, 2 -> 5 -> 2 -> 5 -> 10.]
43
[Figure: latency and requests per second.]
44
[Figure: latency and requests per second.]
45
* Deep traversal queries are not counted, since they are still in the test stage for production.
46
47
48
49
[Figure: native client QPS and latency (ms) vs. # of app servers (1 to 5).]
[Figure: Asynchbase QPS and latency (ms) vs. # of app servers (1 to 5).]
3.5x performance improvement using Asynchbase