
S2Graph: A large-scale graph database with HBase (Doyoung Yoon x Taejin Chin)



  1. S2Graph: A large-scale graph database with HBase
     Doyoung Yoon x Taejin Chin, DaumKakao

  2. DaumKakao: A Mobile Lifestyle Platform
     1. KakaoTalk
        a. Mobile messenger replacing SMS
        b. "KaTalkHe" is used as a verb in Korea, like "Googling"
        c. 96% of Korean smartphone users use KakaoTalk
        d. 170M users worldwide
        e. 3B messages / day

  3. DaumKakao: A Mobile Lifestyle Platform
     Platforms: Social, Contents, Commerce, Marketing, Local, Personal
     Services shown: KakaoTalk, Media Daum, KakaoPick, Yellow ID, Daum Map, KakaoHome, KakaoStory, KakaoGame, Gift Shop, Plus Friend, KakaoPlace, Sol Calendar, Digital Item Store, KakaoGroup, KakaoStyle, Story Plus, Sol Mail, Daum Cafe, KakaoTopic, Daum Cloud, Zap, KakaoPage, Sol Group, KakaoMusic, Daum tvPot, Daum Webtoon
     Callouts: KakaoTalk messenger (96% of Korean smartphone users are using it, 170 million users worldwide); "Biggest mobile SNS in Korea"

  4. Our Social Graph
     [Figure: a social graph whose edges are user actions (Listen, Advertise, Coupon, Message, Emoticon, Like, Friend, Pick, Eat, Write, Play, View, Comment, Read, Present, Search, Group). Each edge carries an affinity value plus action-specific properties such as count, price, rating, length, level, keyword, and size.]

  5. Our Social Graph
     [Figure: the same graph with concrete values. Vertices: Music ID 603, Ad ID 603, Message ID 201, Item ID 13, Post ID 97, Game ID 1984. Edge properties recoverable from the figure: Listen (count: 6), Advertise (ctr: 0.32), Message (length: 9), Write (length: 3), Play (level: 6), Comment (length: 15), Search (keyword: "HBase"), plus Friend and Pick edges and a withFriend: 3 property. Each edge also carries an affinity value between 1 and 9.]

  6. Technical Challenges
     1. Large social graph, constantly changing
        a. Scale
           social network: more than 10 billion edges, 200 million vertices, and 50 million updates on existing edges
           user activities: 400 million new edges per day

  7. Technical Challenges (cont)
     2. Low latency for breadth-first search traversal on connected data
        a. Performance requirement
           peak graph-traversing queries per second: 20,000
           response time: 100 ms

  8. Technical Challenges (cont)
     3. Updates should be applied to the graph in real time for viral effect
        [Figure: Person A, Person B, Person C, and Person D connected through Post, Comment, Sharing, and Mention actions, with each hop between people labeled "Fast".]

  9. Technical Challenges (cont)
     4. Support for dynamic ranking logic
        a. Push strategy: hard to change data ranking logic dynamically
        b. Pull strategy: can try various data ranking logic
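The pull strategy can be illustrated with a minimal sketch: edges are stored with raw properties only, and the ranking function is chosen at read time, so the ranking logic can change without rewriting stored data. The edge fields and both scoring functions below are hypothetical and are not S2Graph's API.

```python
from typing import Callable, Dict, List

Edge = Dict[str, float]

# Hypothetical stored edges: raw properties only, no precomputed rank.
EDGES: List[Edge] = [
    {"to": 1, "affinity": 3.0, "time": 20140502},
    {"to": 2, "affinity": 9.0, "time": 20140712},
    {"to": 3, "affinity": 6.0, "time": 20141116},
]


def by_affinity(edge: Edge) -> float:
    return edge["affinity"]


def by_recency(edge: Edge) -> float:
    return edge["time"]


def pull(edges: List[Edge], score: Callable[[Edge], float], limit: int) -> List[float]:
    """Rank at read time with whichever scoring function the caller supplies."""
    ranked = sorted(edges, key=score, reverse=True)
    return [e["to"] for e in ranked[:limit]]


top_by_affinity = pull(EDGES, by_affinity, 2)  # [2, 3]
top_by_recency = pull(EDGES, by_recency, 2)    # [3, 2]
```

With a push strategy the ranked result would be materialized at write time, so switching from `by_affinity` to `by_recency` would require recomputing what was already stored.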

  10. Before
      [Figure: Messaging, SNS, and Blog app servers, each connected to several separate databases: Friend relationship, SNS feeds, Blog user activities, Messaging.]
      Each app server should know each DB's sharding logic.
      Highly inter-connected architecture.

  11. After
      [Figure: Messaging, SNS, and Blog app servers all connected to a single S2Graph DB.]
      Stateless app servers.

  12. S2Graph: Distributed Online GraphDB
      1. Low-latency
      2. Graph-traversable
      3. Scalable
      4. Eventually consistent
      5. Asynchronous, non-blocking

  13. Why Did We Choose HBase?
      1. High availability
      2. Scalability
      3. Low latency
      4. High concurrency
      5. Fault tolerant
      6. Integration with HDFS
      7. Distributed operation

  14. The Data Model
      1. Columns
      2. Labels
      3. Directions
      4. Index Properties
      5. Non-index Properties
      [Figure: an example property graph with vertices 1 to 5 and edge labels comment, created, and know. Annotations: vertex 1 out edges; vertex 2 in edges; edge 2 label, source vertex, and target vertex; edge 5 properties (date = 20150507); vertex 4 id and vertex 4 properties (name = "josh", age = 32).]
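As a rough illustration of the five concepts listed for the data model, the sketch below models vertices and edges as plain Python dataclasses. The class names, field names, and the example edge are assumptions made for illustration; they are not S2Graph's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Vertex:
    column: str                      # column the id belongs to, e.g. "account_id"
    id: Any
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    src: Vertex
    tgt: Vertex
    label: str                       # edge type, e.g. "know"
    direction: str                   # "out" or "in"
    index_props: Dict[str, Any] = field(default_factory=dict)      # used for ordering
    non_index_props: Dict[str, Any] = field(default_factory=dict)  # stored as payload only

    def reversed(self) -> "Edge":
        """Same relationship seen from the other endpoint."""
        flipped = "in" if self.direction == "out" else "out"
        return Edge(self.tgt, self.src, self.label, flipped,
                    dict(self.index_props), dict(self.non_index_props))


# Vertex 4 from the slide; the edge itself is a hypothetical example.
josh = Vertex("account_id", 4, {"name": "josh", "age": 32})
other = Vertex("account_id", 1)
know = Edge(other, josh, "know", "out", index_props={"date": 20150507})
incoming = know.reversed()
```

Here `incoming` describes the same relationship from vertex 4's side, which is one way to picture the out edges and in edges shown in the figure.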

  15. How to store the data - Edge Logical View
      1. Snapshot edges: up-to-date status of each edge

         row \ column      Tgt Vertex ID1   Tgt Vertex ID2   Tgt Vertex ID3
         Src Vertex ID1    Properties       Properties       Properties
         Src Vertex ID2    Properties       Properties       Properties

         a. Fetching an edge between two specific vertices
         b. Lookup table to reach indexed edges for update, increment, and delete operations

  16. How to store the data - Edge Logical View
      2. Indexed edges: edges with index

         row \ column      Index Values | Tgt Vertex ID1   Index Values | Tgt Vertex ID2
         Src Vertex ID1    Non-index Properties            Non-index Properties

         a. Fetches edges originating from a certain vertex in order of index
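The interplay between the two logical views can be sketched with in-memory structures: the snapshot view acts as a lookup table that tells an update which indexed entry to replace, and the indexed view serves ordered reads. This is a simplified sketch with hypothetical names and a single numeric index value in ascending order; it does not reflect S2Graph's real storage code or its concurrency handling.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

snapshot: Dict[Tuple[int, int], Dict[str, Any]] = {}  # (src, tgt) -> properties
indexed: Dict[int, List[Tuple[int, int]]] = {}        # src -> sorted [(index_value, tgt)]


def upsert(src: int, tgt: int, index_value: int,
           props: Optional[Dict[str, Any]] = None) -> None:
    """Insert or update an edge, keeping both views consistent."""
    row = indexed.setdefault(src, [])
    old = snapshot.get((src, tgt))
    if old is not None:
        # The snapshot tells us which indexed entry is stale.
        row.remove((old["index_value"], tgt))
    bisect.insort(row, (index_value, tgt))
    snapshot[(src, tgt)] = {"index_value": index_value, **(props or {})}


def get_edges(src: int, limit: int) -> List[int]:
    """Targets of edges leaving src, in index order."""
    return [tgt for _, tgt in indexed.get(src, [])[:limit]]


upsert(1, 2, 30)
upsert(1, 3, 10)
upsert(1, 2, 5)              # update: moves edge 1->2 to the front
ordered = get_edges(1, 10)   # [2, 3]
```

Without the snapshot lookup, the update of edge 1->2 would have to scan the indexed row to find the stale entry.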

  17. How to store the data - Edge Physical View - table schema
      1. Snapshot Edge
         a. Rowkey

            Field   Murmur Hash   Src Vertex ID     Label ID   Direction   Index Sequence   Is Inverted
            Size    16 bit        variable length   30 bit     2 bit       7 bit            1 bit

            Vertex IDs can be encoded with an 8 bit header + byte array (long, integer, short, byte, string)
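The rowkey layout above can be made concrete with a small packing sketch. Several details here are assumptions, not taken from the slides: the header byte value, encoding the vertex id as an 8-byte long, the bit order inside each packed field, hashing the encoded source vertex id, and the use of a truncated CRC32 as a stand-in because Murmur hash is not in the Python standard library.

```python
import struct
import zlib


def encode_vertex_id(vertex_id: int) -> bytes:
    # Assumed encoding: 1 header byte (hypothetical tag 0x01 = long) + 8-byte big-endian value.
    return bytes([0x01]) + struct.pack(">q", vertex_id)


def edge_rowkey(src_id: int, label_id: int, direction: int,
                index_seq: int, is_inverted: bool) -> bytes:
    if not 0 <= label_id < (1 << 30):
        raise ValueError("label_id must fit in 30 bits")
    if not 0 <= direction < (1 << 2):
        raise ValueError("direction must fit in 2 bits")
    if not 0 <= index_seq < (1 << 7):
        raise ValueError("index_seq must fit in 7 bits")

    id_bytes = encode_vertex_id(src_id)
    # Stand-in for the 16-bit Murmur hash named on the slide.
    hash16 = zlib.crc32(id_bytes) & 0xFFFF
    label_dir = (label_id << 2) | direction         # 30 + 2 = 32 bits
    seq_inv = (index_seq << 1) | int(is_inverted)   # 7 + 1 = 8 bits
    return struct.pack(">H", hash16) + id_bytes + struct.pack(">IB", label_dir, seq_inv)


key = edge_rowkey(src_id=1, label_id=5, direction=0, index_seq=1, is_inverted=False)
```

A leading hash of this kind spreads consecutive vertex ids across the key space, while the fixed-width fields after the vertex id keep all edges of one source, label, and direction adjacent.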

  18. How to store the data - Edge Physical View - table schema
      1. Snapshot Edge
         b. Qualifier: Target Vertex ID (variable length)
         c. Value: All Property Key Value Pairs (variable length)

  19. How to store the data - Edge Physical View - table schema
      2. Indexed Edge
         a. Rowkey

            Field   Murmur Hash   Src Vertex ID     Label ID   Direction   Index Sequence   Is Inverted
            Size    16 bit        variable length   30 bit     2 bit       7 bit            1 bit

            Vertex IDs can be encoded with an 8 bit header + byte array (long, integer, short, byte, string)

  20. How to store the data - Edge Physical View - table schema
      2. Indexed Edge
         b. Qualifier: Index Property Values (variable length) + Tgt Vertex ID (variable length)
         c. Value: Non-index Property Key Value Pairs (variable length)

  21. How to store the data - Vertex Logical View
      1. Vertex: up-to-date status of each vertex

         row \ column      Property Key1   Property Key2
         Src Vertex ID1    Value1          Value2
         Vertex ID2        Value1          Value2

  22. How to store the data - Vertex Physical View - table schema
      1. Vertex: up-to-date status of each vertex
         a. Rowkey: Murmur Hash (16 bit) + Column ID (integer, 32 bit) + Vertex ID (variable length)
         b. Qualifier: Property Key (byte, 8 bit)
         c. Value: Property Value (variable length)

  23. How to read the data - GetEdges
      Using a custom query DSL on top of HTTP.

      curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
      {
        "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
        "steps": [
          [{"label": "friends", "direction": "out", "limit": 100}],
          [{"label": "hear", "direction": "out", "limit": 10}]
        ]
      }
      '

      Steps = a list of Step; each inner array in "steps" is one Step.
      Step = contains the labels to traverse and how to rank them in the result.
      [Figure: Step 1, User 1 -Friends-> friend 1, friend 2. Step 2, friends -hear-> songs "Don't let go", "let it be", "let it go", with hear edges at time: 20140502, 20140712, 20141116.]
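A request body of this shape can also be assembled programmatically instead of being written by hand. The helper names below (`query_param`, `get_edges_query`, `post_json`) are hypothetical; only the JSON field names and the endpoint path come from the slide, and the `post_json` helper is defined but not called because it needs a running server.

```python
import json
import urllib.request
from typing import Any, Dict, List


def query_param(label: str, direction: str = "out", limit: int = 10) -> Dict[str, Any]:
    """One label to traverse within a step."""
    return {"label": label, "direction": direction, "limit": limit}


def get_edges_query(service: str, column: str, vertex_id: Any,
                    steps: List[List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Build a getEdges body: one source vertex plus an ordered list of steps."""
    return {
        "srcVertices": [{"serviceName": service, "columnName": column, "id": vertex_id}],
        "steps": steps,
    }


def post_json(url: str, body: str) -> bytes:
    """Send the body with a JSON content type (requires a reachable server)."""
    request = urllib.request.Request(
        url, data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read()


query = get_edges_query("s2graph", "account_id", 1, [
    [query_param("friends", limit=100)],   # step 1
    [query_param("hear", limit=10)],       # step 2
])
body = json.dumps(query)
# post_json("http://localhost:9000/graphs/getEdges", body)
```

Serializing with `json.dumps` guarantees syntactically valid JSON, which hand-edited request bodies do not.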

  24. How to read the data - GetEdges
      Example: friend list

      curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
      {
        "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
        "steps": [
          [{"label": "friends", "direction": "out", "limit": 100}]
        ]
      }
      '

      [Figure: User 1 -Friends-> friend 1, friend 2.]

  25. How to read the data - GetEdges
      Example: songs my friends have listened to

      curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
      {
        "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
        "steps": [
          [{"label": "friends", "direction": "out", "limit": 50, "scoring": {"score": 1.0}}],
          [{"label": "listen", "direction": "out", "limit": 10}]
        ]
      }
      '

      [Figure: User 1 -Friends-> friend 1, friend 2 -hear-> songs "Don't let go", "let it be", "let it go", with hear edges at time: 20140502, 20140712, 20141116.]
      Reference: https://github.com/daumkakao/s2graph#1-definition
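What a two-step query computes can be pictured as a step-by-step breadth-first expansion over an adjacency map. The sketch below is an in-memory illustration only: the graph contents are made up, and applying each step's limit per source vertex is an assumption about the semantics rather than something the slides state.

```python
from typing import Dict, List, Tuple

Graph = Dict[Tuple[str, str], List[str]]  # (vertex, label) -> targets

# Hypothetical data; the slides do not say which friend listened to which song.
GRAPH: Graph = {
    ("user1", "friends"): ["friend1", "friend2"],
    ("friend1", "listen"): ["Don't let go", "let it be"],
    ("friend2", "listen"): ["let it be", "let it go"],
}


def traverse(graph: Graph, start: str, steps: List[Tuple[str, int]]) -> List[str]:
    """Expand the frontier once per step, following one label with a per-vertex limit."""
    frontier = [start]
    for label, limit in steps:
        next_frontier: List[str] = []
        for vertex in frontier:
            for target in graph.get((vertex, label), [])[:limit]:
                if target not in next_frontier:   # keep each result once
                    next_frontier.append(target)
        frontier = next_frontier
    return frontier


songs = traverse(GRAPH, "user1", [("friends", 50), ("listen", 10)])
# ["Don't let go", "let it be", "let it go"]
```

Each `(label, limit)` pair plays the role of one step in the request body, so adding a third pair would extend the traversal by one more hop.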

  26. How to read the data - GetEdges
      Example: songs similar to songs that I have listened to

      curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
      {
        "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
        "steps": [
          [{"label": "listen", "direction": "out", "limit": 50}],
          [{"label": "similar_song", "direction": "out", "limit": 10, "scoring": {"score": 1.0}}]
        ]
      }
      '

      [Figure: User 1 -hear-> songs "Don't let go", "let it be", "let it go" (time: 20140502, 20140712, 20141116) -similar_song-> "let it bleed", "Hey jude", "Do you wanna build a snowman?" (similarity: 0.3, 0.4, 0.6).]

  27. How to read the data - GetVertices

      curl -XPOST localhost:9000/graphs/getVertices -H 'Content-Type: Application/json' -d '
      [
        {"serviceName": "s2graph", "columnName": "account_id", "ids": [1, 2, 3]},
        {"serviceName": "kakaomusic", "columnName": "user_id", "ids": [1, 2, 3]}
      ]
      '

      [Figure: User 1 {created_at:20070812, updated_at:20150507}; User 2 {created_at:201206132, updated_at:20140505}.]

  28. How to write the data - Insert

      curl -XPOST localhost:9000/graphs/edges/insert -H 'Content-Type: Application/json' -d '
      [
        {"from": 1, "to": 2, "label": "graph_test", "props": {"time": -1, "weight": 10}, "timestamp": 1417616431}
      ]
      '

      [Figure: User 1 -> User 2.]
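The insert body is a JSON array of edge objects, so a batch can be built from plain dictionaries and serialized in one call. The `edge` helper and its default-timestamp behavior are hypothetical conveniences; only the field names and example values come from the slide.

```python
import json
import time
from typing import Any, Dict, Optional


def edge(frm: Any, to: Any, label: str, props: Dict[str, Any],
         timestamp: Optional[int] = None) -> Dict[str, Any]:
    """One edge in the shape used by the insert example."""
    return {
        "from": frm,
        "to": to,
        "label": label,
        "props": props,
        # Assumption: fall back to the current Unix time when none is given.
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }


batch = [edge(1, 2, "graph_test", {"time": -1, "weight": 10}, timestamp=1417616431)]
payload = json.dumps(batch)
```

Building the array in code and serializing it with `json.dumps` avoids hand-written JSON mistakes such as a trailing comma after the last element.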
