S2Graph : A large-scale graph database with Hbase Doyoung Yoon x - PowerPoint PPT Presentation

daumkakao S2Graph : A large-scale graph database with Hbase Doyoung Yoon x Taejin Chin

DaumKakao A Mobile Lifestyle Platform 1. KakaoTalk a. Mobile Messenger replacing SMS b. ‘KaTalkHe’ is being used as a verb in Korea like ‘Googling’ c. 96% of Korean smartphone users are using KakaoTalk d. 170M users worldwide e. 3B messages / day 2

DaumKakao A Mobile Lifestyle Platform Social Contents Commerce Marketing Local Personal Platform Platform Platform Platform Platform Platform KakaoTalk Media Daum KakaoPick Yellow ID Daum Map KakaoHome KakaoStory KakaoGame Gift Shop Plus Friend KakaoPlace Sol calendar Digital Item Store KakaoGroup KakaoStyle Story Plus Sol Mail Daum Cafe KakaoTopic Daum Cluod 96% of Zap KakaoPage Korean smartphone Biggest mobile users are using KakaoTalk SNS in Korea messenger, 170 million Sol Group KakaoMusic users worldwide) Daum tvPot Daum Webtoon 3

Our Social Graph Listen count : Advertise Coupon Message price : Emoticon affinity affinity affinity affinity: Like count : 7 affinity Friend Pick withFriend : affinity Eat Write rating : length : affinity Play affinity level: 6 affinity View Comment count : Read affinity Present price : 3 Search Group keyword size : 6 : 4

Our Social Graph Music ID : 603 Ad ID : 603 Listen count : 6 Advertise ctr : 0.32 Message ID : 201 Message length : 9 affinity 4 affinity 3 affinity 6 affinity: 9 affinity 9 Item ID : 13 Pick Friend withFriend : 3 affinity 1 Write length : 3 affinity 3 Play affinity 3 Post ID : 97 level: 6 affinity 2 Search Comment keyword length : 15 : “HBase" affinity 2 5 Game ID : 1984

Technical Challenges 1. Large social graph constantly changing a. Scale more than, social network: 10 billion edges, 200 million vertices, 50 million update on existing edges. user activities: 400 million new edges per day 6

Technical Challenges (cont) 2. Low latency for breadth first search traversal on connected data. a. performance requirement peak graph-traversing query per second: 20000 response time: 100ms 7

Technical Challenges (cont) 3. Update should be applied to graph in real time for viral effect Fast Fast Fast Person A Person B Person C Person D Post Comment Sharing Mention 8

Technical Challenges (cont) 4. Support for Dynamic Ranking logic a. push strategy: hard to change data ranking logic dynamically. b. pull strategy: can try various data ranking logic 9

Before Messaging SNS Blog App App App Friend relationship SNS feeds Blog user activities Messaging Each app server should know each DB’s sharding logic. Highly inter-connected architecture 10

After Messaging SNS Blog App App App S2Graph DB stateless app servers 11

S2Graph : Distributed Online GraphDB 1.Low-latency 2.Graph-traversable 3.Scalable 4.Eventually consistent 5.Asynchronous, non-blocking 12

Why We Choose HBase? 1.High Availability 2.Scalability 3.Low latency 4.High concurrency 5.Fault tolerant 6.Integration with HDFS 7.Distributed operation 13

The Data Model vertex 1 out edges vertex 2 in edges edge 2 label edge 2 source vertex edge 2 target vertex 1. Columns 2. Labels 3. Directions 4. Index Properties 1 3 2 comment 5. Non-index Properties 5 created know date = 20150507 4 name = “josh”   edge 5 properties age = 32 vertex 4 id vertex 4 properties 14

How to store the data - Edge Logical View 1. Snapshot edges : Up-to-date status of edge column Tgt Vertex ID1 Tgt Vertex ID2 Tgt Vertex ID3 row Src Vertex ID1 Properties Properties Properties Src Vertex ID2 Properties Properties Properties a. Fetching an edge between two specific vertex b. Lookup Table to reach indexed edges for update, increment, delete operations 15

How to store the data - Edge Logical View 2. Indexed edges : Edges with index column Index Values | Tgt Vertex ID1 Index Values | Tgt Vertex ID2 row Src Vertex ID1 Non-index Properties Non-index Properties a. Fetches edges originating from a certain vertex in order of index 16

How to store the data - Edge Physical View - table schema 1. Snapshot Edge a. Rowkey Murmur Hash Src Vertex ID Label ID Direction Index Sequence Is Inverted 16 bit variable length 30 bit 2 bit 7bit 1 bit Vertex IDs can be encoded with 8 bit header + byte array (long, integer, short, byte, string) 17

How to store the data - Edge Physical View - table schema 1. Snapshot Edge c. Value b. Qualifier Target Vertex ID All Property Key Value Pairs variable length variable length 18

How to store the data - Edge Physical View - table schema 2. Indexed Edge a. Rowkey Murmur Hash Src Vertex ID Label ID Direction Index Sequence Is Inverted 16 bit variable length 30 bit 2 bit 7bit 1 bit Vertex IDs can be encoded with 8 bit header + byte array (long, integer, short, byte, string) 19

How to store the data - Edge Physical View - table schema 2. Indexed Edge c. Value b. Qualifier Index Property Values Tgt Vertex ID Non-index Property Key Value Pairs variable length variable length variable length 20

How to store the data - Vertex Logical View 1. Vertex : Up-to-date status of Vertex column Property Key1 Property Key2 row Src Vertex ID1 Value1 Value2 Vertex ID2 Value1 Value2 21

How to store the data - Vertex Physical View - table schema 1. Vertex : Up-to-date status of Vertex a. Rowkey Murmur Hash Column ID Vertex ID 16 bit integer(32bit) variable length b. Qualifier c. Value Property Key Property Value Byte(8 bit) variable length 22

How to read the data - GetEdges Using a custom query DSL on top of HTTP User 1 curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d ' { Step 1 Friends Friends "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}], "steps": [ friend 1 friend 2 [{"label": "friends", "direction": "out", "limit": 100}], // step [{"label": "hear", "direction": "out", "limit": 10}] ] hear hear hear Step 2 } time: 20140502 time: 20140712 time: 20141116 ' Steps = a list of Step Step = contains the labels to traverse Don’t let go let it be let it go and how to rank them in the result 23

How to read the data - GetEdges Example Friend list User 1 curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d ' { "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}], "steps": [ Friends Friends [{"label": "friends", "direction": "out", "limit": 100}], // step ] } ' friend 1 friend 2 24

How to read the data - GetEdges Example Songs my friends have listened User 1 curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d ' { Friends Friends "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}], "steps": [ friend 1 friend 2 [{"label": "friends", "direction": "out", "limit": 50, “scoring”: {“score”: 1.0}], [{"label": "listen", "direction": "out", "limit": 10}] ] hear hear hear } time: 20140502 time: 20140712 time: 20141116 ' Don’t let go let it be let it go Reference : https://github.com/daumkakao/s2graph#1-definition 25

How to read the data - GetEdges Example Similar songs to songs that I have listened to. User 1 curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d ' { hear hear hear "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}], time: 20140502 time: 20140712 time: 20141116 "steps": [ [{"label": "listen", "direction": "out", "limit": 50}], [{"label": "similar_song", "direction": "out", "limit": 10, “scoring”: {“score”: 1.0}] ] Don’t let go let it be let it go } similar_song similar_song similar_song similarity: 0.3 similarity: 0.4 similarity: 0.6 let it bleed Hey jude Do you wanna build a snowman? 26

How to read the data - GetVertices curl -XPOST localhost:9000/graphs/getVertices -H 'Content-Type: Application/json' -d ' [ {"serviceName": "s2graph", "columnName": "account_id", "ids": [1, 2, 3]}, {"serviceName": "kakaomusic", "columnName": "user_id", "ids": [1, 2, 3]} ] ' User 1 User 2 {created_at:20070812, {created_at:201206132, updated_at:20150507} updated_at:20140505} 27

How to write the data - Insert curl -XPOST localhost:9000/graphs/edges/insert -H 'Content-Type: Application/json' -d ' [ {"from":1,"to":2,"label":"graph_test","props":{"time":-1, "weight":10},"timestamp":1417616431}, ] ' User 1 User 2 28

S2Graph : A large-scale graph database with Hbase Doyoung Yoon x - PowerPoint PPT Presentation

daumkakao S2Graph : A large-scale graph database with Hbase Doyoung Yoon x Taejin Chin DaumKakao A Mobile Lifestyle Platform 1. KakaoTalk a. Mobile Messenger replacing SMS b. KaTalkHe is being used as a verb in Korea like

APACHE S2GRAPH (INCUBATING) AS A USER EVENT HUB KAKAO CORP. ABSTRACT Apache S2Graph

Graph Processing with Apache Tinkerpop on Apache S2Graph(incubating) TABLE OF CONTENTS -

S2Graph : A large-scale graph database with Hbase Reference 1. HBase Conference 2015

A large-scale International IPv6 Network A large-scale International IPv6 Network www.6net.org

FINANCING LARGE SCALE SOLAR Large Scale Solar Conference - Sydney Gloria Chan Director, Large

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

GRAPH MINING AND GRAPH KERNELS Part I: Graph Mining Karsten Borgwardt^ and Xifeng Yan*

Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems A. Gharaibeh, E.

Pregel Large-Scale Graph Processing William Jones Analysing large graphs is hard. We are

GRAPH MINING AND GRAPH KERNELS Part II: Graph Kernels Karsten Borgwardt^ and Xifeng Yan*

Granula: Toward Fine-grained Performance Analysis of Large-scale Graph Processing Platforms Wing

Schema Matching in a Large Scale Schema Matching in a Large Scale Personal Schema Based Querying

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

Large-Scale Machine Learning at Twitter 2 Large-Scale Machine Learning at Twitter Jimmy Lin and

INFRASTRUCTURE 2110414 Large Scale Computing Systems Natawut Nupairoj, Ph.D. Outline 2

Redis Graph A graph database built on top of redis Whats Redis? Open source in-memory

Motivating Problem: Complete Contracts Abstractions via Mathematical Models Recall what we

CS 225 Data Structures Se Sept. 20 20 Ar Array Li Lists - St Stac acks and and Que

Alternating-time temporal logic Mehdi Dastani BBL-521 M.M.Dastani@uu.nl ATL: Alternating-time

Compiling and Linking C code Assembly C Source C Source C Source Source .c Code Code Code

Design Patterns Observer Oliver Haase 1 Description Object based behavioral pattern

Search algorithms Algorithms are a constrained form of rewriting systems. You may remember that,

CAESAR: Context-Aware Event Stream Analytics in Real time Olga Poppe, Chuan Lei, Elke A.

New Communication Technologies For Effective Membership Recruiting Alice Aguilar, Executive