S2Graph: A large-scale graph database with HBase


SLIDE 1

S2Graph: A large-scale graph database with HBase

daumkakao

SLIDE 2

References

  • 1. HBase Conference 2015
    1. http://www.slideshare.net/HBaseCon/use-cases-session-5
    2. https://vimeo.com/128203919
  • 2. Deview 2015
  • 3. ApacheCon Big Data Europe
    1. http://sched.co/3ztM
  • 4. GitHub: https://github.com/daumkakao/s2graph
SLIDE 3

Our Social Graph

[Diagram: a user's social graph. A Friend relationship connects users; edges such as Message (length), Write, Read, Coupon (price), Present (price), Group (size), Emoticon, Eat (rating), View (count), Play (level), Style (share), Advertise, Search (keyword), Listen (count), Like (count), and Comment connect users to content, each weighted by an affinity score.]

SLIDE 4

Our Social Graph

[Diagram: the same graph with concrete values: Message (length: 9, ID: 201), Write (length: 3), Comment (length: 15), Play (level: 6), Style (share: 3), Advertise (ctr: 0.32, Ad ID: 603), Search (keyword: "HBase"), Listen (count: 6, Music ID: 603), Item ID: 13, Post ID: 97, Game ID: 1984; affinity scores range from 1 to 9.]

SLIDE 5

Technical Challenges

  • 1. Large social graph constantly changing
  • a. Scale

Social network: more than 10 billion edges, 200 million vertices, and 50 million updates on existing edges. User activities: over 1 billion new edges per day.

SLIDE 6

Technical Challenges (cont)

  • 2. Low latency for breadth first search traversal on connected data.
  • a. performance requirement

Peak graph-traversing queries per second: 20,000. Response time: 100 ms.

SLIDE 7

Technical Challenges (cont)

  • 3. Realtime update capabilities for viral effects

[Diagram: viral propagation must be fast at every hop: Person A writes a post, Person B comments, Person C shares, Person D mentions.]

SLIDE 8

Technical Challenges (cont)

  • 4. Support for Dynamic Ranking logic
  • a. Push strategy: hard to change the ranking logic dynamically.
  • b. Pull strategy: lets clients try out various ranking logics.
SLIDE 9

Before

Each app server must know each DB's sharding logic: a highly inter-connected architecture.

Friend relationship SNS feeds Blog user activities Messaging

Messaging App SNS App Blog App

SLIDE 10

After

SNS App Blog App Messaging App

S2Graph DB

stateless app servers

SLIDE 11

What is S2Graph?

daumkakao

SLIDE 12

What is S2Graph? Storage-as-a-Service + Graph API = Realtime Breadth First Search

SLIDE 13

Example: Messenger Data Model

[Diagram: User -participates-> Chat Room -contains-> Messages.]

Recent messages in my chat rooms:

SELECT b.* FROM user_chat_rooms a, chat_room_messages b
WHERE a.user_id = 1
  AND a.chat_room_id = b.chat_room_id
  AND b.created_at >= yesterday

SLIDE 14

Example: Messenger Data Model

Recent messages in my chat rooms:

curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "user_chat_rooms", "direction": "out", "limit": 100}],
    [{"label": "chat_room_messages", "direction": "out", "limit": 10, "where": "created_at >= yesterday"}]
  ]
}'

SLIDE 15

Example: News Feed

[Diagram: me -friend-> friends -create/like/share-> posts.]

Posts that my friends interacted with:

SELECT a.*, b.* FROM friends a, user_posts b
WHERE a.user_id = b.user_id
  AND b.updated_at >= yesterday
  AND b.action_type IN ('create', 'like', 'share')

SLIDE 16

Example: News Feed (cont)

Posts that my friends interacted with:

curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "friends", "direction": "out", "limit": 100}],
    [{"label": "user_posts", "direction": "out", "limit": 10, "where": "created_at >= yesterday"}]
  ]
}'

SLIDE 17

Example: Recommendation (User-based CF)

[Diagram: me -similar (batch-computed)-> users -interact (click/buy/like/share)-> products.]

Products that similar users interacted with recently:

SELECT a.*, b.* FROM similar_users a, user_products b WHERE a.sim_user_id = b.user_id AND b.updated_at >= yesterday


SLIDE 18

Example: Recommendation(User-based CF) (cont)

Products that similar users interacted with recently:

curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: application/json' -d '
{
  "filterOut": {
    "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
    "steps": [[{"label": "user_products_interact"}]]
  },
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "similar_users", "direction": "out", "limit": 100, "where": "similarity > 0.2"}],
    [{"label": "user_products_interact", "direction": "out", "limit": 10, "where": "created_at >= yesterday and price >= 1000"}]
  ]
}'


SLIDE 19

Example: Recommendation (Item-based CF)

[Diagram: me -interact (click/buy/like/share)-> products -similar (batch-computed)-> products.]

Products similar to those I have interacted with:

SELECT a.*, b.* FROM user_products a, similar_products b WHERE a.user_id = 1 AND b.product_id = a.product_id AND a.updated_at >= yesterday


SLIDE 20

Example: Recommendation(Item-based CF) (cont)

Products similar to those I have interacted with:

curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "user_products_interact", "direction": "out", "limit": 100, "where": "created_at >= yesterday and price >= 1000"}],
    [{"label": "similar_products", "direction": "out", "limit": 10, "where": "similarity > 0.2"}]
  ]
}'


SLIDE 21

Example: Recommendation (Content + Most Popular)

[Diagram: me -interact (click/buy/like/share)-> products -belong to-> categories -topK (k=1 per day)-> products.]

Daily top products for the categories of products I liked:

SELECT c.* FROM user_products a, product_categories b, category_daily_top_products c WHERE a.user_id = 1 AND a.product_id = b.product_id AND b.category_id = c.category_id AND c.time BETWEEN yesterday AND today


SLIDE 22

Example: Recommendation (Content + Most Popular) (cont)

Daily top products for the categories of products I liked:

curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "user_products_interact", "direction": "out", "limit": 100, "where": "created_at >= yesterday and price >= 1000"}],
    [{"label": "product_cates", "direction": "out", "limit": 3}],
    [{"label": "category_products_topK", "direction": "out", "limit": 10}]
  ]
}'


SLIDE 23

Example: Recommendation (Spreading Activation)

[Diagram: me -interact-> products <-interact- other users -interact (click/buy/like/share)-> products.]

Products interacted with by users who interacted with the same products as me:

SELECT c.product_id, count(*) FROM user_products a, user_products b, user_products c WHERE a.user_id = 1 AND a.product_id = b.product_id AND b.user_id = c.user_id GROUP BY c.product_id

SLIDE 24

Example: Recommendation (Spreading Activation) (cont)

Products interacted with by users who interacted with the same products as me:

curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "user_products_interact", "direction": "out", "limit": 100, "where": "created_at >= yesterday and price >= 1000"}],
    [{"label": "user_products_interact", "direction": "in", "limit": 10, "where": "created_at >= today"}],
    [{"label": "user_products_interact", "direction": "out", "limit": 10, "where": "created_at >= 1 hour ago"}]
  ]
}'

SLIDE 25

Realization

  • 1. These examples resemble graphs.
  • 2. Objects are vertices; relationships are edges.
  • 3. Necessary API: breadth first search on a large-scale graph.
SLIDE 26

S2Graph API: Vertex

  • 1. insert, delete, getVertex
  • 2. vertex id: user-provided (string/int/long)

[Table: a vertex row, e.g. ID 1231-123 with properties Prop1 = Val1, Prop2 = Val2, …]

SLIDE 27

S2Graph API: Edge

  • 1. insert, delete, update, getEdge (like CRUD in an RDBMS)
  • 2. Edge reference: (from, to, label, direction)
  • 3. Multiple props on an edge.
  • 4. Every edge is ordered (details follow).

[Table: an edge row, e.g. Edge Reference 1, 101, "friend", "out" with properties Prop1 = Val1, Prop2 = Val2, …]
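As a small illustration of the edge reference above, an edge can be modeled as its identifying quadruple plus a property map (a hypothetical Python sketch for clarity, not S2Graph's actual classes):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EdgeRef:
    """Uniquely identifies an edge: (from, to, label, direction)."""
    src: int
    dst: int
    label: str
    direction: str  # "out" or "in"

@dataclass
class Edge:
    ref: EdgeRef
    props: dict = field(default_factory=dict)

# The example edge from the table: 1 -friend(out)-> 101 with two props.
e = Edge(EdgeRef(1, 101, "friend", "out"), {"Prop1": "Val1", "Prop2": "Val2"})
```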

SLIDE 28

S2Graph API: Indices

[Table: indexed edges for row key 1-friend-out-PK: degree 3; qualifiers ordered DESC as c-103, b-102, a-101, pointing at target vertices 103 (name: c), 102 (name: b), 101 (name: a).]

Indices:

1. addIndex, createIndex
2. Automatically keep edges ordered for multiple indices.
3. Support int/long/float/string data types.

class Index { // define how to order edges.
  String indexName;
  List[Prop] indexProps;
}
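A minimal sketch of the idea (hypothetical Python, not S2Graph code): edges under one index are kept sorted by their index properties at write time, so a limited scan returns the top edges without sorting at query time.

```python
import bisect

class EdgeIndex:
    """Keeps edges sorted by index properties (descending) at insert time."""
    def __init__(self, index_props):
        self.index_props = index_props   # e.g. ["affinity"]
        self._keys = []                  # sort keys, ascending internally
        self._edges = []

    def _key(self, edge):
        # Negate numeric props so bisect's ascending order yields DESC.
        return tuple(-edge["props"][p] for p in self.index_props)

    def insert(self, edge):
        k = self._key(edge)
        i = bisect.bisect_left(self._keys, k)
        self._keys.insert(i, k)
        self._edges.insert(i, edge)

    def top(self, limit):
        # A limited scan: already in index order, no sort needed.
        return self._edges[:limit]

idx = EdgeIndex(["affinity"])
for to, aff in [(101, 3), (102, 9), (103, 6)]:
    idx.insert({"to": to, "props": {"affinity": aff}})
# top(2) returns the edges to 102 (affinity 9) and 103 (affinity 6)
```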

SLIDE 29

S2Graph API: Query

Query: getEdges, countEdges, removeEdges

class Query { // Define breadth first search
  List[VertexId] startVertices;
  List[Step] steps;
}
class Step { // Define one breadth
  List[QueryParam] queryParams;
}
class QueryParam { // Define each edge to traverse for the current breadth
  String label;
  String direction;
  Map options;
}

[Diagram: a Query contains Steps (Step1, Step2, …); each Step contains QueryParams.]
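Under these definitions, a two-step query like the earlier curl examples can be assembled as plain data. A hypothetical Python sketch (the payload shape follows the getEdges examples above; the helper names are our own):

```python
import json

def query_param(label, direction="out", limit=10, where=None):
    """One QueryParam: which edges to traverse in the current breadth."""
    p = {"label": label, "direction": direction, "limit": limit}
    if where:
        p["where"] = where
    return p

def build_query(service, column, vertex_id, steps):
    """A Query: start vertices plus a list of Steps (each a list of QueryParams)."""
    return {
        "srcVertices": [{"serviceName": service, "columnName": column, "id": vertex_id}],
        "steps": steps,
    }

q = build_query("s2graph", "user_id", 1, [
    [query_param("friends", limit=100)],
    [query_param("user_posts", limit=10, where="created_at >= yesterday")],
])
payload = json.dumps(q)  # POST this body to /graphs/getEdges
```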

SLIDE 30

What S2Graph is Not

  • Does not support global computation (unlike Apache Giraph or GraphX).
  • Does not support graph algorithms such as PageRank or shortest path.

S2Graph is: Storage-as-a-Service + Graph API = Realtime Breadth First Search

SLIDE 31

Why S2Graph: Push vs Pull. Feeds with Push

  • 1. Only timestamp can be used for scoring.
  • 2. Hard to change the scoring function dynamically.

[Diagram: a post/like fans out at write time into each friend's feed queue.]

Write: O(# of friends) per activity (fan-out)
Read: O(1) per friend
Storage: AVG(# of friends) × total user activity
Query: O(1)

SLIDE 32

Why S2Graph: Push vs Pull. Feeds with Pull

  • 1. Different weights for different action types: Like = 0.8, Click = 0.1, …
  • 2. Clients can change scoring dynamically.

[Diagram: friends' posts/likes are pulled and scored at read time.]

Write: O(1)
Read: no precomputed fan-out
Storage: total user activity
Query: O(1) for friends + O(# of friends)
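To make the trade-off concrete, a toy calculation (the friend count is an assumed figure for illustration; the daily activity matches the ~1 billion new edges per day cited earlier):

```python
# Toy cost model for push vs pull feeds (illustrative assumptions).
avg_friends = 150                  # assumed average friend count
daily_activity = 1_000_000_000     # ~1B new edges per day, as cited earlier

# Push: every activity is copied into each friend's feed queue at write time.
push_writes_per_day = daily_activity * avg_friends
push_storage_rows = push_writes_per_day     # one row per fan-out copy

# Pull: store each activity once; fan-out happens at read time.
pull_writes_per_day = daily_activity
pull_storage_rows = daily_activity

print(push_storage_rows // pull_storage_rows)  # → 150 (x storage amplification)
```

Pull trades this write and storage amplification for more work per read, which is why it only wins if reads stay fast, as the next slide notes.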

SLIDE 33

Why S2Graph: S2Graph Supports Pull + Push

Pull beats push only if:

  • 1. fast response time: 10 ~ 100 ms
  • 2. throughput: 10K ~ 20K QPS

S2Graph provides linear scalability in:

  • 1. the number of machines.
  • 2. the BFS search space (how many edges a single query traverses).

More detail in the benchmark section later.

SLIDE 34

Why S2Graph: Simplify Data Flow

[Diagram: apps use S2Graph's Write API + Query DSL; S2Graph's WAL log feeds Apache Spark (batch computing layer) for user/item similarity, topK counters, and others, whose results are written back. S2Graph is open-sourced; the S2Graph Bulk Loader will be open-sourced soon.]

SLIDE 35

Why S2Graph: Built in A/B test

  • 1. Register a query template: each query template has an impressionId.
  • 2. Insert click/impression events into S2Graph as edge inserts.
SLIDE 36

Why S2Graph: Just Insert Edge

S2Graph

  • 1. user activity history.
  • 2. friends feed.
  • 3. user-item based collaborative filtering.
  • 4. topK ranking(most popular, segmented most popular).

And many more: just model your service as a graph.

SLIDE 37

S2Graph Internal

daumkakao

SLIDE 38

How to store the data - Edge

Logical View

  • 1. Snapshot edges: the up-to-date status of each edge
    • a. Fetching an edge between two specific vertices
    • b. A lookup table to reach indexed edges for update, increment, and delete operations

[Table: rows Src Vertex ID1, ID2 × columns Tgt Vertex ID1, ID2, ID3; each cell holds that edge's properties.]

  • 2. Indexed edges: edges ordered by index
    • a. Fetches edges originating from a certain vertex in index order

[Table: row Src Vertex ID1; columns keyed by "Index Values | Tgt Vertex ID"; cells hold non-index properties.]
SLIDE 39

How to store the data - Vertex

Logical View

  • 1. Vertex: the up-to-date status of each vertex

[Table: rows Vertex ID1, Vertex ID2; columns Property Key1, Property Key2; cells Value1, Value2.]

SLIDE 40

Problem

Update/Delete edge is hard.

  • It is not feasible to traverse all edges to find the edges to delete.
  • Indexed edges are ordered, so in the worst case the client must fetch all edges to find the one to delete.
  • This makes delete/update O(N).

Backtracking from the snapshot edge

  • Read the snapshot edge, then random-access the indexed edge to delete: O(1).

Problem

  • This needs atomicity across the snapshot edge and indexed edge updates.
  • That requires an atomic operation (transaction) over multiple HBase rows.
  • A partial failure during this update can yield broken state, such as zombie edges.
SLIDE 41

How to update edge

[IndexedEdge row 1-friend-out-PK: degree 3; qualifiers c-103 (age:30, gender:M), b-102 (age:21), a-101 (age:15, gender:F).]
[SnapshotEdge row 1-friend-out-PK: degree 3; qualifiers 103 (name:c:t0, age:30:t0, gender:M:t0), 102 (name:b:t0, age:21:t0), 101 (name:a:t0, age:15:t0, gender:F:t0).]

curl -XPOST localhost:9000/graphs/edges/insert -H 'Content-Type: application/json' -d '
[
  {"timestamp": t0, "from": 1, "to": 101, "label": "friend", "props": {"name": "a", "age": 15, "gender": "F"}},
  {"timestamp": t0, "from": 1, "to": 102, "label": "friend", "props": {"name": "b", "age": 21}},
  {"timestamp": t0, "from": 1, "to": 103, "label": "friend", "props": {"name": "c", "age": 30, "gender": "M"}}
]'

SLIDE 42

How to update edge (cont)

[SnapshotEdge before: qualifier 101 holds name:a:t0, age:15:t0, gender:F:t0.]
[SnapshotEdge after: qualifier 101 holds name:d:t1, age:26:t1, gender:F:t0; derived IndexedEdge mutations: delete(1, (a-101)), insert(1, (d-101)).]

curl -XPOST localhost:9000/graphs/edges/update -H 'Content-Type: application/json' -d '
[
  {"timestamp": t-1, "from": 1, "to": 101, "label": "friend", "props": {"name": "k", "age": -10}},
  {"timestamp": t1, "from": 1, "to": 101, "label": "friend", "props": {"name": "d", "age": 26}}
]'

1. Fetch SnapshotEdge
2. Check pending mutations and retry
3. Build update on Snapshot/Indexed Edge
4. CAS on new SnapshotEdge
5. Mutate IndexedEdge
6. CAS on new SnapshotEdge

If pending mutations exist, another thread is mutating this edge, so commit the pending mutations and retry. If the CAS at step 4 fails, another thread holds the lock, so retry.
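The commit loop above can be sketched with a toy compare-and-set store (a hypothetical Python illustration; HBase's checkAndPut plays the role of `cas` here, and the pending-mutation replay is simplified away):

```python
import threading

class CASStore:
    """Toy key-value store with compare-and-set, standing in for HBase checkAndPut."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key)

    def cas(self, key, expected, new):
        with self._lock:
            if self._data.get(key) != expected:
                return False          # snapshot changed underneath us: CAS fails
            self._data[key] = new
            return True

def update_edge(store, snapshot_key, apply_update, mutate_indexed, max_retries=10):
    for _ in range(max_retries):
        old = store.get(snapshot_key)             # 1. fetch SnapshotEdge
        new = apply_update(old)                   # 3. build the update
        locked = dict(new, pending=True)          # mark in-flight mutations
        if not store.cas(snapshot_key, old, locked):
            continue                              # 4. CAS failed: another writer, retry
        mutate_indexed(old, new)                  # 5. mutate IndexedEdge (idempotent)
        if store.cas(snapshot_key, locked, new):  # 6. final CAS releases the "lock"
            return True
    return False

store = CASStore()
store.cas("1-friend-101", None, {"name": "a", "age": 15})
ok = update_edge(store, "1-friend-101",
                 apply_update=lambda old: {"name": "d", "age": 26},
                 mutate_indexed=lambda old, new: None)
```

Because step 5 is idempotent, a crash between steps 4 and 6 leaves a pending marker that the next writer can safely replay, which is the recovery path the following slides describe.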

SLIDE 43

How to update edge (cont)

[IndexedEdge mid-update: degree 3; qualifiers d-101 (age:26, gender:F), c-103 (age:30, gender:M), b-102 (age:21), plus stale a-101 (age:15, gender:F) not yet deleted.]
[SnapshotEdge: qualifier 101 holds name:d:t1, age:26:t1, gender:F:t0; pending IndexedEdge mutations: delete(1, (a-101)), insert(1, (d-101)).]

1. Fetch SnapshotEdge
2. Apply mutations stored in the SnapshotEdge if they exist
3. Build update on Snapshot/Indexed Edge
4. CAS on new SnapshotEdge
5. Mutate IndexedEdge
6. CAS on new SnapshotEdge

If any failure occurs at step 5, abort and retry from step 1. It is safe to issue the same mutation multiple times since S2Graph is idempotent.

SLIDE 44

How to update edge (cont)

[IndexedEdge after cleanup: degree 3; qualifiers d-101 (age:26, gender:F), c-103 (age:30, gender:M), b-102 (age:21).]
[SnapshotEdge: qualifier 101 holds name:d:t1, age:26:t1, gender:F:t0.]

1. Fetch SnapshotEdge
2. Apply mutations stored in the SnapshotEdge if they exist
3. Build update on Snapshot/Indexed Edge
4. CAS on new SnapshotEdge
5. Mutate IndexedEdge
6. CAS on new SnapshotEdge

If the CAS at step 6 fails, retry from step 1.

SLIDE 45

How to update edge (cont)

1. Fetch SnapshotEdge
2. Build update on Snapshot/Indexed Edge
3. CAS on new SnapshotEdge
4. Mutate IndexedEdge
5. CAS on new SnapshotEdge

[Final IndexedEdge: degree 3; qualifiers d-101 (age:26, gender:F), c-103 (age:30, gender:M), b-102 (age:21).]
[Final SnapshotEdge: qualifier 101 holds name:d:t1, age:26:t1, gender:F:t0.]

SLIDE 46

Summary

There is no atomic update spanning SnapshotEdge and IndexedEdge.

  • This can yield failures on edge update/delete, and that is a problem.

Should we use transactions, then?

  • Pros: solves the problem; no failures on edge update/delete.
  • Cons: needs one more read even when there is no contention.

Since the probability of contention on a single edge is very low, we stick to our retry logic with CAS.

SLIDE 47

Benchmarks

daumkakao

SLIDE 48

HBase Table Configuration

  • 1. setDurability(Durability.ASYNC_WAL)
  • 2. setCompressionType(Compression.Algorithm.LZ4)
  • 3. setBloomFilterType(BloomType.ROW)
  • 4. setDataBlockEncoding(DataBlockEncoding.FAST_DIFF)
  • 5. setBlockSize(32768)
  • 6. setBlockCacheEnabled(true)
  • 7. Pre-split by (Integer.MAX_VALUE / regionCount), with regionCount = 120 at table creation (on 20 region servers).
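The pre-split arithmetic can be sketched as follows (a hypothetical illustration of evenly spaced split points over a non-negative int hash space; the actual key encoding is S2Graph's own):

```python
INT_MAX = 2**31 - 1   # Java Integer.MAX_VALUE

def split_points(region_count):
    """Even split points over [0, INT_MAX) for pre-splitting a table."""
    step = INT_MAX // region_count
    # region_count regions need region_count - 1 boundary keys
    return [step * i for i in range(1, region_count)]

points = split_points(120)  # 119 boundaries -> 120 regions at creation
```

Pre-splitting this way spreads write load across all 20 region servers from the start instead of waiting for HBase to split hot regions.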
SLIDE 49

HBase Cluster Configuration

  • Each machine: 8 cores, 32 GB memory, SSD
  • hfile.block.cache.size: 0.6
  • hbase.hregion.memstore.flush.size: 128 MB
  • Otherwise, default values from CDH 5.3.1
  • S2Graph REST server: 4 cores, 16 GB memory
SLIDE 50

Performance

  • 1. Total # of edges: 100,000,000,000 (100,000,000 rows × 1,000 columns)
  • 2. Test environment
  • a. Zookeeper server: 3
  • b. HBase Masterserver: 2
  • c. HBase Regionserver: 20
  • d. App server: 4 core, 16GB Ram
  • e. Write traffic: 5K / second
SLIDE 51

Performance

  • 2. Linear scalability
  • Benchmark query: src.out("friend").limit(100).out("friend").limit(10)
  • Total concurrency: 20 × # of app servers

# of app servers | QPS   | Latency (ms)
1                | 464   | 46
2                | 885   | 45
4                | 1,763 | 45
8                | 3,491 | 43

SLIDE 52

Performance

  • 3. Varying width of traversal (tested with a single server)
  • Benchmark query: src.out("friend").limit(x).out("friend").limit(10)
  • Total concurrency: 20 × 1 (# of app servers)

Limit on first step | QPS   | Latency (ms)
20                  | 1,821 | 11
40                  | 1,023 | 19
80                  | 570   | 35
200                 | 237   | 84
400                 | 122   | 164
800                 | 61    | 327
SLIDE 53

Performance

  • 4. Different query paths (different I/O patterns)
  • All queries touch 1,000 edges; each step's limit varies per path.
  • Performance can be estimated from a given query's search space.

Limits on path         | QPS   | Latency (ms)
10 -> 100              | 307.5 | 32
100 -> 10              | 292.1 | 34
10 -> 10 -> 10         | 274.4 | 36
2 -> 5 -> 10 -> 10     | 435.3 | 23
2 -> 5 -> 2 -> 5 -> 10 | 695   | 14

SLIDE 54

Performance

  • 5. Write throughput per operation on a single app server

Insert operation

[Chart: insert latency (ms) vs requests per second.]

SLIDE 55

Performance

  • 6. Write throughput per operation on a single app server

Update (increment/update/delete) operation

[Chart: update latency (ms) vs requests per second.]

SLIDE 56

Stats

  • 1. HBase cluster per IDC (2 IDCs)
    • 3 ZooKeeper servers
    • 2 HBase masters
    • (20 + 40) HBase slaves
  • 2. App servers per IDC
    • 10 servers for writes only
    • 30 servers for queries only
  • 3. Real traffic
    • read: 10K ~ 20K requests per second
    • mostly 2-step queries with limit 100 on the first step.
    • write: 5K ~ 10K requests per second
SLIDE 57

SLIDE 58

Through S2Graph!

SLIDE 59

Now Available as Open Source

  • https://github.com/daumkakao/s2graph
  • Looking for contributors and mentors

Contact

  • Doyoung Yoon : shom83@gmail.com