When a single graph isn’t enough
FRANK SMIT, Chief Innovation Officer


SLIDE 1

When a single graph isn’t enough

SLIDE 2

Chief Innovation Officer

FRANK SMIT

Co-founder and CEO

SLIDE 3

“The number one tool for social media monitoring, webcare, publishing & social analytics”
Founded in 2011
Located in Zaandam, Netherlands
25 employees
Over 700 customers in 8 countries

SLIDE 4

Collect millions of messages on a daily basis

Twitter, Facebook, Instagram, Pinterest, LinkedIn, YouTube, Google+, news sites, blogs and fora

Data

https://dribbble.com/shots/1233464-24-Free-Flat-Social-Icons

SLIDE 5

“We develop AI and data applications for organisations”
Founded in 2015
Located in Zaandam, Netherlands
5 employees
12 customers

SLIDE 6

Different companies with different use cases, and therefore different graphs and challenges
But we want ONE solution!

SLIDE 7

How shareable is my message?
Given a campaign, who are the influencers?
Which of our followers ask questions to our competitors?
Community detection

Social graph

http://www.scribblelive.com/blog/2013/10/30/movie-galaxies-uses-social-graph-organization-to-visualize-movie-interconnectedness/

SLIDE 8

People have multiple social media accounts
Querying persons instead of accounts could be very valuable

Social account graph
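One way to picture "querying persons instead of accounts" is to group accounts that are linked as belonging to the same person. The following is a minimal sketch, not the method used in the talk: the account identifiers, the "same person" links and the function name `group_accounts_by_person` are all hypothetical, and union-find is simply one convenient technique for merging linked accounts.

```python
from collections import defaultdict


def group_accounts_by_person(accounts, same_person_links):
    """Merge accounts connected by 'same person' links into person groups."""
    # Each account starts as its own group (union-find parent pointers).
    parent = {account: account for account in accounts}

    def find(account):
        # Walk up to the group representative, shortening the path on the way.
        while parent[account] != account:
            parent[account] = parent[parent[account]]
            account = parent[account]
        return account

    # Every link merges the two groups its endpoints belong to.
    for left, right in same_person_links:
        parent[find(left)] = find(right)

    groups = defaultdict(set)
    for account in accounts:
        groups[find(account)].add(account)
    # Sort for a stable, readable result.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample data: three accounts of one person, one of another.
accounts = ["twitter:anna", "facebook:anna.b", "linkedin:anna-b", "twitter:bob"]
links = [
    ("twitter:anna", "facebook:anna.b"),
    ("facebook:anna.b", "linkedin:anna-b"),
]
persons = group_accounts_by_person(accounts, links)
```

With groups like these, a query can be answered once per person rather than once per account.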

SLIDE 9

Customers look at products, review products, buy products, etc.
By combining the customer graph with the social graph, better segmentation is possible

Customer graph

https://cdn.graphgrid.com/content/uploads/2016/04/04125950/ConnectedCustomer.png

SLIDE 10

Graphs can be stored in different storage systems
Graph connectors (like data connectors in Spark)

Storages

SLIDE 11

Software as a Service (SaaS)
Keep company private data safe
Make sure that customer X cannot query data from customer Y

Security

https://privacy.google.com/images/animations/your-security/last-frame-1.svg
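The rule that customer X cannot query data from customer Y can be illustrated with a simple access check before a query runs. This is only a sketch under assumptions: the talk does not say how isolation is enforced, so tying it to dotted namespaces is a guess, and the table `TENANT_NAMESPACES`, the customer names and the `com.example.*` namespaces are hypothetical.

```python
# Hypothetical access table: the namespaces each customer is allowed to read.
TENANT_NAMESPACES = {
    "customer_x": {"generic", "com.example.x"},
    "customer_y": {"generic", "com.example.y"},
}


def can_read(tenant, qualified_type):
    """Return True if the type lies inside a namespace the tenant may read."""
    allowed = TENANT_NAMESPACES.get(tenant, set())
    # Match on whole dotted segments so 'com.example.x' does not
    # accidentally grant access to 'com.example.xy'.
    return any(
        qualified_type == namespace or qualified_type.startswith(namespace + ".")
        for namespace in allowed
    )


def authorize(tenant, requested_types):
    """Raise PermissionError if any requested type is off limits."""
    denied = [t for t in requested_types if not can_read(tenant, t)]
    if denied:
        raise PermissionError(f"{tenant} may not read: {', '.join(denied)}")


# Customer X may read its own types, but not those of customer Y.
authorize("customer_x", ["com.example.x.order", "generic.datetime"])
blocked = not can_read("customer_x", "com.example.y.order")
```

Checking every type a query touches before execution keeps the rule in one place, whichever storage system holds the graph.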

SLIDE 12

High volume: billion connections collected already since the start
High velocity: about 100 messages a second

Two V’s

https://media.licdn.com/mpr/mpr/shrinknp_800_800/AAEAAQAAAAAAAAVQAAAAJDUwOGNmZjgxLTBjODQtNGUyMi05ZWUyLTVhY2RhMTU3OGFlYQ.jpg
https://www.extrasrl.it/hs-fs/hubfs/New_Website/New_Color_Background/31_percent.png?t=1489767435682&width=320&name=31_percent.png
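As a quick sanity check on the velocity figure, the stated rate can be converted into daily and yearly totals. The sketch assumes that "about 100 messages a second" is a steady average, which the slide does not state.

```python
# Rate taken from the slide; treating it as a constant average is an assumption.
MESSAGES_PER_SECOND = 100
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = MESSAGES_PER_SECOND * SECONDS_PER_DAY
messages_per_year = messages_per_day * 365

print(f"{messages_per_day:,} messages per day")    # 8,640,000
print(f"{messages_per_year:,} messages per year")  # 3,153,600,000
```

Roughly 8.6 million messages a day is consistent with the earlier statement that millions of messages are collected on a daily basis.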

SLIDE 13

Requirements

1. SaaS to allow for online graph analytics
2. Scalable architecture so that multiple customers can query the data at the same time
3. Different kinds of graphs in the graph space
4. Keep the private data secure and separated from the rest

SLIDE 14

MULTI NODE vs SINGLE NODE

SLIDE 15

Titan had trouble loading the data into its graph format
MonetDB had trouble performing the actual graph-like queries
Virtuoso proved to be stable even under high data load
Spark was not always the fastest but scaled very well

Benchmark results

SLIDE 16

Our first prototype consists of an API on top of Spark
Queries are processed by the API, and Scala code is generated to be executed on Spark
Graphs can be stored in Elasticsearch, Cassandra and on disk

General architecture using Spark

SLIDE 17

Namespaces to keep the data model as general as possible to cope with the different graphs
Data definitions

Data model

{
  "_namespace": "com.obi4wan.social",
  "_types": [
    {
      "_type": "message",
      "_fields": {
        "content": { "_type": "generic.message" },
        "date": { "_type": "generic.datetime" },
        "hashtags": {
          "_type": "com.obi4wan.social.hashtag",
          "_structure": "list"
        },
        "author": { "_type": "com.obi4wan.social.account" }
      }
    }
  ]
}
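To show how such a data definition might be read, the sketch below loads the definition from the slide and lists each field under its fully qualified name (namespace, type, field), the naming the query plan in this talk uses. The function name `qualified_fields` and the default structure `"single"` for fields without a `_structure` key are assumptions, not part of the presented system.

```python
import json

# The data definition from the slide, kept as a JSON string.
DEFINITION = json.loads("""
{
  "_namespace": "com.obi4wan.social",
  "_types": [
    {
      "_type": "message",
      "_fields": {
        "content": { "_type": "generic.message" },
        "date": { "_type": "generic.datetime" },
        "hashtags": { "_type": "com.obi4wan.social.hashtag", "_structure": "list" },
        "author": { "_type": "com.obi4wan.social.account" }
      }
    }
  ]
}
""")


def qualified_fields(definition):
    """Map 'namespace.type.field' to a (field type, structure) pair."""
    namespace = definition["_namespace"]
    result = {}
    for type_def in definition["_types"]:
        type_name = f"{namespace}.{type_def['_type']}"
        for field, spec in type_def["_fields"].items():
            # "single" as the default structure is an assumption.
            structure = spec.get("_structure", "single")
            result[f"{type_name}.{field}"] = (spec["_type"], structure)
    return result


fields = qualified_fields(DEFINITION)
for name, (field_type, structure) in sorted(fields.items()):
    print(f"{name}: {field_type} ({structure})")
```

Fields whose type is itself a namespaced type, such as `author`, are the ones that point to other vertices in the graph.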

SLIDE 18

JSON-based query language for defining query steps
search: search using Elasticsearch
enrich: join previous step on subgraph

Query plan

{
  "queryplan": [
    { "graph": { "v": "com.obi4wan.social.message" } },
    {
      "search": {
        "field": "com.obi4wan.social.message.content",
        "query": "fire OR smoke"
      }
    },
    {
      "enrich": {
        "type": "com.obi4wan.social.account",
        "on": {
          "old": "com.obi4wan.social.message.author",
          "nw": "com.obi4wan.social.account.url"
        }
      }
    },
    {
      "enrich": {
        "type": "com.obilytics.people.account",
        "on": {
          "old": "com.obi4wan.social.account.url",
          "nw": "com.obilytics.people.account.url"
        }
      }
    }
  ]
}
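The three step types can be illustrated with a toy in-memory interpreter. This is a sketch under several assumptions and not the Spark-based prototype from the talk: `search` is approximated by case-insensitive substring matching on terms separated by `OR` (far cruder than Elasticsearch), `enrich` is treated as an inner join, and the sample vertices, including the `person` field, are hypothetical.

```python
def run_query_plan(plan, vertices):
    """Run graph/search/enrich steps over in-memory rows.

    `vertices` maps a type name to a list of dicts keyed by qualified field name.
    """
    rows = []
    for step in plan:
        if "graph" in step:
            # Start from all vertices of the requested type.
            rows = [dict(v) for v in vertices[step["graph"]["v"]]]
        elif "search" in step:
            field = step["search"]["field"]
            terms = [t.lower() for t in step["search"]["query"].split(" OR ")]
            # Keep rows whose field contains at least one term (assumed semantics).
            rows = [r for r in rows if any(t in r[field].lower() for t in terms)]
        elif "enrich" in step:
            old = step["enrich"]["on"]["old"]
            new = step["enrich"]["on"]["nw"]
            # Index the subgraph on its join field, then inner-join (assumed).
            index = {}
            for vertex in vertices[step["enrich"]["type"]]:
                index.setdefault(vertex[new], []).append(vertex)
            rows = [{**r, **m} for r in rows for m in index.get(r.get(old), [])]
        else:
            raise ValueError(f"unknown step: {sorted(step)}")
    return rows


MSG, ACC, PPL = (
    "com.obi4wan.social.message",
    "com.obi4wan.social.account",
    "com.obilytics.people.account",
)
# Hypothetical sample data.
vertices = {
    MSG: [
        {f"{MSG}.content": "Smoke seen near the station", f"{MSG}.author": "https://example.com/anna"},
        {f"{MSG}.content": "Nice weather today", f"{MSG}.author": "https://example.com/bob"},
    ],
    ACC: [{f"{ACC}.url": "https://example.com/anna"}, {f"{ACC}.url": "https://example.com/bob"}],
    PPL: [{f"{PPL}.url": "https://example.com/anna", f"{PPL}.person": "person-1"}],
}
# The query plan from the slide, written as Python literals.
plan = [
    {"graph": {"v": MSG}},
    {"search": {"field": f"{MSG}.content", "query": "fire OR smoke"}},
    {"enrich": {"type": ACC, "on": {"old": f"{MSG}.author", "nw": f"{ACC}.url"}}},
    {"enrich": {"type": PPL, "on": {"old": f"{ACC}.url", "nw": f"{PPL}.url"}}},
]
result = run_query_plan(plan, vertices)
```

Each step consumes the rows produced by the previous one, which is why the second `enrich` can join on a field that only exists after the first.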

SLIDE 19

Next steps and remaining challenges

1. DataFrames are immutable; how to update data in real time (indexedRDDs)?
2. Search is now done through Elasticsearch; it would be nice to do that using a Spark-only solution
3. Query language is limited; use Cypher

SLIDE 20

Questions?