Querying Geo-social Data by Bridging Spatial Networks and Social Networks


SLIDE 1

Querying Geo-social Data by Bridging Spatial Networks and Social Networks

Yerach Doytsher, Ben Galon, Yaron Kanza

SLIDE 2

Motivation

  • Social networks provide valuable information on social relationships among people (users)
  • Associating users with a spatial network can provide geographical information on locations that users visit
  • Combining social networks and spatial networks is required for answering queries whose constraints comprise spatial and social conditions

SLIDE 3

Life Patterns

  • Life patterns connect people and places
  • A life pattern is essentially a triple (user, geographic entity, time unit)
  • For example, (Alice, Tower of London, Sundays) specifies that Alice visits the Tower of London every Sunday
  • Life patterns can be extracted from GPS logs, as shown in the work of Ye et al.

SLIDE 4

Example

Alice jogs every morning, and she wants to find a partner for jogging

  • A potential partner will be someone who:
    1. Is a friend of Alice or a friend of a friend
    2. Frequently jogs in the same area where Alice jogs and at similar times as she does

The life patterns will indicate presence in the same parks at similar times

SLIDE 5

Proposed Model

  • A social network holds information about people and their relationships

Who are Alice’s friends?

SLIDE 6

Proposed Model

  • A spatial network holds information about spatial entities and their relationships

Where are the parks in Alice’s neighborhood?

SLIDE 7

Proposed Model

SLIDE 8

Integrating the Networks

  • Life patterns are generated from GPS log files, and they connect people to places they frequently visit

When do these people visit the parks?

SLIDE 9

Integrating the Networks

  • A spatio-social network (SSN) comprises both networks and the life patterns that connect them

Who has been Where and When

SLIDE 10

Social Network

The social network is a graph where:

  • The nodes represent real-world people, namely users, with their attributes
  • The edges represent relationships, typically friendship relationships, between users

SLIDE 11

Geographical Hierarchy

UK

SLIDE 12

Geographical Hierarchy

UK
England, Northern Ireland, Wales, Scotland

SLIDE 13

Geographical Hierarchy

UK
England, Northern Ireland, Wales, Scotland
London, Bristol, Liverpool, Sheffield, Manchester, Leeds

SLIDE 14

Geographical Hierarchy

UK
England, Northern Ireland, Wales, Scotland
London, Bristol, Liverpool, Sheffield, Manchester, Leeds

SLIDE 15

Geographical Hierarchy

UK
England, Northern Ireland, Wales, Scotland
London, Bristol, Liverpool, Sheffield, Manchester, Leeds

Adjacency edges represent a direct real-world connection between two geographical entities from the same hierarchy level

SLIDE 16

Spatial Network

The spatial network is a graph where:

  • Each node represents a geographic entity
  • There are two types of edges:
    1. Hierarchical edges
    2. Adjacency edges
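
As a rough illustration of a graph with two edge types, the following Python sketch stores hierarchical edges as child-to-parent links and adjacency edges as symmetric neighbour sets. The class name, its methods, and the choice to treat adjacency as symmetric are assumptions made for this example; the slides do not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class SpatialNetwork:
    # Hypothetical in-memory layout, not taken from the presentation.
    parent: dict = field(default_factory=dict)    # hierarchical edges: child -> parent
    adjacent: dict = field(default_factory=dict)  # adjacency edges: entity -> neighbours

    def add_child(self, child, parent):
        self.parent[child] = parent

    def level(self, entity):
        # Depth below the root of the hierarchy (the root has level 0).
        depth = 0
        while entity in self.parent:
            entity = self.parent[entity]
            depth += 1
        return depth

    def add_adjacency(self, a, b):
        # Adjacency edges only link entities on the same hierarchy level.
        # Treating them as symmetric is an assumption of this sketch.
        if self.level(a) != self.level(b):
            raise ValueError("adjacency edges must stay within one hierarchy level")
        self.adjacent.setdefault(a, set()).add(b)
        self.adjacent.setdefault(b, set()).add(a)


net = SpatialNetwork()
for country in ("England", "Wales", "Scotland", "Northern Ireland"):
    net.add_child(country, "UK")
for city in ("London", "Bristol"):
    net.add_child(city, "England")
net.add_adjacency("England", "Wales")
```

Keeping the two edge types in separate maps makes it cheap to reject an adjacency edge that would cross hierarchy levels.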
SLIDE 17

Time Patterns

  • Time patterns represent repeated events: “every week”, “every day”, “every workday”, etc.
  • There is a hierarchy of time patterns:
    – If an event happens at some level in the hierarchy, it also occurs in the higher levels
    – If Alice visits 10 Downing St. every workday, then Alice visits 10 Downing St. every week, every month, etc.
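
One way to picture the upward propagation in the time-pattern hierarchy is a lookup that walks from an observed pattern to ever coarser ones. The sketch below assumes the hierarchy is a simple chain and that "every day" sits below "every workday"; both are simplifications chosen for illustration, and the names `COARSER` and `satisfies` are hypothetical.

```python
# Hypothetical chain: each pattern points to the next coarser pattern it implies.
COARSER = {
    "every day": "every workday",
    "every workday": "every week",
    "every week": "every month",
}


def satisfies(observed, requested):
    """Return True if an event observed at `observed` also holds at `requested`."""
    current = observed
    while current is not None:
        if current == requested:
            return True
        current = COARSER.get(current)  # climb one level; None at the top
    return False
```

With this chain, a visit recorded every workday satisfies a query for "every week" or "every month", but a visit recorded every week does not satisfy a query for "every workday".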

SLIDE 18

Life Patterns

  • Associate users with geographic entities
  • Hold time patterns
  • Have a confidence rank
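
The three properties listed here map naturally onto a small record type. The following is a minimal sketch; the field names and the assumption that confidence lies between 0 and 1 are illustrative choices, and the 0.9 value is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifePattern:
    user: str
    entity: str         # geographic entity the user visits
    time_pattern: str   # e.g. "every workday"
    confidence: float   # assumed here to lie in [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


# Illustrative instance; the confidence value is made up.
lp = LifePattern("Alice", "Tower of London", "Sundays", 0.9)
```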
SLIDE 19

Life Patterns – Example

  • Alice visits 10 Downing St. every workday from 10 A.M. to 12 P.M.

Confidence value was omitted, for simplicity
SLIDE 20

The Query Language

  • We developed a query language that has the form of an algebra with seven operators:
    1. Select
    2. Extend
    3. Union
    4. Intersect
    5. Difference
    6. Bridge
    7. Multi-Bridge

Each operator returns a collection of nodes of a single network (either users or geographic entities)

SLIDE 21

The Algebra

  • The proposed algebra was designed to be
    – Expressive
    – Yet efficient, e.g., no Cartesian product

SLIDE 22

The Select Operator

  • Receives a set of nodes from a network (social or spatial) and a condition
  • Returns the nodes that satisfy the condition

select(nodes_set, condition)

SLIDE 23

Select – Example

select(Nsocial, color = blue)
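
The select operator can be sketched in Python as a filter over nodes. Here each node is a dictionary of attributes and the condition is passed as a predicate function; both choices are assumptions of this example, since the slides only show the algebraic form.

```python
def select(nodes, condition):
    """Return the nodes for which `condition` holds.

    `nodes` is an iterable of attribute dictionaries and `condition`
    is a one-argument predicate (an illustrative encoding).
    """
    return [node for node in nodes if condition(node)]


# Toy social network, invented for illustration.
n_social = [
    {"name": "Alice", "color": "blue"},
    {"name": "Bob", "color": "green"},
    {"name": "Carol", "color": "blue"},
]

# Counterpart of select(Nsocial, color = blue)
blue = select(n_social, lambda node: node.get("color") == "blue")
```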

SLIDE 24

The Extend Operator

  • Receives a set of nodes from a network (social or spatial) and a parameter n
  • Returns the set of nodes that are reachable from the given nodes by paths of length at most n

extend(nodes_set, n)

SLIDE 25

Extend – Example

extend(select(Nsocial, color = green), 2)
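
Reachability within n steps is a bounded breadth-first search. The sketch below assumes the network is given as a dictionary from each node to its set of neighbours, and it includes the start nodes in the result (paths of length 0); whether the original operator includes them is not stated on the slides.

```python
from collections import deque


def extend(graph, start_nodes, n):
    """Return every node reachable from `start_nodes` by a path of at most n edges."""
    seen = set(start_nodes)
    frontier = deque((node, 0) for node in start_nodes)
    while frontier:
        node, dist = frontier.popleft()
        if dist == n:
            continue  # do not expand beyond the length limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return seen


# Toy friendship graph, invented for illustration.
friends = {
    "Alice": {"Bob"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Bob", "Dan"},
    "Dan": {"Carol"},
}
within_two = extend(friends, {"Alice"}, 2)
```

Because the queue is processed in first-in, first-out order, each node is first reached along a shortest path, so the length limit is applied correctly.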

SLIDE 26

Union, Intersect and Difference

  • Receive two sets of nodes, all the nodes from the same network
  • Have the same semantics as in set theory

union(nodes_set_A, nodes_set_B)
intersect(nodes_set_A, nodes_set_B)
difference(nodes_set_A, nodes_set_B)

SLIDE 27

The Bridge Operator

  • Receives nodes of one network, a time pattern and a confidence threshold
  • Returns the nodes of the other network that are connected to the nodes of the given node set by those life patterns that satisfy the given time pattern and confidence threshold

bridge(nodes_set, time-pattern, confidence)

SLIDE 28

Bridge – Example I

A = select(Nspatial, address like ‘% 10 Downing St%’)
bridge(A, ‘every day’, 0.8)

SLIDE 29

Bridge – Example II

A = select(Nsocial, color = yellow)
B = extend(A, 2)
bridge(B, ‘every morning’, 0.8)
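
A minimal Python sketch of the bridge operator follows. It assumes life patterns are stored as a flat list of tuples, that user names and entity names never collide, that the time pattern is matched by exact string equality (the time-pattern hierarchy is ignored), and that the threshold is inclusive. All of these are simplifications for illustration, and the sample data is invented.

```python
from collections import namedtuple

LifePattern = namedtuple("LifePattern", "user entity time_pattern confidence")


def bridge(nodes, patterns, time_pattern, confidence):
    """Cross from one network to the other along matching life patterns."""
    nodes = set(nodes)
    result = set()
    for lp in patterns:
        if lp.time_pattern != time_pattern or lp.confidence < confidence:
            continue
        if lp.user in nodes:        # social side given: return places
            result.add(lp.entity)
        elif lp.entity in nodes:    # spatial side given: return users
            result.add(lp.user)
    return result


patterns = [
    LifePattern("Alice", "10 Downing St.", "every day", 0.9),
    LifePattern("Bob", "10 Downing St.", "every day", 0.5),
    LifePattern("Carol", "Hyde Park", "every morning", 0.85),
]

visitors = bridge({"10 Downing St."}, patterns, "every day", 0.8)
places = bridge({"Carol"}, patterns, "every morning", 0.8)
```

The same function serves both directions, mirroring the two examples: spatial nodes in, users out, and users in, places out.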

SLIDE 30

The Multi-Bridge Operator

  • Similar to Bridge, except that the returned nodes are only those that are connected to a certain percentage of the nodes of the given set

Mbridge(nodes_set, time-pattern, confidence, percentage)

SLIDE 31

Multi Bridge – Example I

A = select(Nsocial, color = yellow)
B = extend(A, 2)
Mbridge(B, ‘every morning’, 0.8, 50%)

The operator can be used to discover groups with socio-spatial similarity
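
The percentage filter can be sketched by counting, for each candidate node on the other side, how many nodes of the input set it is linked to. The code below assumes the percentage is given as a fraction between 0 and 1, that "a certain percentage" means "at least that share", and that time patterns are matched by exact equality; the data is invented for illustration.

```python
from collections import defaultdict, namedtuple

LifePattern = namedtuple("LifePattern", "user entity time_pattern confidence")


def mbridge(nodes, patterns, time_pattern, confidence, percentage):
    """Like bridge, but keep only targets linked to at least `percentage` of `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return set()
    linked = defaultdict(set)  # target node -> input nodes connected to it
    for lp in patterns:
        if lp.time_pattern != time_pattern or lp.confidence < confidence:
            continue
        if lp.user in nodes:
            linked[lp.entity].add(lp.user)
        elif lp.entity in nodes:
            linked[lp.user].add(lp.entity)
    return {target for target, sources in linked.items()
            if len(sources) / len(nodes) >= percentage}


patterns = [
    LifePattern("Alice", "Hyde Park", "every morning", 0.9),
    LifePattern("Bob", "Hyde Park", "every morning", 0.9),
    LifePattern("Carol", "Regent's Park", "every morning", 0.9),
]
group = {"Alice", "Bob", "Carol", "Dan"}
shared = mbridge(group, patterns, "every morning", 0.8, 0.5)
```

Hyde Park is linked to two of the four users (50%) and passes the filter, while Regent's Park is linked to only one of four (25%) and is dropped.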

SLIDE 32

Multi Bridge – Example II

John is searching for new friends to go out with

FriendsOfJohn = extend(select(Nsocial, name=‘John’), 1)
Returns John’s friends

Entertainment = select(Mbridge(FriendsOfJohn, ‘every week’, 0.8, 60%), category=‘entertainment’)
Returns entertainment places that John’s friends frequently visit

PotentialNewFriends = Mbridge(Entertainment, ‘every week’, 0.8, 80%)
Returns people that frequently visit these places

SLIDE 33

Queries – Example I

Find partners for a carpool

John lives in Downing St. and works in Heathrow airport. He wants to find co-workers for a carpool.

Neighborhood = extend(select(Nspatial, address like ‘%Downing%’), 100)
Returns the geographic entities near John’s home

Neighbors = bridge(Neighborhood, ‘every morning’, 0.8)
Returns people who stay every morning in John’s neighborhood

Co-workers = bridge(select(Nspatial, address like ‘% Heathrow airport %’), ‘every workday’, 0.8)
Returns people that are in Heathrow airport every workday

SLIDE 34

Queries – Example I

Find partners for a carpool

John lives in Downing St. and works in Heathrow airport. He wants to find co-workers for a carpool.

Neighbors = bridge(Neighborhood, ‘every morning’, 0.8)
Returns people who stay every morning in John’s neighborhood

Co-workers = bridge(select(Nspatial, address like ‘% Heathrow airport %’), ‘every workday’, 0.8)
Returns people that are in Heathrow airport every workday

Potential = intersect(Neighbors, Co-workers)
Returns potential users for John’s carpool
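
To show how the carpool steps compose, here is a small end-to-end sketch in Python over invented data. The helper functions are simplified stand-ins written for this example (the bridge here only goes from places to users and matches time patterns by exact equality), and every name and value in the toy data is hypothetical.

```python
from collections import namedtuple

LifePattern = namedtuple("LifePattern", "user entity time_pattern confidence")


def extend(graph, start, n):
    """Nodes within n edges of `start` (start nodes included)."""
    seen, frontier = set(start), set(start)
    for _ in range(n):
        frontier = {m for v in frontier for m in graph.get(v, ())} - seen
        if not frontier:
            break
        seen |= frontier
    return seen


def bridge(entities, patterns, time_pattern, confidence):
    """Users tied to `entities` by life patterns meeting both constraints."""
    return {lp.user for lp in patterns
            if lp.entity in entities
            and lp.time_pattern == time_pattern
            and lp.confidence >= confidence}


# Toy data, invented for illustration only.
adjacency = {"Downing St.": {"Whitehall"}, "Whitehall": {"Downing St."}}
patterns = [
    LifePattern("Ann", "Whitehall", "every morning", 0.9),
    LifePattern("Ann", "Heathrow airport", "every workday", 0.95),
    LifePattern("Raj", "Downing St.", "every morning", 0.85),
    LifePattern("Eve", "Heathrow airport", "every workday", 0.9),
]

neighborhood = extend(adjacency, {"Downing St."}, 100)
neighbors = bridge(neighborhood, patterns, "every morning", 0.8)
co_workers = bridge({"Heathrow airport"}, patterns, "every workday", 0.8)
potential = neighbors & co_workers  # intersect(Neighbors, Co-workers)
```

Each intermediate result is a plain set of nodes from a single network, which is why the final step reduces to an ordinary set intersection.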

SLIDE 35

Queries – Example II

Find a jogging partner for Alice

ParksInAliceHood = select(extend(select(Nspatial, address = Alice_address), 1000), type = park)
Returns the parks in Alice’s neighborhood

UsersInParks = bridge(ParksInAliceHood, ‘mornings in the week’, 0.6)
Returns people that spend time during the mornings in parks in Alice’s neighborhood

SLIDE 36

Queries – Example II

Find a jogging partner for Alice

UsersInParks = bridge(ParksInAliceHood, ‘mornings in the week’, 0.6)
Returns people that spend time during the mornings in parks in Alice’s neighborhood

FriendsOfAlice = extend(select(Nsocial, name=‘Alice’), 2)
Returns Alice’s friends and friends of friends

PotentialPartner = intersect(FriendsOfAlice, UsersInParks)
Returns potential jogging partners

SLIDE 37

Implementation

Goals:

  • Demonstrate the feasibility of the model
  • Show that a socio-spatial network can be built effectively upon common data-storage tools

Two implementations:

  1. Relational based
  2. Graph based

  • Experimentally compare the two implementations

SLIDE 38

Graph-Based Implementation

  • A graph database management system provides a natural storage for the SSN
  • The implementation uses Neo4j, an open-source graph database management system, in Java
  • The SSN is stored as a graph with attributes on the spatial and social nodes
  • Life patterns are edges with the time pattern and confidence as attributes

SLIDE 39

Relational Implementation

The Relations

  • Users
  • Friendship
  • Geographic entities
  • Hierarchy
  • Adjacency
  • Life pattern

SLIDE 40

Relational Implementation

  • The query operations are translated to SQL queries
  • Complex queries are translated to nested SQL queries
  • We used optimization techniques to improve the efficiency of query evaluation

SELECT user_id FROM users WHERE name = 'John Smith'
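
To give a feel for what a nested SQL translation might look like, the sketch below runs a bridge-over-select query against an in-memory SQLite database from Python. The table and column names are a hypothetical schema loosely modelled on the relation names listed earlier, SQLite stands in for the MySQL system used in the experiments, and the rows are invented; this is not the authors' actual translation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE geographic_entities (entity_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE life_pattern (user_id INTEGER, entity_id INTEGER,
                               time_pattern TEXT, confidence REAL);
    INSERT INTO users VALUES (1, 'John Smith'), (2, 'Ann Lee');
    INSERT INTO geographic_entities VALUES (10, 'Heathrow airport'),
                                           (11, '10 Downing St.');
    INSERT INTO life_pattern VALUES (1, 10, 'every workday', 0.9),
                                    (2, 10, 'every workday', 0.6),
                                    (2, 11, 'every workday', 0.9);
""")

# bridge(select(Nspatial, address like '%Heathrow airport%'), 'every workday', 0.8)
# The inner SELECT plays the role of select; the outer query follows
# life patterns from the chosen places to the users.
BRIDGE_SQL = """
    SELECT DISTINCT lp.user_id
    FROM life_pattern AS lp
    WHERE lp.time_pattern = ?
      AND lp.confidence >= ?
      AND lp.entity_id IN (
          SELECT entity_id FROM geographic_entities WHERE address LIKE ?
      )
"""
rows = conn.execute(BRIDGE_SQL, ("every workday", 0.8, "%Heathrow airport%")).fetchall()
user_ids = sorted(row[0] for row in rows)
```

Nesting the select as a subquery keeps each operator's result a set of node identifiers from one network, in line with the algebra's design.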

SLIDE 41

Experiments

extend(select(N, name = ‘John’), 3)

[Figure: Extend 3. Run time (millisec) against number of users, for Neo4j without cache, Neo4j with cache, MySQL without cache and MySQL with cache]

  • Large effect of the cache on the running time in the relational-based implementation
  • Almost no effect of the cache on the running time in the graph-based implementation

SLIDE 42

Experiments

bridge(bridge(bridge(select(N, name=‘John’), *, 0.8), *, 0.8), *, 0.8)

[Figure: Bridge 3. Run time (millisec) against number of life patterns, for Neo4j without cache, Neo4j with cache, MySQL without cache and MySQL with cache]

  • The graph model shows better results for large datasets
  • In both models the cache significantly improves the efficiency

SLIDE 43

Experiments

Paramedics = Select(Nsocial, occupation=‘Paramedics’)
Query_1 = Bridge(Paramedics, ‘some_night_of_the_week’, 0.85)

Find where paramedics might live

[Figure: Query 1. Run time (millisec) against network size, for Neo4j without cache, Neo4j with cache, MySQL without cache and MySQL with cache]

  • Queries with bridge are evaluated more efficiently over the graph DBMS than over the relational DBMS

SLIDE 44

Experiments

John = Select(Nsocial, name=‘John’)
Places = Bridge(John, all, 0.5)
Query_2 = MBridge(Places, all, 0.5, 20%)

Find people who visit 20% or more of the places that John visits

[Figure: Query 2. Run time (millisec) against network size, for Neo4j without cache, Neo4j with cache, MySQL without cache and MySQL with cache]

  • The evaluation of Mbridge is more efficient over the RDBMS than over the graph DBMS

SLIDE 45

Conclusions

  • We presented a model for representing the integration of social networks with spatial networks
  • We provided an algebra that allows querying the combined networks effectively and efficiently
  • We compared implementations of the proposed model over a graph DBMS and an RDBMS, and we illustrated the superiority of the graph DBMS over the RDBMS for all the operators except the multi-bridge

SLIDE 46

Thank You

Questions?