Fast and Accurate Mining of Evolving & Trajectory Networks - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Fast and Accurate Mining of Evolving & Trajectory Networks

Manos Papagelis

York University, Toronto, Canada

slide-2
SLIDE 2
Current Research focus
  • A. Network Representation Learning
  • B. Trajectory Network Mining
  • C. Streaming & Dynamic Graphs
  • D. Social Media Mining & Analysis
  • E. City Science / Urban Informatics / IoT
  • F. Natural Language Processing

slide-3
SLIDE 3

EvoNRL: Evolving Network Representation Learning Based on Random Walks

Joint work with Farzaneh Heidari

slide-4
SLIDE 4

Networks: a universal language for describing complex data

slide-5
SLIDE 5

Classical ML Tasks in Networks

Typical tasks:
  • node classification
  • link prediction
  • community detection
  • anomaly detection
  • graph similarity
  • triangle count

Limitations of Classical ML:
  • expensive computation (high-dimensional computations)
  • require extensive domain knowledge (task-specific)
slide-6
SLIDE 6

Network Representation Learning (NRL)

Several network structural properties can be learned/embedded (nodes, edges, subgraphs, graphs, …).

[Figure: a network mapped to a low-dimensional space]

Premise of NRL:
  • faster computations (low-dimensional computations)
  • domain-agnostic (task-independent)
slide-7
SLIDE 7

Random Walk-based NRL

StaticNRL (DeepWalk, node2vec, …):
  • 1. Input network
  • 2. Obtain a set of random walks
  • 3. Treat the set of random walks as sentences and feed them to a Skip-gram NN model
  • 4. Learn a vector representation for each node

Example set of random walks (RW id: node sequence):
  1: 3 5 8 7 6 4 5
  2: 1 3 5 8 7 6 5
  . . .
  87: 8 5 4 3 5 6 7
  88: 4 5 6 7 8 9
  89: 2 1 3 5 6 7 8
  90: 7 4 2 1 3 5 6
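The walk-generation step (step 2) can be sketched in Python as below. This is a minimal DeepWalk-style sketch under stated assumptions: uniform (unbiased) transitions, an adjacency-list dict, and hypothetical names (`random_walk`, `generate_walks`, `num_walks`, `walk_length`). node2vec uses biased transitions instead, and the Skip-gram training step is not shown.

```python
import random
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list: node -> neighbors


def random_walk(graph: Graph, start: int, walk_length: int, rng: random.Random) -> List[int]:
    """Uniform random walk of at most walk_length nodes starting at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(graph: Graph, num_walks: int, walk_length: int, seed: int = 0) -> List[List[int]]:
    """num_walks passes over the nodes, one walk per node per pass."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(graph)
        rng.shuffle(nodes)  # randomize the start order on every pass
        for node in nodes:
            walks.append(random_walk(graph, node, walk_length, rng))
    return walks


if __name__ == "__main__":
    # Small undirected toy graph (hypothetical, not the one on the slide).
    g = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
    for w in generate_walks(g, num_walks=1, walk_length=5):
        print(" ".join(map(str, w)))
```

Each walk printed this way is one "sentence" of node ids, which is the form a Skip-gram model consumes.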

slide-8
SLIDE 8

but real-world networks are constantly evolving

slide-9
SLIDE 9

Evolving Network Representation Learning

slide-10
SLIDE 10

Naive Approach

At every time step (t = 0, t = 1, t = 2, …), rerun StaticNRL from scratch on the current snapshot of the network.

Impractical: expensive, and the representations learned at different time steps are not comparable.

slide-11
SLIDE 11

EvoNRL Key Idea

[Figure: the random walk-based NRL pipeline: input network → set of random walks → Skip-gram → vector representation for each node]

Key idea: dynamically maintain a valid set of random walks for every change in the network, instead of recomputing them from scratch.

slide-12
SLIDE 12

Example: Edge Addition

At t = 0 the network has the set of random walks shown earlier (RW 1, 2, …, 87, 88, 89, 90). At t = 1, edge (1, 4) is added, so we need to update the RW set: walks that visit node 1 or node 4 may now take the new edge.

For example, the walk 2 1 3 5 6 7 8 can become 2 1 4 3 5 6 7 8: keep the walk up to node 1, take the new edge to node 4, then simulate the rest of the RW.

Similarly for edge deletion and node addition/deletion.
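A simplified sketch of the edge-addition update, assuming an undirected graph and uniform walks of fixed length (all names are hypothetical). It relies on the fact that only u's and v's neighborhoods change, so the steps a walk takes before it first reaches u or v are unaffected; it keeps that prefix and re-simulates the rest. This sketch re-simulates every affected walk, which is not necessarily how EvoNRL decides which walks to update.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets


def simulate(graph: Graph, walk: List[int], walk_length: int, rng: random.Random) -> List[int]:
    """Extend `walk` with uniform random steps until it has walk_length nodes."""
    while len(walk) < walk_length and graph[walk[-1]]:
        # sorted() makes the choice reproducible for a fixed seed
        walk.append(rng.choice(sorted(graph[walk[-1]])))
    return walk


def add_edge_and_update(graph: Graph, walks: List[List[int]], u: int, v: int,
                        walk_length: int, seed: int = 0) -> List[int]:
    """Add edge (u, v) and repair affected walks in place; returns their indices."""
    rng = random.Random(seed)
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
    updated = []
    for i, walk in enumerate(walks):
        # first position where the walk reaches an endpoint of the new edge
        cut = next((k for k, node in enumerate(walk) if node in (u, v)), None)
        if cut is None:
            continue  # walk never touches u or v: still valid as-is
        # keep the prefix up to the endpoint, then simulate the rest of the RW
        walks[i] = simulate(graph, walk[: cut + 1], walk_length, rng)
        updated.append(i)
    return updated
```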

slide-13
SLIDE 13

Efficiently Maintaining a Set of Random Walks

slide-14
SLIDE 14

EvoNRL Operations

When the network changes (for example, + edge(n1, n2)), updating the RW set requires three operations on RWs:
  • Search a node
  • Delete a RW
  • Insert a new RW

This calls for an efficient indexing data structure.

slide-15
SLIDE 15

EvoNRL Indexing

Index the RW set like a collection of text documents:
  • each node is a keyword
  • each RW is a document
  • a set of RWs is a collection of documents

Inverted index over the example RWs (postings are <RW id, position>):

Term (node)   Term Frequency   Postings and Positions
1             3                <2, 1>, <89, 2>, <90, 4>
2             2                <89, 1>, <90, 3>
3             5                <1, 1>, <2, 1>, <87, 3>, <89, 3>, <90, 5>
4             4                <1, 6>, <87, 3>, <90, 2>
5             9                <1, 2>, <1, 7>, <2, 3>, <2, 7>, <87, 5>, <88, 2>, <89, 4>, <90, 6>
6             6                <1, 5>, <2, 6>, <87, 6>, <88, 3>, <89, 3>, <90, 5>
7             5                <1, 4>, <2, 5>, <87, 7>, <88, 4>, <89, 6>, <90, 7>
8             5                <1, 3>, <2, 4>, <87, 1>, <88, 6>, <89, 7>
9             1                <88, 7>
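The three RW operations map naturally onto an inverted index. The sketch below is a hypothetical in-memory Python version (class and method names are assumptions, not EvoNRL's code): `insert` and `delete` maintain postings of <RW id, position> pairs with 1-based positions, and `search` returns the postings for a node.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class WalkIndex:
    """Inverted index over random walks: node = keyword, walk = document."""

    def __init__(self) -> None:
        self.walks: Dict[int, List[int]] = {}
        # node -> {walk id: [positions]}, positions are 1-based
        self.postings: Dict[int, Dict[int, List[int]]] = defaultdict(dict)

    def insert(self, walk_id: int, walk: List[int]) -> None:
        """Insert a new RW and post every occurrence of its nodes."""
        self.walks[walk_id] = walk
        for pos, node in enumerate(walk, start=1):
            self.postings[node].setdefault(walk_id, []).append(pos)

    def delete(self, walk_id: int) -> None:
        """Delete a RW and drop its postings (and nodes left with none)."""
        for node in set(self.walks.pop(walk_id)):
            del self.postings[node][walk_id]
            if not self.postings[node]:
                del self.postings[node]

    def search(self, node: int) -> List[Tuple[int, int]]:
        """Search a node: all <walk id, position> pairs, sorted."""
        return sorted((w, p) for w, ps in self.postings.get(node, {}).items() for p in ps)

    def frequency(self, node: int) -> int:
        """Term frequency: total occurrences of `node` across all RWs."""
        return len(self.search(node))
```

Indexing RWs 1, 2, 89 and 90 from the example and calling `search(1)` returns `[(2, 1), (89, 2), (90, 4)]`, the postings listed for node 1 in the index above.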

slide-16
SLIDE 16

Evaluation of EvoNRL

slide-17
SLIDE 17

Evaluation: EvoNRL vs StaticNRL

  • Accuracy: EvoNRL ≈ StaticNRL
  • Running time: EvoNRL ≪ StaticNRL

slide-18
SLIDE 18

EvoNRL has similar accuracy to StaticNRL

Accuracy: edge addition

(similar results for edge deletion, node addition/deletion)

slide-19
SLIDE 19

Time Performance

EvoNRL runs orders of magnitude faster than StaticNRL.

[Figure: running time of EvoNRL vs StaticNRL, annotated 100x]

slide-20
SLIDE 20

Takeaway

How can we learn representations of an evolving network?
  • EvoNRL: a time-efficient, accurate, generic method

slide-21
SLIDE 21
Current Research focus
  • A. Network Representation Learning
  • B. Trajectory Network Mining
  • C. Streaming & Dynamic Graphs
  • D. Social Media Mining & Analysis
  • E. City Science / Urban Informatics / IoT
  • F. Natural Language Processing

slide-22
SLIDE 22

Node Importance in Trajectory Networks

Joint work with Tilemachos Pechlivanoglou

slide-23
SLIDE 23

Trajectories of moving objects

Every moving object forms a trajectory; in 2D it is a sequence of (x, y, t) points. There are trajectories of moving cars, people, birds, …

slide-24
SLIDE 24

Trajectory data mining
  • trajectory similarity
  • trajectory clustering
  • trajectory classification
  • trajectory pattern mining
  • trajectory anomaly detection
  • ...more

We care about network analysis of moving objects.

slide-25
SLIDE 25

Proximity networks

Two moving objects are connected when they are within a proximity threshold θ of each other.

slide-26
SLIDE 26

Distance can represent:
  • line of sight
  • wifi/bluetooth signal range
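A single snapshot of a proximity network can be sketched as below, assuming Euclidean distance between 2D positions and an inclusive threshold (distance ≤ θ); both choices and the function name `proximity_snapshot` are hypothetical, and a line-of-sight or signal-range model would replace the distance test.

```python
import math
from itertools import combinations
from typing import Dict, Set, Tuple

Position = Tuple[float, float]


def proximity_snapshot(positions: Dict[str, Position], theta: float) -> Set[Tuple[str, str]]:
    """Edges (a, b), a < b, between objects whose Euclidean distance is <= theta."""
    return {
        (a, b)
        for a, b in combinations(sorted(positions), 2)
        if math.dist(positions[a], positions[b]) <= theta
    }
```

Checking all pairs is quadratic in the number of objects per snapshot; a spatial index would reduce that, but it is left out to keep the sketch short.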

slide-27
SLIDE 27

Trajectory networks

The Problem
  • Input: logs of trajectories (x, y, t) in time period [0, T]
  • Output: node importance metrics

slide-28
SLIDE 28

Node Importance

slide-29
SLIDE 29

Node importance in static networks

  • Degree centrality
  • Betweenness centrality
  • Closeness centrality
  • Eigenvector centrality
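Two of these measures are short enough to sketch for an unweighted, undirected graph stored as adjacency sets. In this hypothetical sketch, closeness is computed over each node's reachable set, (reachable − 1) / sum of BFS distances, which is a simplifying choice for disconnected graphs; libraries may normalize differently.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def degree_centrality(graph: Graph) -> Dict[str, int]:
    """Number of neighbors of each node."""
    return {n: len(nbrs) for n, nbrs in graph.items()}


def closeness_centrality(graph: Graph) -> Dict[str, float]:
    """(reachable nodes - 1) / sum of BFS distances; 0.0 for isolated nodes."""
    result: Dict[str, float] = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # BFS gives shortest hop distances in an unweighted graph
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result
```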

slide-30
SLIDE 30

Node importance in TNs

  • connected components over time (connectedness)
  • node degree over time
  • triangles over time

slide-31
SLIDE 31

Applications
  • infection spreading
  • security in autonomous vehicles
  • rich dynamic network analytics

slide-32
SLIDE 32

Evaluation of Node Importance in Trajectory Networks

slide-33
SLIDE 33

Naive approach

For every discrete time unit t:
  • 1. obtain a static snapshot of the proximity network
  • 2. run static node importance algorithms on the snapshot

Aggregate results at the end.
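The naive approach can be sketched as below. The assumptions are all hypothetical choices: every trajectory is sampled at the same discrete time units, proximity is Euclidean with an inclusive threshold θ, node degree is the static importance measure, and averaging is the final aggregation. The all-pairs loop repeated at every time unit is what makes this approach expensive.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# trajectories[obj] = list of (x, y) positions, one per discrete time unit
Trajectories = Dict[str, List[Tuple[float, float]]]


def naive_degree_over_time(trajs: Trajectories, theta: float) -> Dict[str, List[int]]:
    """Degree of every object at every discrete time unit."""
    num_steps = min((len(path) for path in trajs.values()), default=0)
    degrees: Dict[str, List[int]] = {obj: [] for obj in trajs}
    for t in range(num_steps):
        # 1. obtain the static snapshot of the proximity network at time t
        deg: Dict[str, int] = defaultdict(int)
        for a, b in combinations(sorted(trajs), 2):
            if math.dist(trajs[a][t], trajs[b][t]) <= theta:
                deg[a] += 1
                deg[b] += 1
        # 2. run a static node importance measure (here: degree) on the snapshot
        for obj in trajs:
            degrees[obj].append(deg[obj])
    return degrees


def aggregate_mean_degree(degrees: Dict[str, List[int]]) -> Dict[str, float]:
    """Aggregate at the end: average degree over the whole period."""
    return {obj: sum(ds) / len(ds) for obj, ds in degrees.items() if ds}
```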

slide-34
SLIDE 34

Streaming approach

Similar to naive, but:
  • no final aggregation
  • results calculated incrementally at every step

Still processes every time unit.

slide-35
SLIDE 35

[Figure: processing every discrete time unit up to time T]

slide-36
SLIDE 36

Sweep Line Over Trajectories (SLOT)

slide-37
SLIDE 37

Sweep line algorithm
  • A computational geometry algorithm that, given line segments, computes line segment overlaps
  • Efficient one-pass algorithm that only processes line segments at their beginning and ending points
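A one-dimensional version of the idea can be sketched as below: intervals on a single axis instead of general line segments, which is a simplification. Each interval is visited only at its start and end points, and every interval that starts while others are active overlaps them. Counting touching endpoints as overlaps is an assumption; names are hypothetical.

```python
from typing import List, Set, Tuple

Interval = Tuple[float, float]  # (start, end) on one axis, start <= end


def overlapping_pairs(intervals: List[Interval]) -> Set[Tuple[int, int]]:
    """One pass over sorted endpoints; returns index pairs (i, j), i < j, that overlap."""
    events = []
    for i, (start, end) in enumerate(intervals):
        events.append((start, 0, i))  # 0 = start event
        events.append((end, 1, i))    # 1 = end event
    # At equal coordinates, starts sort before ends, so touching intervals overlap.
    events.sort()
    active: Set[int] = set()
    pairs: Set[Tuple[int, int]] = set()
    for _, kind, i in events:
        if kind == 0:
            for j in active:  # everything currently active overlaps interval i
                pairs.add((min(i, j), max(i, j)))
            active.add(i)
        else:
            active.discard(i)
    return pairs
```

The cost is the sort plus the number of reported pairs, rather than a comparison of every pair of intervals.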

slide-38
SLIDE 38

SLOT: Sweep Line Over Trajectories (algorithm sketch)
  • represent TN edges as time intervals
  • apply a variation of the sweep line algorithm
  • simultaneously compute node degree, triangle membership, and connected components in one pass

slide-39
SLIDE 39

Represent edges as time intervals

[Figure: TN edges e1:(n1,n2), . . ., en drawn as time intervals along the time axis, with interval endpoints t1, …, t13]
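One simple way to obtain edge intervals, assuming discrete-time snapshots like those of the naive approach, is to merge the consecutive time units in which an edge is present into half-open intervals [start, end). This is a hypothetical helper; SLOT itself may derive the intervals directly from the trajectories.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def edges_to_intervals(snapshots: List[Set[Edge]]) -> Dict[Edge, List[Tuple[int, int]]]:
    """Merge the time units in which each edge is present into [start, end) intervals."""
    intervals: Dict[Edge, List[Tuple[int, int]]] = {}
    open_since: Dict[Edge, int] = {}  # edge -> time unit its current interval began
    for t, edges in enumerate(snapshots):
        active = set(open_since)
        for e in edges - active:  # edge appears: open a new interval
            open_since[e] = t
        for e in active - edges:  # edge disappears: close its interval at t
            intervals.setdefault(e, []).append((open_since.pop(e), t))
    for e, start in open_since.items():  # edges still present after the last unit
        intervals.setdefault(e, []).append((start, len(snapshots)))
    return intervals
```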

slide-40
SLIDE 40

SLOT: Sweep Line Over Trajectories

slide-41
SLIDE 41

At every edge start, e:(u, v):

⦁ node degree
− nodes u, v now connected
− increment u, v node degrees

⦁ triangle membership
− did a triangle just form?
− look for u, v common neighbors
− increment triangle (u, v, common)

⦁ connected components
− did two previously disconnected components connect?
− compare old components of u, v
− if no overlap, merge them

[Figure: edge e:(u, v) active from t1 to t2 on the time axis up to T]
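A minimal Python sketch of the edge-start event, assuming undirected edges, nodes created on first appearance, and hypothetical names (`TNState`, `edge_start`). Degrees are read off adjacency sets, formed triangles are kept as a set of node triples, and components are merged by relabeling the smaller one. Recording start times for the rich analytics is omitted.

```python
from typing import Dict, FrozenSet, Set


class TNState:
    """Hypothetical sweep-line state: adjacency, triangles, connected components."""

    def __init__(self) -> None:
        self.adj: Dict[str, Set[str]] = {}
        self.triangles: Set[FrozenSet[str]] = set()
        self.component: Dict[str, int] = {}     # node -> component id
        self.members: Dict[int, Set[str]] = {}  # component id -> nodes
        self._next_id = 0

    def _ensure(self, n: str) -> None:
        # A node seen for the first time starts in its own component.
        if n not in self.adj:
            self.adj[n] = set()
            self.component[n] = self._next_id
            self.members[self._next_id] = {n}
            self._next_id += 1

    def degree(self, n: str) -> int:
        return len(self.adj.get(n, ()))

    def triangle_count(self, n: str) -> int:
        return sum(1 for tri in self.triangles if n in tri)

    def edge_start(self, u: str, v: str) -> None:
        self._ensure(u)
        self._ensure(v)
        # triangle membership: each common neighbor w closes triangle (u, v, w)
        for w in self.adj[u] & self.adj[v]:
            self.triangles.add(frozenset((u, v, w)))
        # node degree: u and v are now connected
        self.adj[u].add(v)
        self.adj[v].add(u)
        # connected components: merge if u and v were in different components
        cu, cv = self.component[u], self.component[v]
        if cu != cv:
            if len(self.members[cu]) < len(self.members[cv]):
                cu, cv = cv, cu  # always relabel the smaller component
            for n in self.members.pop(cv):
                self.component[n] = cu
                self.members[cu].add(n)
```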

slide-42
SLIDE 42

At every edge stop, e:(u, v):

⦁ node degree
− nodes u, v now disconnected
− decrement u, v degrees

⦁ triangle membership
− did a triangle just break?
− look for u, v common neighbors
− decrement triangle (u, v, common)

⦁ connected components
− did a connected component separate?
− BFS to see if u, v are still connected
− if not, split the component in two

[Figure: edge e:(u, v) active from t1 to t2 on the time axis up to T]
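The edge-stop event can be sketched the same way, here as free functions over plain data (names hypothetical): `adj` maps nodes to neighbor sets, `triangles` holds the currently formed triangles, and `components` is a list of node sets. As on the slide, a BFS decides whether removing the edge split a component.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set

Adj = Dict[str, Set[str]]


def reachable(adj: Adj, source: str) -> Set[str]:
    """Nodes reachable from `source` (BFS over the current adjacency)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def edge_stop(adj: Adj, triangles: Set[FrozenSet[str]],
              components: List[Set[str]], u: str, v: str) -> bool:
    """Apply the end of edge (u, v); returns True if a component split in two."""
    # node degree: u and v are now disconnected
    adj[u].discard(v)
    adj[v].discard(u)
    # triangle membership: every remaining common neighbor w was part of
    # triangle (u, v, w), which has just broken
    for w in adj[u] & adj[v]:
        triangles.discard(frozenset((u, v, w)))
    # connected components: BFS to see if u and v are still connected
    side_u = reachable(adj, u)
    if v in side_u:
        return False
    old = next(c for c in components if u in c)
    components.remove(old)
    components.extend([side_u, old - side_u])
    return True
```

The BFS makes edge stops the more expensive event in this sketch; edge starts only compare component ids.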

slide-43
SLIDE 43

SLOT: At the end of the algorithm …

Rich Analytics
− node degrees: start/end time, duration
− triangles: start/end time, duration
− connected components: start/end time, duration

Exact results (not approximations)

slide-44
SLIDE 44

Evaluation of SLOT

slide-45
SLIDE 45

Node degree

[Figure: running time comparison, annotated 1550x]

slide-46
SLIDE 46

Triangle membership / connected components

slide-47
SLIDE 47

SLOT Scalability

slide-48
SLIDE 48

Takeaway

SLOT algorithm for trajectory networks: network importance over time

SLOT properties:
  • fast
  • exact
  • scalable

slide-49
SLIDE 49

Seagull migration trajectories

data from Wikelski et al. 2015

slide-50
SLIDE 50

Credits

Farzaneh Heidari Tilemachos Pechlivanoglou

[IEEE Big Data 2018] Fast and Accurate Mining of Node Importance in Trajectory Networks. Tilemachos Pechlivanoglou and Manos Papagelis. Source code: https://github.com/tipech/trajectory-networks

[Complex Networks 2018] EvoNRL: Evolving Network Representation Learning Based on Random Walks. Farzaneh Heidari and Manos Papagelis. Source code: https://github.com/farzana0/EvoNRL/

For more info visit: Data Mining Lab @ YorkU

slide-51
SLIDE 51

Thank you!

slide-52
SLIDE 52

Questions?