Ph.D. Dissertation Defense Random Walk-based Large Graph Mining Exploiting Real-world Graph Properties


SLIDE 1

Random Walk-based Large Graph Mining Exploiting Real-world Graph Properties

Ph.D. Dissertation Defense

Jinhong Jung, Ph.D. Candidate
Dept. of Computer Science & Engineering, Seoul National University

Korean title: 실세계 그래프 특징을 활용한 랜덤 워크 기반 대규모 그래프 마이닝

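SLIDE 2

Thesis Committee

  • Prof. Bongki Moon (문봉기), Dept. of Computer Science & Engineering, Seoul National University (Committee Chair)
  • Prof. U Kang (강유), Dept. of Computer Science & Engineering, Seoul National University (Committee Vice Chair)
  • Prof. Hyoung-Joo Kim (김형주), Dept. of Computer Science & Engineering, Seoul National University
  • Prof. Youngki Lee (이영기), Dept. of Computer Science & Engineering, Seoul National University
  • Prof. Sang-Wook Kim (김상욱), Dept. of Computer Science & Engineering, Hanyang University

Dec 16 Random Walk-based Large Graph Mining Exploiting Real-world Graph Properties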

SLIDE 3

Outline

  • Overview
  • Proposed Methods
  • Future Works
  • Conclusion

SLIDE 4

Graphs are Everywhere!

  • Numerous real-world phenomena are represented as graphs!
      • Important to analyze such graphs
          • 1) Gain a better understanding of real-world events
          • 2) Develop beneficial applications on top of the insight

[Figure: Social Network, Hyperlink Network, Protein Interaction Network]

SLIDE 5

Random Walk in Graphs

  • Random walk has been extensively utilized to analyze real-world graph data
      • Random Walk with Restart (RWR)
          • Random walk: moves to one of the neighbors (with prob. $1 - c$)
          • Restart: jumps back to the query node $s$ (with prob. $c$)
          • $c$: restart probability

SLIDE 6

Random Walk with Restart (1)

  • Input and Output of RWR
      • Single-source Random Walk with Restart
      • Provides a personalized node ranking

Input: an adjacency matrix $\mathbf{A}$ & query node $s$
Output: a ranking vector $\mathbf{r}$ w.r.t. $s$

[Figure: ranking around the query node; nearby nodes get higher scores (more red, more relevant)] [Tong et al., ICDM’06]

SLIDE 7

Random Walk with Restart (2)

  • RWR is a fundamental building block of various graph mining applications
      • It reflects multi-faceted relationships well by considering the global network topology
      • Applications
          • Node ranking
          • Node embedding
          • Link prediction
          • Recommendation
          • Anomaly detection
          • Community detection
          • Subgraph mining
          • Image segmentation

[Figure: random surfer; lengths, multiple connections, degrees]

SLIDE 8

Technical Challenges (1)

  • Real-world graphs are massive!
      • e.g., Wikipedia has 40 million articles, and Facebook has 2.41 billion users
      • Limitations of previous methods for RWR
          • Exact methods ⇒ suffer from limited speed & scalability
          • Approximate methods ⇒ overly degraded quality
          • Top-$k$ methods ⇒ limited applications
  • Extremely challenging to satisfy all of speed, scalability, and exactness
      • For computing single-source RWR scores in such large-scale graphs

SLIDE 9

Technical Challenges (2)

  • Real-world graphs are rich in information!
      • Various labels represent complicated relationships between nodes
      • The traditional random surfer does not consider such labels ⇒ loses the identity of a labeled graph
  • How to reflect such labels in a random walk?
      • What do the labels mean for a random walk?

[Figure: Signed Networks (+ trust, − distrust) and Knowledge Bases; the traditional random walk has no rule ("?") for + and − edges]

SLIDE 10

Research Goals and Importance

  • Research Goals
      • G1. To devise fast, scalable, and exact methods for random walk in billion-scale graphs
      • G2. To design effective random walk models utilizing label data in labeled graphs
  • Research Importance
      • I1. Advance our understanding of handling large graphs & random walk on labeled graphs
      • I2. Enable us to analyze large-scale graphs
      • I3. Lead to novel & high-quality applications based on random walk in labeled graphs

SLIDE 11

Research Problems (1)

  • P1. Fast, scalable & exact RWR computation in large-scale graphs
      • To develop a novel in-memory algorithm working on a single machine
          • Input graph and intermediate data are stored in memory

Input: an adjacency matrix $\mathbf{A}$ & query node $s$
Output: a ranking vector $\mathbf{r}$ w.r.t. $s$

[Figure: ranking around the query node; nearby nodes get higher scores (more red, more relevant)] [Tong et al., ICDM’06]

SLIDE 12

Research Problems (2)

  • P2. Random walk in signed networks
      • Effective for personalized node ranking
          • Input: signed network $G$ (each edge has a + or − sign) having $n$ nodes & query (or seed) node $s$
          • Output: trustworthiness (ranking) scores $\mathbf{r} \in \mathbb{R}^{n}$ of all nodes w.r.t. seed node $s$

[Figure: signed network (+/− sign) around the query user; trustful and distrustful nodes]

SLIDE 13

Research Problems (3)

  • P3. Random walk in edge-labeled graphs
      • Each edge has one of $K$ categorical labels
      • Effective for relational reasoning between two nodes
          • Input: edge-labeled graph $G$ (each edge has one of $K$ categorical labels) & two nodes $s$ and $t$
          • Output: $K$ relevance scores on $t$ w.r.t. $s$

[Figure: edge-labeled graph with nodes $s$ and $t$]

SLIDE 14

Main Approaches

  • A1. Real-world Graph Properties
      • e.g., power-law degree distribution / balance theory
  • A2. Numerical Computing Methods
      • To boost the computational speed on adjacency matrices
  • A3. Linear Algebra & Stochastic Process
      • To design new random walk models in labeled graphs

[Figure: hubs; Before / After] [Kang et al., ICDM’11]
[Figure: balanced and unbalanced signed triangles]

SLIDE 15

Outline

  • Overview
  • Proposed Methods
  • Future Works
  • Conclusion

SLIDE 16

Proposed Methods

  • Random Walk-based Large Graph Mining Exploiting Real-world Graph Properties

Current Works (Ph.D. Course)

  • Plain Graphs (no edge labels): Fast, Scalable & Exact RWR in Billion-scale Graphs, BePI [SIGMOD’17]
  • Signed Graphs (two edge labels): Random Walk in Signed Graphs: Personalized Ranking, SRWR [ICDM’16] [KAIS’19]
  • Edge-labeled Graphs ($K$ edge labels): Random Walk in Edge-labeled Graphs: Relational Reasoning, MuRWR [WWWJ’20]

SLIDE 17

Proposed Methods

  • Random Walk-based Large Graph Mining Exploiting Real-world Graph Properties

Current Works (Ph.D. Course)

  • Plain Graphs (no edge labels): Fast, Scalable & Exact RWR in Billion-scale Graphs, BePI [SIGMOD’17]
  • Signed Graphs (two edge labels): Random Walk in Signed Graphs: Personalized Ranking, SRWR [ICDM’16] [KAIS’19]
  • Edge-labeled Graphs ($K$ edge labels): Random Walk in Edge-labeled Graphs: Relational Reasoning, MuRWR [WWWJ’20]

SLIDE 18

Introduction

  • Problem: Random Walk with Restart
      • Input: adjacency matrix $\mathbf{A}$ of a graph having $n$ nodes & query (or seed) node $s$
      • Output: relevance (ranking) scores $\mathbf{r} \in \mathbb{R}^{n}$ of all nodes w.r.t. seed node $s$
      • In-memory computation on a single machine
      • Recursive equation (random walk term + restart term)
          • $\mathbf{r} = (1 - c)\tilde{\mathbf{A}}^{\top}\mathbf{r} + c\,\mathbf{q}_s$
          • $c$ is called the restart probability
          • $\mathbf{q}_s$ is the query vector (the $s$-th unit vector)
      • Linear system
          • $(\mathbf{I} - (1 - c)\tilde{\mathbf{A}}^{\top})\,\mathbf{r} = c\,\mathbf{q}_s \Leftrightarrow \mathbf{H}\mathbf{r} = c\,\mathbf{q}_s$
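
To make the linear system on this slide concrete, here is a minimal sketch that builds $\mathbf{H} = \mathbf{I} - (1 - c)\tilde{\mathbf{A}}^{\top}$ for a toy graph and solves $\mathbf{H}\mathbf{r} = c\,\mathbf{q}_s$ directly. It assumes that $\tilde{\mathbf{A}}$ is the row-normalized adjacency matrix (the slide does not define it) and that the graph has no deadends. The function names and the toy graph are hypothetical, and the dense Gaussian elimination is only suitable for tiny inputs; it is not the method proposed in the talk.

```python
def row_normalize(adj):
    """Return a copy of adj in which every non-empty row sums to 1."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def rwr_exact(adj, seed, c=0.15):
    """Exact RWR scores from H r = c q_s (assumption: A~ is row-normalized)."""
    n = len(adj)
    a_norm = row_normalize(adj)
    # H = I - (1 - c) * A~^T, so entry (i, j) reads a_norm[j][i].
    h = [[(1.0 if i == j else 0.0) - (1 - c) * a_norm[j][i] for j in range(n)]
         for i in range(n)]
    q = [c if i == seed else 0.0 for i in range(n)]  # c * q_s
    return solve(h, q)


# Hypothetical 4-node undirected toy graph; node 0 is the query node.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = rwr_exact(adj, seed=0)
```

Because every node in the toy graph has at least one outgoing edge, the scores form a probability distribution over the nodes.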

SLIDE 19

Challenges

  • Q. How to compute exact RWR scores quickly on very large graphs?
      • Iterative methods iteratively update RWR scores until convergence
          • e.g., power iteration: $\mathbf{r}^{(i)} \leftarrow (1 - c)\tilde{\mathbf{A}}^{\top}\mathbf{r}^{(i-1)} + c\,\mathbf{q}_s$
          • Pros: scale to very large graphs ⇐ $O(m)$ space
          • Cons: slow query speed ⇐ $O(Tm)$ query time
      • Preprocessing methods compute RWR scores directly from precomputed data
          • e.g., matrix inversion: $\mathbf{r} = c\,\mathbf{H}^{-1}\mathbf{q}_s$ where $\mathbf{H} = \mathbf{I} - (1 - c)\tilde{\mathbf{A}}^{\top}$
          • Pros: fast query speed ⇐ $O(n)$ query time
          • Cons: cannot handle very large graphs ⇐ $O(n^{3})$ preprocessing time, $O(n^{2})$ space

$T$: # of iterations, $m$: # of edges, $n$: # of nodes
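
The power iteration mentioned on this slide can be sketched in a few lines. The version below is an illustration only: it stores the graph as adjacency lists (so memory is proportional to the number of edges), assumes uniform transition probabilities over out-neighbors and no deadends, and uses a hypothetical function name, tolerance, and toy graph.

```python
def rwr_power_iteration(out_edges, seed, c=0.15, tol=1e-12, max_iter=1000):
    """Iterate r <- (1 - c) * A~^T r + c * q_s until the L1 change is below tol."""
    n = len(out_edges)
    r = [0.0] * n
    r[seed] = 1.0
    for _ in range(max_iter):
        new_r = [0.0] * n
        new_r[seed] = c  # restart term c * q_s
        # Random-walk term: one pass over all edges, i.e. O(m) work per iteration.
        for u, nbrs in enumerate(out_edges):
            if not nbrs:
                continue
            share = (1 - c) * r[u] / len(nbrs)
            for v in nbrs:
                new_r[v] += share
        diff = sum(abs(a - b) for a, b in zip(new_r, r))
        r = new_r
        if diff < tol:
            break
    return r


# Hypothetical toy graph as adjacency lists; node 0 is the query node.
out_edges = [[1, 2], [0, 2], [0, 1, 3], [2]]
scores = rwr_power_iteration(out_edges, seed=0)
```

Each query repeats the edge pass $T$ times, which is where the $O(Tm)$ query time on the slide comes from.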

SLIDE 20

Why Important?

  • I1) Why fast & scalable RWR computation?
      • Improves the computational performance of various applications based on RWR in large graphs
  • I2) Why exact RWR computation?
      • Existing approximate methods dramatically degrade the quality of applications using RWR
  • I3) Why all nodes’ scores w.r.t. the seed?
      • Previous top-$k$ approaches focus on getting the top-$k$ nodes, not their scores
      • Lots of applications still rely on the scores of all nodes ⇒ e.g., anomaly detection, local clustering, subgraph mining

SLIDE 21

Proposed Method: BePI (1)

  • BePI (Best of Preprocessing and Iterative approaches)
      • A fast and scalable method that takes the advantages of both preprocessing and iterative approaches
  • Key Ideas
      • Idea 1) Exploit real-world graph structures to make it easy to preprocess
      • Idea 2) Incorporate an iterative method to increase the scalability
      • Idea 3) Optimize the performance of the iterative method to accelerate RWR computation speed

SLIDE 22

Real-world Graph Properties

  • Deadend
      • A deadend is a node having no outgoing edges, e.g., an image in a web-document graph
      • Ratio of deadends: 5~40%
  • Hub-and-spoke
      • Hubs are high-degree nodes, spokes are low-degree nodes
      • Few hubs and a majority of spokes in real-world graphs
      • Ratio of hubs: 5~20%

[Kang et al., ICDM’11] [Langville et al., JSC’06]

[Figure: $\mathbf{H} = \mathbf{I} - (1 - c)\tilde{\mathbf{A}}^{\top}$ with sources and destinations split into non-deadends and deadends; hubs marked]
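
As a rough illustration of the two structures, the sketch below finds deadends and splits the remaining nodes into hubs and spokes. This is a deliberate simplification and not the reordering used in the talk: it ranks nodes by total degree and cuts at a fixed ratio, whereas the slide cites [Kang et al., ICDM’11] for the hub-and-spoke structure. The function names, the `hub_ratio` parameter, and the toy graph are all hypothetical.

```python
def find_deadends(out_edges):
    """Deadends are nodes having no outgoing edges."""
    return [u for u, nbrs in enumerate(out_edges) if not nbrs]


def split_hubs_spokes(out_edges, hub_ratio=0.2):
    """Split non-deadend nodes into (hubs, spokes) by total degree.

    Assumption: a plain degree cut-off stands in for a real
    hub-and-spoke reordering algorithm.
    """
    n = len(out_edges)
    degree = [len(nbrs) for nbrs in out_edges]  # out-degree
    for nbrs in out_edges:
        for v in nbrs:
            degree[v] += 1  # add in-degree
    deadends = set(find_deadends(out_edges))
    candidates = [u for u in range(n) if u not in deadends]
    candidates.sort(key=lambda u: degree[u], reverse=True)
    num_hubs = max(1, int(hub_ratio * len(candidates)))
    return candidates[:num_hubs], candidates[num_hubs:]


# Hypothetical star-like graph: node 0 is a hub, node 4 is a deadend.
out_edges = [[1, 2, 3, 4], [0], [0], [0], []]
deadends = find_deadends(out_edges)
hubs, spokes = split_hubs_spokes(out_edges, hub_ratio=0.25)
# One possible node order: spokes first, then hubs, then deadends.
order = spokes + hubs + deadends
```

Placing deadends last is one way to obtain an identity block in the lower-right corner of the reordered matrix, since deadends have no outgoing edges.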

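SLIDE 23

Proposed Method: BePI (2)

  • Idea 1) Exploit real-world graph structures to make it easy to preprocess

[Figure: reordering by Deadend, then by Hub & Spoke on the non-deadend part of $\mathbf{H}$]

$$\mathbf{H}\mathbf{r} = c\,\mathbf{q}_s \Leftrightarrow \begin{bmatrix} \mathbf{H}_{11} & \mathbf{H}_{12} & \mathbf{0} \\ \mathbf{H}_{21} & \mathbf{H}_{22} & \mathbf{0} \\ \mathbf{H}_{31} & \mathbf{H}_{32} & \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix} = c \begin{bmatrix} \mathbf{q}_1 \\ \mathbf{q}_2 \\ \mathbf{q}_3 \end{bmatrix}$$

$\mathbf{H}_{11}$ is a block diagonal matrix!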

SLIDE 24

Proposed Method: BePI (3)

  • RWR is obtained by solving a linear system on the reordered matrix ($\mathbf{r} = c\,\mathbf{H}^{-1}\mathbf{q}_s$)
      • Efficiently solved by handling smaller blocks

[Figure: Idea 1 (Block Elimination) on the reordered $\mathbf{H}$ with blocks $\mathbf{H}_{11}$, $\mathbf{H}_{12}$, $\mathbf{H}_{21}$, $\mathbf{H}_{22}$, $\mathbf{H}_{31}$, $\mathbf{H}_{32}$, $\mathbf{I}$, $\mathbf{0}$; the diagonal blocks $\mathbf{H}_{11}^{(1)}, \mathbf{H}_{11}^{(2)}, \mathbf{H}_{11}^{(3)}$ of $\mathbf{H}_{11}$ are easy to invert ⇒ preprocessing]

[Boyd et al., 2009] [Shin et al., 2015]

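SLIDE 25

Proposed Method: BePI (4)

  • Apply block elimination as a preprocessing approach

$$\mathbf{H}\mathbf{r} = c\,\mathbf{q}_s \Leftrightarrow \begin{bmatrix} \mathbf{H}_{11} & \mathbf{H}_{12} & \mathbf{0} \\ \mathbf{H}_{21} & \mathbf{H}_{22} & \mathbf{0} \\ \mathbf{H}_{31} & \mathbf{H}_{32} & \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix} = c \begin{bmatrix} \mathbf{q}_1 \\ \mathbf{q}_2 \\ \mathbf{q}_3 \end{bmatrix}$$

Block elimination:

$$\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix} = \begin{bmatrix} \mathbf{H}_{11}^{-1}(c\,\mathbf{q}_1 - \mathbf{H}_{12}\mathbf{r}_2) \\ \mathbf{S}^{-1}(c\,\mathbf{q}_2 - c\,\mathbf{H}_{21}\mathbf{H}_{11}^{-1}\mathbf{q}_1) \\ c\,\mathbf{q}_3 - \mathbf{H}_{31}\mathbf{r}_1 - \mathbf{H}_{32}\mathbf{r}_2 \end{bmatrix}$$

Details: $\mathbf{S} = \mathbf{H}_{22} - \mathbf{H}_{21}\mathbf{H}_{11}^{-1}\mathbf{H}_{12}$, the Schur complement of $\mathbf{H}_{11}$

Precompute the blue-colored matrices to make RWR computation fast!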
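
The block-elimination formulas follow from reading the partitioned system $\mathbf{H}\mathbf{r} = c\,\mathbf{q}_s$ one block row at a time. The derivation below is a worked check under the slide's partitioning, with $\mathbf{S}$ denoting the Schur complement of $\mathbf{H}_{11}$; it adds no assumptions beyond $\mathbf{H}_{11}$ and $\mathbf{S}$ being invertible.

```latex
\begin{align*}
\text{Row 1:}\quad
  & \mathbf{H}_{11}\mathbf{r}_1 + \mathbf{H}_{12}\mathbf{r}_2 = c\,\mathbf{q}_1
  \;\Rightarrow\;
  \mathbf{r}_1 = \mathbf{H}_{11}^{-1}\left(c\,\mathbf{q}_1 - \mathbf{H}_{12}\mathbf{r}_2\right) \\
\text{Row 2:}\quad
  & \mathbf{H}_{21}\mathbf{r}_1 + \mathbf{H}_{22}\mathbf{r}_2 = c\,\mathbf{q}_2 \\
  & \Rightarrow\;
  \mathbf{H}_{21}\mathbf{H}_{11}^{-1}\left(c\,\mathbf{q}_1 - \mathbf{H}_{12}\mathbf{r}_2\right)
  + \mathbf{H}_{22}\mathbf{r}_2 = c\,\mathbf{q}_2 \\
  & \Rightarrow\;
  \underbrace{\left(\mathbf{H}_{22} - \mathbf{H}_{21}\mathbf{H}_{11}^{-1}\mathbf{H}_{12}\right)}_{\mathbf{S}}
  \mathbf{r}_2 = c\,\mathbf{q}_2 - c\,\mathbf{H}_{21}\mathbf{H}_{11}^{-1}\mathbf{q}_1 \\
\text{Row 3:}\quad
  & \mathbf{H}_{31}\mathbf{r}_1 + \mathbf{H}_{32}\mathbf{r}_2 + \mathbf{r}_3 = c\,\mathbf{q}_3
  \;\Rightarrow\;
  \mathbf{r}_3 = c\,\mathbf{q}_3 - \mathbf{H}_{31}\mathbf{r}_1 - \mathbf{H}_{32}\mathbf{r}_2
\end{align*}
```

The evaluation order is therefore $\mathbf{r}_2$ first (from the system on $\mathbf{S}$), then $\mathbf{r}_1$, then $\mathbf{r}_3$.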

SLIDE 26

Proposed Method: BePI (5)

  • Idea 2) Incorporate an iterative method to increase the scalability
      • Hard to invert $\mathbf{S}$ in large graphs (dim($\mathbf{S}$) = # of hubs ≃ 10#)
      • ⇒ Solve the system on $\mathbf{S}$ iteratively (GMRES) [Saad et al., 1986]
  • Idea 3) Optimize the performance of the iterative method to accelerate RWR speed
      • e.g., preconditioning for faster convergence

Details: $\mathbf{r}_2 = \mathbf{S}^{-1}(c\,\mathbf{q}_2 - c\,\mathbf{H}_{21}\mathbf{H}_{11}^{-1}\mathbf{q}_1) \Leftrightarrow \mathbf{S}\mathbf{r}_2 = \tilde{\mathbf{q}}_2$, where $\tilde{\mathbf{q}}_2 \triangleq c\,\mathbf{q}_2 - c\,\mathbf{H}_{21}\mathbf{H}_{11}^{-1}\mathbf{q}_1$

The sophisticated combination of these techniques leads to fast & scalable RWR with the guarantee of exactness

SLIDE 27

Experimental Results (1)

  • Experimental settings
      • Machine: single machine with 500GB memory
      • Data: real-world large-scale graphs (up to billion-scale)
      • Competitors: Bear & LU (Prep.), Power & GMRES (Iter.)
  • Preprocessing time
      • BePI is significantly faster than other preprocessing methods
      • Only BePI successfully scales to the largest graph (Friendster, 2.5B edges)

[Figure: preprocessing time per dataset; BePI is marked "Proposed"; annotations: 2.5B, 500K, 3M]

SLIDE 28

Experimental Results (2)

  • Memory requirement and query time
      • Competitors: Bear & LU (Prep.), Power & GMRES (Iter.)

BePI requires 130× less memory space & computes RWR 9× faster!

[Figures: memory usage and query time; BePI is marked "Proposed"]

SLIDE 29

Proposed Methods

  • Random Walk-based Large Graph Mining Exploiting Real-world Graph Properties

Current Works (Ph.D. Course)

  • Plain Graphs (no edge labels): Fast, Scalable & Exact RWR in Billion-scale Graphs, BePI [SIGMOD’17]
  • Signed Graphs (two edge labels): Random Walk in Signed Graphs: Personalized Ranking, SRWR [ICDM’16] [KAIS’19]
  • Edge-labeled Graphs ($K$ edge labels): Random Walk in Edge-labeled Graphs: Relational Reasoning, MuRWR [WWWJ’20]

SLIDE 30

Introduction

  • Problem: Personalized Ranking in Signed Networks
      • Input: signed network $G$ (each edge has a + or − sign) having $n$ nodes & query (or seed) node $s$
      • Output: trustworthiness (ranking) scores $\mathbf{r} \in \mathbb{R}^{n}$ of all nodes w.r.t. seed node $s$

[Figure: signed network around the query user; trustful and distrustful nodes]

SLIDE 31

Limitations

  • Naïve approaches fail to provide a proper personalized ranking in a signed network $G$
      • RWR after removing signs from $G$
          • ⇒ No consideration of distrustful relationships
      • Modified RWR (M-RWR)
          • Step 1. Split $G$ into $G^{+}$ and $G^{-}$ (i.e., $G = G^{+} \cup G^{-}$)
          • Step 2. Positive RWR scores $\mathbf{r}^{+}$ on $G^{+}$ & negative RWR scores $\mathbf{r}^{-}$ on $G^{-}$
          • Step 3. Trustworthiness scores $\mathbf{r} = \mathbf{r}^{+} - \mathbf{r}^{-}$
          • ⇒ Many connections are broken

[Figure: signed graph $G$, positive graph $G^{+}$, and negative graph $G^{-}$ over nodes $s$, $u$, $v$, $t$; the surfer cannot reach $t$ on either $G^{+}$ or $G^{-}$]

SLIDE 32

Challenges

  • Q. How to deal with signed edges for random walks?
  • Importance
      • Leads to proper personalized node ranking scores in signed networks (more trustful ⇒ higher ranking)
      • Enables us to effectively analyze signed networks based on random walk (link prediction, anomaly detection, etc.)

[Figure: the traditional random surfer only considers unsigned edges and has no rules for handling signed (+/−) edges]

SLIDE 33

Proposed Method: SRWR (1)

  • SRWR (Signed Random Walk with Restart)
      • Personalized node ranking in signed networks
  • Idea 1) Introduce a sign into the random surfer
  • Idea 2) Adopt balance theory for the signed surfer
      • The theory describes signed triangle patterns, a distinct structure in real-world signed networks
      • Two methods for SRWR
          • SRWR-Iter: iteratively computes SRWR scores
          • SRWR-Pre: efficiently computes SRWR scores in a preprocessing manner
      • Idea 3) Exploit real-world graph structures

SLIDE 34

Proposed Method: SRWR (2)

  • Idea 1) Introduce a sign into a random surfer to handle signed edges

[Figure: traditional random surfer vs. signed random surfer (proposed), who carries a positive or negative sign while walking over + and − edges]

How to change the surfer’s sign? ⇒ Balance Theory

SLIDE 35

Real-world Graph Properties

  • Balance Theory: real-world signed networks are balanced!
      • 88~92% of triangles are balanced
      • Examples
          • a) Friend of my friend is my friend! ⇒ balanced
          • b) Enemy of my friend is my friend? ⇒ unbalanced
          • c) Enemy of my friend is my enemy! ⇒ balanced
          • d) Enemy of my enemy is my enemy? ⇒ unbalanced

[Figure: four signed triangles labeled Balanced, Balanced, Unbalanced, Unbalanced]

SLIDE 36

Proposed Method: SRWR (3)

  • Idea 2) Adopt balance theory for the signed surfer
      • Rules from balance theory
          • 1) Friend of my friend is my friend
          • 2) Enemy of my friend is my enemy
          • 3) Friend of my enemy is my enemy
          • 4) Enemy of my enemy is my friend

[Figure: traditional random surfer vs. signed random surfer (proposed) with a positive or negative sign]
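
The four rules amount to multiplying signs. A minimal sketch, assuming signs are encoded as +1 (friend) and −1 (enemy); the function names are hypothetical and chosen only for this illustration.

```python
def surfer_sign_after_step(current_sign, edge_sign):
    """Return the surfer's sign after crossing an edge.

    Signs are +1 (friend / positive) or -1 (enemy / negative);
    crossing a negative edge flips the sign.
    """
    return current_sign * edge_sign


def is_balanced(sign_ab, sign_bc, sign_ac):
    """A signed triangle is balanced when the product of its signs is positive."""
    return sign_ab * sign_bc * sign_ac > 0


FRIEND, ENEMY = +1, -1

# Rule 2: enemy of my friend is my enemy.
rule2 = surfer_sign_after_step(FRIEND, ENEMY)
# Rule 4: enemy of my enemy is my friend.
rule4 = surfer_sign_after_step(ENEMY, ENEMY)
```

With this encoding, a triangle is balanced exactly when the sign inferred along two edges agrees with the sign of the third edge.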

SLIDE 37

Proposed Method: SRWR (4)

  • Idea 2) Adopt balance theory for the signed surfer
      • Flip the surfer’s sign if she encounters a negative edge

[Figure: the traditional random walk cannot identify node $t$; the signed random walk is consistent with balance theory]

SLIDE 38

Proposed Method: SRWR (5)

  • Signed Random Walk with Restart Model
      • Action 1: Signed Random Walk
          • The surfer randomly moves from node $u$ to one of its neighbors with prob. $1 - c$
          • She flips her sign if she encounters a negative edge
      • Action 2: Restart
          • The surfer goes back to the query node $s$ with prob. $c$
          • Her sign should become positive at the query node

SLIDE 39

Example of SRWR (1)

[Figure: signed graph with query node $s$; every node has a pair of + / − visit counters]

  • Start from query node $s$
  • Toss a biased coin
      • $H$ (heads) → signed random walk
      • $T$ (tails) → restart
  • Suppose $H$ appears

SLIDE 40

Example of SRWR (2)

[Figure: signed graph with query node $s$; every node has a pair of + / − visit counters]

  • Do signed random walk
  • Count it as a positive visit
  • Toss a biased coin again
      • $H$ (heads) → signed random walk
      • $T$ (tails) → restart
  • Suppose $H$ appears

SLIDE 41

Example of SRWR (3)

[Figure: signed graph with query node $s$; every node has a pair of + / − visit counters]

  • Do signed random walk
  • Flip her sign due to the negative edge
  • Count it as a negative visit
  • Toss a biased coin again
      • $H$ (heads) → signed random walk
      • $T$ (tails) → restart
  • Suppose $T$ appears

SLIDE 42

Example of SRWR (4)

[Figure: signed graph with query node $s$; every node has a pair of + / − visit counters]

  • Do restart
  • Her sign becomes positive
  • Repeat SRWR many times

SLIDE 43

Example of SRWR (5)

[Figure: signed graph with query node $s$; every node has a pair of + / − visit counters]

  • Measure visit probabilities := visit count / total # of trials
  • The probabilities on a node are used as ranking scores
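
The walk-through in this example can be simulated directly. The sketch below follows the steps on the slides (biased coin, signed random walk with sign flips on negative edges, restart with a positive sign, visit counters) and is a Monte Carlo illustration only, not the SRWR-Iter or SRWR-Pre algorithm. The function name, the toy graph, and the choice to restart when a node has no outgoing edges are assumptions made for this illustration.

```python
import random


def srwr_monte_carlo(signed_edges, seed_node, c=0.15, num_steps=200_000, rng_seed=0):
    """Estimate positive/negative visit probabilities by simulation.

    signed_edges[u] is a list of (neighbor, sign) pairs with sign in {+1, -1}.
    """
    rng = random.Random(rng_seed)
    n = len(signed_edges)
    pos_count = [0] * n
    neg_count = [0] * n
    node, sign = seed_node, +1
    for _ in range(num_steps):
        nbrs = signed_edges[node]
        # Tails (prob. c) -> restart; also restart at deadends (assumption).
        if rng.random() < c or not nbrs:
            node, sign = seed_node, +1  # her sign becomes positive again
        else:
            node, edge_sign = rng.choice(nbrs)
            sign *= edge_sign  # flip on a negative edge
        if sign > 0:
            pos_count[node] += 1
        else:
            neg_count[node] += 1
    r_pos = [x / num_steps for x in pos_count]
    r_neg = [x / num_steps for x in neg_count]
    return r_pos, r_neg


# Hypothetical toy graph: 0 -(+)-> 1 -(-)-> 2 -(+)-> 0, query node 0.
graph = [[(1, +1)], [(2, -1)], [(0, +1)]]
r_pos, r_neg = srwr_monte_carlo(graph, seed_node=0)
trust = [p - q for p, q in zip(r_pos, r_neg)]
```

In this toy graph node 1 is reached through a positive edge and node 2 through a negative one, so the difference between positive and negative visit probabilities ranks node 1 as trusted and node 2 as distrusted.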

SLIDE 44

Experimental Results (1)

  • Experimental settings
      • Data: real-world signed networks
  • Signed Link Prediction
      • Which nodes will be connected to $s$ positively or negatively?
      • SRWR (proposed) shows the best link prediction performance for all the datasets

SLIDE 45

Experimental Results (2)

  • Edge Sign Prediction
      • What is the sign of the connection from $s$ to $t$?
      • SRWR (proposed) outperforms other ranking models
      • Achieves the best accuracy

SLIDE 46

Experimental Results (3)

  • Troll Identification in the Slashdot dataset
      • Blue: query user (yagu) & Red: trolls
          • The query user is ranked 1st in our trust ranking
          • Many trolls are ranked high in our distrust ranking

SLIDE 47

Experimental Results (4)

  • Troll Identification in the Slashdot dataset

[Figure: comparison of ranking models; SRWR (proposed) is marked BEST]

SRWR captures trolls better than other ranking models!

SLIDE 48

Proposed Method: SRWR (6)

  • Idea 3) Exploit real-world graph structures for SRWR-Pre (preprocessing method for SRWR)

SLIDE 49

Experimental Results

  • Experimental settings
      • Machine: single machine with 500GB memory
      • Data: real-world signed networks

SRWR-Pre requires 11× less memory space & computes SRWR 14× faster!

[Figure: SRWR-Pre (Prep., proposed) vs. SRWR-Iter (Iter.)]

SLIDE 50

Proposed Methods

  • Random Walk-based Large Graph Mining Exploiting Real-world Graph Properties

Current Works (Ph.D. Course)

  • Plain Graphs (no edge labels): Fast, Scalable & Exact RWR in Billion-scale Graphs, BePI [SIGMOD’17]
  • Signed Graphs (two edge labels): Random Walk in Signed Graphs: Personalized Ranking, SRWR [ICDM’16] [KAIS’19]
  • Edge-labeled Graphs ($K$ edge labels): Random Walk in Edge-labeled Graphs: Relational Reasoning, MuRWR [WWWJ’20]

SLIDE 51

Introduction

  • Problem: Relational Reasoning in Edge-labeled Graphs
      • Input: edge-labeled graph $G$ (each edge has one of $K$ categorical labels) & two nodes $s$ and $t$
      • Output: $K$ relevance scores on $t$ w.r.t. $s$
      • Importance: increases the quality of a knowledge base (KB) via knowledge completion ⇒ helpful for applications based on the KB

[Figure: edge-labeled graph with nodes $s$ and $t$]

SLIDE 52

Limitation & Challenge

  • RWR can capture diverse relationships between two nodes
      • Multiple connections considering quality
          • Multi-hops / degree / weight…
      • But it cannot consider edge labels!
  • How to reflect such labels in a random walk?

[Figure: trajectory of the random surfer; the surfer in RWR cannot identify the relation between the nodes!]

SLIDE 53

Proposed Method: MuRWR (1)

  • MuRWR (Multi-Labeled Random Walk with Restart)
      • Random walk-based model for relevance scores in edge-labeled graphs
  • Key Ideas
      • Idea 1) Introduce a labeled random surfer
          • Whose label at a node indicates the inferred relation
      • Idea 2) Allow the surfer to change her label during the random walk with some rules
      • Idea 3) Exploit a data-driven approach to extract knowledge from a graph so that the surfer learns the rules
      • To sum up, MuRWR is the generalization of SRWR!

[Figure: MuRWR handles $K$ labels on edges; SRWR handles 2 labels on edges]

SLIDE 54

Proposed Method: MuRWR (2)

  • MuRWR (Multi-Labeled Random Walk with Restart)
      • Random walk-based model for relevance scores in edge-labeled graphs

[Figure: how the labeled surfer walks along the path from $s$ to $t$]
[Figure: labeled triangles used for the rules on how to change the surfer’s label; syllogism knowledge is extracted from $G$ in a data-driven way]
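
One way to picture "labeled triangles used for the rules" is to count, for every pair of consecutive edge labels, which label closes the triangle. The sketch below does exactly that and nothing more: it is a hypothetical illustration of a data-driven rule table, not the MuRWR formulation, and the function name, data layout, and toy edges are all assumptions.

```python
from collections import Counter, defaultdict


def learn_label_rules(labeled_edges):
    """Estimate P(label of s->t | label of s->u, label of u->t) from triangles.

    labeled_edges maps a directed edge (src, dst) to its categorical label.
    """
    out = defaultdict(list)
    for (u, v), label in labeled_edges.items():
        out[u].append((v, label))
    counts = defaultdict(Counter)
    for (s, u), label_su in labeled_edges.items():
        for t, label_ut in out[u]:
            label_st = labeled_edges.get((s, t))
            if t != s and label_st is not None:  # s->u, u->t, s->t is a triangle
                counts[(label_su, label_ut)][label_st] += 1
    rules = {}
    for pair, counter in counts.items():
        total = sum(counter.values())
        rules[pair] = {label: k / total for label, k in counter.items()}
    return rules


# Hypothetical toy graph with K = 2 labels ("+" and "-").
edges = {
    ("a", "b"): "+", ("b", "c"): "+", ("a", "c"): "+",
    ("a", "d"): "+", ("d", "e"): "-", ("a", "e"): "-",
}
rules = learn_label_rules(edges)
```

With two labels, the learned table reproduces balance-theory style rules on this toy input, which is consistent with the slide's remark that MuRWR generalizes SRWR.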

SLIDE 55

Experimental Results

  • Experimental settings
      • Data: real-world edge-labeled graphs
      • Applications: relational reasoning
          • What is the relation of the connection from $s$ to $t$?

MuRWR shows the best accuracy among all tested methods!

[Figure: accuracy on datasets with $K = 2$ and $K = 18$; improvements of the proposed method: 6%, 4%, 5%, 1%, 2%, 2%]

SLIDE 56

Outline

  • Overview
  • Proposed Methods
  • Future Works
  • Conclusion

SLIDE 57

Future Works

  • Further extend our approach of exploiting distinct properties in real-world data
      • 1) To develop a method for fast & accurate SVD-based pseudoinverse computation (F1: Reordering for Rectangular Matrix)
      • 2) To design a method for fast & scalable signed network generation following real-world properties (F2: Simulation of Balanced Structure)
      • 3) To make our methods work on graph databases or distributed systems (F3: Graph DB & Distributed Processing)

SLIDE 58

Outline

  • Overview
  • Proposed Methods
  • Future Works
  • Conclusion

SLIDE 59

Conclusion

  • Random Walk-based Large Graph Mining Exploiting Real-world Graph Properties
      • G1. To devise fast, scalable, and exact methods for random walk in large-scale graphs
      • G2. To design effective random walk models utilizing label data in labeled graphs
  • Approach: to exploit real-world graph properties

Current Works (Ph.D. Course)

  • Plain Graphs (no edge labels): Fast, Scalable & Exact RWR in Billion-scale Graphs, BePI [SIGMOD’17]
      • Properties: Deadend Structure, Hub-and-Spoke Structure
  • Signed Graphs (two edge labels): Random Walk in Signed Graphs: Personalized Ranking, SRWR [ICDM’16] & [KAIS’19]
      • Properties: Signed Triangle Patterns, Hub-and-Spoke Structure
  • Edge-labeled Graphs ($K$ edge labels): Random Walk in Edge-labeled Graphs: Relational Reasoning, MuRWR [WWWJ’20]
      • Properties: Labeled Triangle Patterns (Syllogism Knowledge)

SLIDE 60

Thank You!

  • Q & A