Mining Rich Graphs: Ranking, Classification, and Anomaly Detection

Leman Akoglu, Feb 9th 2018


  1. Mining Rich Graphs: Ranking, Classification, and Anomaly Detection. Leman Akoglu, Feb 9th 2018

  2. Networks are ubiquitous!
     Figure panels: Terrorist Network [Krebs 2002], Food Web [2007], Internet Map [Koren 2009], Social Network [Newman 2005], Protein Network [Salthe 2004], Web Graph

  3. Graph problems
     • ranking
     • classification
     • clustering & anomaly mining
     • link prediction
     • role discovery
     • similarity search
     • influence
     • evolution
     • …

  4. Ranking in networks. Src: wiki/PageRank

  5. Classification in networks. Src: [Adamic+ 2005]

  6. Community detection in networks. Src: [McAuley & Leskovec 2012]

  7. Rich networks

  8. Rich networks are also ubiquitous! (figure label: Read the Web)

  9. Graph problems on rich networks (figure label: Read the Web)
     • ranking
     • clustering & anomaly mining
     • classification
     • link prediction
     • role discovery
     • similarity search
     • influence
     • evolution
     • …

  11. Ranking in rich networks: example
      Medical referral network (weighted, directed)

  12. Ranking in rich networks: example
      Medical referral network + physician expertise

  13. Ranking in rich networks: example
      Medical referral network + physician expertise + location (figure labels: Town A, Town B)

  14. Ranking in rich networks
      Ranking problem: which are the top-k nodes of a certain type? E.g., who are the best cardiologists in the network, in my town, etc.?
      Reference: Abhinav Mishra & Leman Akoglu, "Ranking in Heterogeneous Networks with Geo-Location Information", SIAM SDM 2017.

  15. Modeling the ranking problem
      Goal: ranking in directed heterogeneous information networks (HIN) with geo-location
      • HINside model
         1. Relation strength
         2. Relation distance
         3. Neighbor authority
         4. Authority transfer rates
         5. Competition
         Closed-form solution
      • Parameter estimation

  16. HINside model: relation strength and distance
      • Relation strength: $W(i,j) = \log(w(i,j) + 1)$, where $w(i,j)$ denotes the edge weights.
      • Relation distance: $D(i,j) = \log(d(l_i, l_j) + 1)$, where $d(l_i, l_j)$ are the pair-wise distances.
      • The two are combined element-wise:
        $$M = W \odot D \tag{3.1}$$
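
A minimal sketch of Eq. (3.1), assuming the product of $W$ and $D$ is taken element-wise and that weights and distances are given as dense nested lists. The function name `relation_matrix` and the toy numbers are illustrative only and do not come from the slides.

```python
import math

def relation_matrix(w, d):
    """Combine raw edge weights w[i][j] and pairwise distances d[i][j] into M.

    W(i, j) = log(w(i, j) + 1), D(i, j) = log(d(i, j) + 1), and
    M(i, j) = W(i, j) * D(i, j) (element-wise product, an assumed reading of Eq. 3.1).
    """
    n = len(w)
    return [[math.log(w[i][j] + 1) * math.log(d[i][j] + 1) for j in range(n)]
            for i in range(n)]

# Hypothetical 3-node example: w holds referral counts, d holds distances.
w = [[0, 4, 0],
     [1, 0, 2],
     [0, 0, 0]]
d = [[0, 10, 30],
     [10, 0, 5],
     [30, 5, 0]]
M = relation_matrix(w, d)
```

Pairs without an edge have $w(i,j) = 0$, so their entry in $M$ is zero regardless of distance.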

  17. HINside model
      • In-neighbor authority, where $r_i$ is the authority score of node $i$:
        $$r_i = \sum_{j \in V} M(j,i)\, r_j \tag{3.2}$$
      • Authority transfer rates (ATR), where $t_i$ is the type of node $i$:
        $$r_i = \sum_{j \in V} \Gamma(t_j, t_i)\, M(j,i)\, r_j \tag{3.3}$$
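
One pass of the update in Eq. (3.3) can be sketched as follows. The data layout (nested lists, integer type ids) and the name `authority_step` are assumptions made for illustration, and the numbers are made up.

```python
def authority_step(M, gamma, types, r):
    """One pass of Eq. (3.3): r_i = sum_j gamma[t_j][t_i] * M[j][i] * r_j."""
    n = len(r)
    return [sum(gamma[types[j]][types[i]] * M[j][i] * r[j] for j in range(n))
            for i in range(n)]

# Hypothetical data: nodes 0 and 1 have type 0, node 2 has type 1.
types = [0, 0, 1]
M = [[0.0, 1.0, 2.0],
     [0.0, 0.0, 3.0],
     [1.0, 0.0, 0.0]]
# gamma[s][t]: rate at which authority flows from a type-s node to a type-t node.
gamma = [[0.5, 1.0],
         [2.0, 0.0]]
r_new = authority_step(M, gamma, types, [1.0, 1.0, 1.0])
```

Setting every entry of `gamma` to 1 recovers the plain in-neighbor authority of Eq. (3.2).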

  18. HINside model
      • Competition: other nodes of type $t_i$ in the vicinity of node $j$.
        $$N(u,v) = \begin{cases} g(d(l_u, l_v)) & u, v \in V,\ u \neq v \\ 0 & u = v \end{cases}$$
        for a monotonically decreasing $g$, e.g. $g(z) = e^{-z}$.
      • The authority scores become
        $$r_i = \sum_{j} \Gamma(t_j, t_i)\, M(j,i) \Big( r_j + \sum_{v:\, t_v = t_i} N(v,j)\, r_v \Big) \tag{3.4}$$
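
The competition term can be sketched as below, assuming a precomputed distance matrix and the example choice $g(z) = e^{-z}$. The function names and the toy data are hypothetical, and the inner sum follows Eq. (3.4) literally (it ranges over every node $v$ with $t_v = t_i$).

```python
import math

def competition_matrix(dist, g=lambda z: math.exp(-z)):
    """N(u, v) = g(d(l_u, l_v)) for u != v, and 0 on the diagonal."""
    n = len(dist)
    return [[0.0 if u == v else g(dist[u][v]) for v in range(n)] for u in range(n)]

def authority_with_competition(M, gamma, types, N, r):
    """One pass of Eq. (3.4)."""
    n = len(r)
    out = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            # Nodes v of the same type as i, weighted by their closeness to j.
            nearby = sum(N[v][j] * r[v] for v in range(n) if types[v] == types[i])
            total += gamma[types[j]][types[i]] * M[j][i] * (r[j] + nearby)
        out.append(total)
    return out

# Hypothetical data: nodes 0 and 1 have type 0, node 2 has type 1.
types = [0, 0, 1]
M = [[0.0, 1.0, 2.0], [0.0, 0.0, 3.0], [1.0, 0.0, 0.0]]
gamma = [[0.5, 1.0], [2.0, 0.0]]
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]]
N = competition_matrix(dist)
r_new = authority_with_competition(M, gamma, types, N, [1.0, 1.0, 1.0])
```

With an all-zero `N` the update reduces to Eq. (3.3).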

  19. Closed form
      • The authority score vector $r$ can be written in closed form (and computed by power iterations) as
        $$r = \big[ L' + (L' N' \odot E) \big]\, r = H r$$
        where $L = M \odot (T \Gamma T')$, and
         - $T$ is the $(n \times m)$ matrix with $T(i,c) = 1$ if $t_i = \mathcal{T}(c)$, and 0 otherwise;
         - $\Gamma$ is the $(m \times m)$ matrix of authority transfer rates (ATR);
         - $E(u,v) = 1$ if $t_u = t_v$, and 0 otherwise; in matrix form, $E = TT'$.
      • $n$: #nodes, $m$: #types
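
The closed form can be illustrated with a small pure-Python sketch that builds $H$ entry by entry and runs power iterations. The L1 normalization, the stopping rule, the helper names, and the toy data are all assumptions for illustration; the slides only state that power iterations are used.

```python
import math

def build_H(M, gamma, types, N):
    """H = L' + (L' N' * E) with L = M * (T Gamma T'), '*' meaning element-wise."""
    n = len(types)
    # L[j][i] = M(j, i) * Gamma(t_j, t_i)
    L = [[M[j][i] * gamma[types[j]][types[i]] for i in range(n)] for j in range(n)]
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for v in range(n):
            same_type = 1.0 if types[i] == types[v] else 0.0   # E(i, v)
            ln = sum(L[j][i] * N[v][j] for j in range(n))      # (L' N')(i, v)
            H[i][v] = L[v][i] + same_type * ln
    return H

def matvec(H, r):
    return [sum(h * x for h, x in zip(row, r)) for row in H]

def power_iteration(H, max_iter=200, tol=1e-10):
    """Iterate r <- H r, rescaling to unit L1 norm (an assumed normalization)."""
    n = len(H)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        new = matvec(H, r)
        scale = sum(abs(x) for x in new)
        if scale == 0.0:
            return new
        new = [x / scale for x in new]
        if max(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r

# Hypothetical data: nodes 0 and 1 have type 0, node 2 has type 1.
types = [0, 0, 1]
M = [[0.0, 1.0, 2.0], [0.0, 0.0, 3.0], [1.0, 0.0, 0.0]]
gamma = [[0.5, 1.0], [2.0, 0.0]]
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]]
N = [[0.0 if u == v else math.exp(-dist[u][v]) for v in range(3)] for u in range(3)]
H = build_H(M, gamma, types, N)
r = power_iteration(H)
```

Multiplying `H` by any score vector gives the same result as applying Eq. (3.4) node by node, which is a convenient sanity check for the matrix form.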

  20. Modeling the ranking problem
      Goal: ranking in directed heterogeneous information networks (HIN) with geo-location
      • HINside model
         1. Relation strength
         2. Relation distance
         3. Neighbor authority
         4. Authority transfer rates
         5. Competition
         Closed-form solution
      • Parameter estimation

  21. Parameter estimation
      • HINside's parameters consist of the $m^2$ authority transfer rates (ATR).
        $$r_i = \sum_{j} \Gamma(t_j, t_i)\, M(j,i) \Big( r_j + \sum_{v:\, t_v = t_i} N(v,j)\, r_v \Big) \tag{3.4}$$
      • $r_i$ as a vector-vector product, grouping the nodes $j$ by type $t$:
        $$r_i = \sum_{t} \Gamma(t, t_i) \Big[ \sum_{j:\, t_j = t} M(j,i) \Big( r_j + \sum_{v:\, t_v = t_i} N(v,j)\, r_v \Big) \Big]$$
        $$r_i = \sum_{t} \Gamma(t, t_i)\, X(t,i) = \Gamma'(t_i,:) \cdot X(:,i) = \Gamma'_{t_i} \cdot x_i \tag{4.8}$$
        where $X(t,i)$ denotes the bracketed term.
      • Each score is thus a linear function of a feature vector $x_i$, $r_i = f(x_i) = \langle w, x_i \rangle$, which is the representation to be used for learning the parameters.
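
The rewriting in Eq. (4.8) can be checked numerically with a short sketch: build the $(m \times n)$ feature matrix $X$ from the current scores, then recover each $r_i$ as a dot product with the matching column of $\Gamma$. Function names, data layout, and numbers are hypothetical.

```python
def feature_matrix(M, types, N, r, m):
    """X[t][i] = sum over j of type t of M[j][i] * (r_j + sum_{v: t_v = t_i} N[v][j] * r_v)."""
    n = len(types)
    X = [[0.0] * n for _ in range(m)]
    for i in range(n):
        for j in range(n):
            nearby = sum(N[v][j] * r[v] for v in range(n) if types[v] == types[i])
            X[types[j]][i] += M[j][i] * (r[j] + nearby)
    return X

def score_from_features(gamma, types, X, i):
    """r_i = sum_t gamma[t][t_i] * X[t][i], the dot product of Eq. (4.8)."""
    return sum(gamma[t][types[i]] * X[t][i] for t in range(len(gamma)))

# Hypothetical data: nodes 0 and 1 have type 0, node 2 has type 1.
types = [0, 0, 1]
M = [[0.0, 1.0, 2.0], [0.0, 0.0, 3.0], [1.0, 0.0, 0.0]]
gamma = [[0.5, 1.0], [2.0, 0.0]]
N = [[0.0, 0.5, 0.2], [0.5, 0.0, 0.3], [0.2, 0.3, 0.0]]
r = [1.0, 2.0, 3.0]
X = feature_matrix(M, types, N, r, m=2)
scores = [score_from_features(gamma, types, X, i) for i in range(3)]
```

Because `X` depends on `r` but not on `gamma`, the scores are linear in the transfer rates once `X` is fixed.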

  22. An alternating optimization scheme: estimate $\Gamma$ by alternating between $\Gamma$, $r$, and $X$
      • Given: graph $G$, (partial) lists ranking a subset of nodes of a certain type
      • Randomly initialize $\Gamma^0$; set $k = 0$
      • Compute authority scores $r$ using $\Gamma^0$
      • Repeat
         - $X^k \leftarrow$ compute feature vectors using $r$
         - $\Gamma^{k+1} \leftarrow$ learn new parameters by learning-to-rank
         - compute authority scores $r$ using $\Gamma^{k+1}$
      • Until convergence
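
The loop structure of this scheme can be sketched generically, with the three steps passed in as callables. This is only a skeleton under stated assumptions: the stopping rule (largest change in any entry of $\Gamma$ below `tol`), the parameter names, and the toy stand-in functions are invented for illustration, and the initial $\Gamma^0$ is supplied by the caller rather than drawn at random.

```python
def alternating_estimation(gamma0, compute_scores, compute_features, learn_gamma,
                           max_rounds=50, tol=1e-6):
    """Alternate between features X (from r) and parameters Gamma (from X).

    compute_scores(gamma) -> r, compute_features(r) -> X, learn_gamma(X) -> gamma.
    Stops when no entry of Gamma changes by more than tol (assumed criterion).
    """
    gamma = gamma0
    r = compute_scores(gamma)
    for _ in range(max_rounds):
        X = compute_features(r)
        new_gamma = learn_gamma(X)
        r = compute_scores(new_gamma)
        change = max(abs(a - b)
                     for row_new, row_old in zip(new_gamma, gamma)
                     for a, b in zip(row_new, row_old))
        gamma = new_gamma
        if change < tol:
            break
    return gamma, r

# Toy stand-ins (not the HINside steps): a 1x1 "Gamma" with fixed point g = 0.5 * g + 1.
gamma_hat, r_hat = alternating_estimation(
    gamma0=[[0.0]],
    compute_scores=lambda g: [g[0][0]],
    compute_features=lambda r: r,
    learn_gamma=lambda X: [[0.5 * X[0] + 1.0]],
)
```

In a real setting `compute_scores` would be the power iteration on $H$, `compute_features` the construction of $X$, and `learn_gamma` a learning-to-rank solver.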

  24. RankSVM formulation
      • Cross-entropy based objective: by gradient descent
      • Given partial ranked lists:
         - create all pairs of nodes $(u, v)$
         - add training instance $((x_u, x_v), 1)$ if $u$ is ranked ahead of $v$, and instance $((x_u, x_v), -1)$ otherwise
         - as a result, the training data is $\{((x_d^1, x_d^2), y_d)\}_{d=1}^{|D|}$
         - for each type $t$, solve:
           $$\min_{\Gamma_t}\ \|\Gamma_t\|_2^2 + \lambda \sum_{d \in D_t} \epsilon_d$$
           $$\text{s.t.}\quad \Gamma_t' (x_d^1 - x_d^2)\, y_d \geq 1 - \epsilon_d,\ \ \forall d \in D \text{ with } t_{x_d^1} = t_{x_d^2} = t$$
           $$\epsilon_d \geq 0,\ \ \forall d \in D$$
           $$\Gamma_t(c) \geq 0,\ \ \forall c = 1, \ldots, m$$
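
The construction of pairwise training instances, and the slack each one incurs under the margin constraint, can be sketched as follows. The function names, the dictionary layout, and the toy feature vectors are hypothetical, and solving the constrained problem itself is left to a quadratic-programming solver that is not shown.

```python
from itertools import combinations

def pairwise_instances(ranked, features):
    """Turn one ranked list (best first) into labeled pairs of feature vectors.

    Every ordered pair (u, v) gets label +1 if u is ranked ahead of v, else -1.
    """
    data = []
    for ahead, behind in combinations(ranked, 2):
        data.append(((features[ahead], features[behind]), 1))
        data.append(((features[behind], features[ahead]), -1))
    return data

def hinge_slack(gamma_t, x1, x2, y):
    """Smallest epsilon >= 0 with gamma_t . (x1 - x2) * y >= 1 - epsilon."""
    margin = y * sum(g * (a - b) for g, a, b in zip(gamma_t, x1, x2))
    return max(0.0, 1.0 - margin)

# Hypothetical feature vectors for three nodes of the same type.
features = {"a": [3.0, 1.0], "b": [2.0, 0.0], "c": [0.5, 0.5]}
data = pairwise_instances(["a", "b", "c"], features)
slacks = [hinge_slack([1.0, 0.0], x1, x2, y) for (x1, x2), y in data]
```

A pair whose score difference already exceeds the margin contributes zero slack, so only violated or barely satisfied pairs affect the objective.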

  25. Graph problems on rich networks (figure label: Read the Web)
      • ranking
      • clustering & anomaly mining
      • classification
      • link prediction
      • role discovery
      • similarity search
      • influence
      • evolution
      • …

  26. Attributed graphs
      Attributed graph: each node has one or more properties (figure labels: Teenager, Adult, skater, data scientist, telemarketer, doctor)

  27. Communities in rich networks
      Attributed graph: each node has one or more properties

  28. Anomalous subgraphs
      Given a set of attributed subgraphs* (e.g. Google+ circles), find the poorly-defined ones.
      * social circles, communities, egonetworks, …

  29. Communities in attributed networks
      Given an attributed subgraph*, how to quantify its quality?
      * social circles, communities, egonetworks, …

  30. Communities in attributed networks
      Given a subgraph, how to quantify its quality?

  31. Communities in attributed networks
      Given a subgraph, how to quantify its quality?
      • Structure-only
         - Internal measures, e.g. average degree

  32. Communities in attributed networks
      Given a subgraph, how to quantify its quality?
      • Structure-only
         - Internal-only: average degree
         - Boundary-only: cut edges
         - Internal + Boundary: conductance
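
The three structure-only measures can be sketched on an undirected graph stored as a dictionary of neighbor sets. The definitions used here are the common textbook ones (the slides name the measures without defining them), and the function names and the toy graph are illustrative.

```python
def internal_average_degree(adj, C):
    """Average number of neighbors inside C per node of C, i.e. 2 * |E_in| / |C|."""
    return sum(1 for u in C for v in adj[u] if v in C) / len(C)

def cut_edges(adj, C):
    """Number of edges with exactly one endpoint in C."""
    return sum(1 for u in C for v in adj[u] if v not in C)

def conductance(adj, C):
    """Cut size divided by the smaller of vol(C) and vol(V minus C)."""
    vol_in = sum(len(adj[u]) for u in C)
    vol_out = sum(len(adj[u]) for u in adj if u not in C)
    smaller = min(vol_in, vol_out)
    return cut_edges(adj, C) / smaller if smaller > 0 else 0.0

# Hypothetical graph: triangle a-b-c with a tail c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
C = {"a", "b", "c"}
avg_deg = internal_average_degree(adj, C)
cut = cut_edges(adj, C)
phi = conductance(adj, C)
```

None of these measures looks at node attributes, which is the gap the attribute-aware measure later in the deck addresses.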

  33. Communities in attributed networks
      Given an attributed subgraph, how to quantify its quality?
      • Structure-only
         - Internal-only: average degree
         - Boundary-only: cut edges
         - Internal + Boundary: conductance
      • Structure + Attributes?
      Reference: Bryan Perozzi and Leman Akoglu, "Scalable Anomaly Ranking of Attributed Neighborhoods", SIAM SDM 2016.

  34. What’s an Anomaly, Anyhow?
      Given an attributed subgraph, how to quantify its quality? (figure labels: high, low)

  35. Normality (intuition)
      Given an attributed subgraph, how to quantify its quality? (figure labels: high, low)
      • Internal: structural density

  36. Normality (intuition)
      Given an attributed subgraph, how to quantify its quality? (figure labels: high, low; chess, biking)
      • Internal: structural density AND attribute coherence (neighborhood “focus”)

  37. Normality (intuition)
      Given an attributed subgraph, how to quantify its quality? (figure labels: high, low)
      • Internal: structural density AND attribute coherence (neighborhood “focus”)
      • Boundary: structural sparsity, OR external separation (“exoneration”)

  38. Normality (intuition)
      • Motivation:
         - no good cuts in real-world graphs [Leskovec+ ‘08]
         - social circles overlap [McAuley+ ‘14]
      • “Exoneration”:
         - (a) by null model: edges expected, not surprising (hub effect)
         - (b) by attributes: separable by different “focus” (neighborhood overlap)

  39. The measure of Normality
      $$N = I + E = \sum_{i \in C,\, j \in C} \Big( A_{ij} - \frac{k_i k_j}{2m} \Big)\, s(x_i, x_j \mid w) \;-\; \sum_{\substack{i \in C,\, b \in B \\ (i,b) \in \mathcal{E}}} \Big( 1 - \min\Big(1, \frac{k_i k_b}{2m}\Big) \Big)\, s(x_i, x_b \mid w) \tag{3.4}$$

  40. The measure of Normality, with the terms of (3.4) annotated (figure labels: chess, biking)
      • $I$: internal consistency
      • $A_{ij} - \frac{k_i k_j}{2m}$: null model
      • $s(\cdot, \cdot \mid w)$: similarity
      • $w$: “focus” vector
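
A small sketch of evaluating the normality score for a given subgraph and a given focus vector. Several things are assumed rather than taken from the slides: the similarity is a weighted dot product $s(x_i, x_j \mid w) = \sum_f w_f\, x_{i,f}\, x_{j,f}$, the graph is undirected and stored as neighbor sets, $m$ is the total number of edges, and the first sum runs over all ordered pairs in $C$ including $i = j$. All names and data are hypothetical.

```python
def normality(adj, C, x, w):
    """N = I + E for subgraph C, attributes x[node], and focus vector w."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2       # number of edges
    k = {u: len(adj[u]) for u in adj}                      # node degrees

    def s(a, b):
        # Assumed similarity: dot product of attributes, weighted by the focus vector.
        return sum(wf * af * bf for wf, af, bf in zip(w, x[a], x[b]))

    internal = sum(((1.0 if j in adj[i] else 0.0) - k[i] * k[j] / (2 * m)) * s(i, j)
                   for i in C for j in C)
    # Boundary edges (i, b) with i inside C and b outside C.
    external = -sum((1.0 - min(1.0, k[i] * k[b] / (2 * m))) * s(i, b)
                    for i in C for b in adj[i] if b not in C)
    return internal + external

# Hypothetical graph: two triangles joined by the edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
C = {"a", "b", "c"}
w = [1.0, 0.0]                                             # focus on the first attribute
x_separated = {"a": [1, 0], "b": [1, 0], "c": [1, 0],
               "d": [0, 1], "e": [0, 1], "f": [0, 1]}
x_shared = dict(x_separated, d=[1, 0])                     # boundary node d shares the focus
score_separated = normality(adj, C, x_separated, w)
score_shared = normality(adj, C, x_shared, w)
```

In this toy case the boundary edge costs nothing when the outside node differs on the focus attribute, and lowers the score when it shares it.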
