Mining Rich Graphs: Ranking, Classification, and Anomaly Detection

Leman Akoglu, Feb 9th 2018

Networks are ubiquitous!
Example networks (figure panels): Terrorist Network [Krebs 2002], Food Web [2007], Internet Map [Koren 2009], Social Network [Newman 2005], Protein Network [Salthe 2004], Web Graph

Graph problems
• ranking
• classification
• clustering & anomaly mining
• link prediction
• role discovery
• similarity search
• influence
• evolution
• …

Ranking in networks (Src: wiki/PageRank)

Classification in networks (Src: [Adamic+ 2005])

Community detection in networks (Src: [McAuley & Leskovec 2012])

Rich networks

Rich networks also ubiquitous! (Example: Read the Web)

Graph problems on rich networks
• ranking
• clustering & anomaly mining
• classification
• link prediction
• role discovery
• similarity search
• influence
• evolution
• …

Ranking in rich networks: example
• Medical referral network (weighted, directed)
• + physician expertise
• + location (Town A, Town B)

Ranking in rich networks
Ranking problem: Which are the top k nodes of a certain type? E.g., who are the best cardiologists in the network, in my town, etc.?
Reference: Abhinav Mishra & Leman Akoglu, "Ranking in Heterogeneous Networks with Geo-Location Information," SIAM SDM 2017.
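The query itself (top k nodes of one type, optionally restricted to a town) can be sketched as below, once authority scores are available. The function name `top_k_of_type`, the type labels, and the town labels are hypothetical and only illustrate the problem statement; how the scores are obtained is a separate question.

```python
import heapq

def top_k_of_type(scores, node_types, wanted_type, k, towns=None, town=None):
    """Return the k highest-scoring node ids of one type, optionally in one town."""
    candidates = [
        i for i, t in enumerate(node_types)
        if t == wanted_type and (town is None or towns[i] == town)
    ]
    # nlargest returns the candidates sorted by decreasing score.
    return heapq.nlargest(k, candidates, key=lambda i: scores[i])

# Toy data (made up): four physicians with precomputed authority scores.
scores = [0.1, 0.9, 0.5, 0.7]
node_types = ["gp", "cardiologist", "cardiologist", "cardiologist"]
towns = ["A", "A", "B", "B"]

best_overall = top_k_of_type(scores, node_types, "cardiologist", 2)
best_in_town_b = top_k_of_type(scores, node_types, "cardiologist", 2, towns, "B")
```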

Modeling the ranking problem
Goal: ranking in directed heterogeneous information networks (HIN) with geo-location
• HINside model
  1. Relation strength
  2. Relation distance
  3. Neighbor authority
  4. Authority transfer rates
  5. Competition
  ◦ Closed-form solution
• Parameter estimation

HINside model
Relation strength and distance
• Relation strength: edge weights, where $W(i,j) = \log(w(i,j) + 1)$
• Relation distance: pair-wise distances, where $D(i,j) = \log(d(l_i, l_j) + 1)$
• For the combined relation matrix: $M = W \odot D$ (3.1)
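A minimal sketch of the relation matrix follows. It assumes the combining operator in (3.1) is an element-wise product and that d is the Euclidean distance between 2-D coordinates; both are assumptions, as are the function name `relation_matrix` and the toy data.

```python
import math

def relation_matrix(weights, locations):
    """M[i][j] = log(w(i,j) + 1) * log(d(l_i, l_j) + 1), element by element."""
    n = len(weights)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            strength = math.log(weights[i][j] + 1.0)                          # W(i, j)
            distance = math.log(math.dist(locations[i], locations[j]) + 1.0)  # D(i, j)
            M[i][j] = strength * distance
    return M

# Toy directed, weighted graph: weights[i][j] is the weight of edge i -> j.
weights = [[0, 3, 0],
           [0, 0, 1],
           [2, 0, 0]]
locations = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
M = relation_matrix(weights, locations)
```

A missing edge has weight 0, so its log-strength is 0 and the entry of M vanishes regardless of distance.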

HINside model
• In-neighbor authority: $r_i = \sum_{j \in V} M(j,i)\, r_j$ (3.2)
  ◦ $r_i$: authority score of node $i$
• Authority transfer rates (ATR): $r_i = \sum_{j \in V} \Gamma(t_j, t_i)\, M(j,i)\, r_j$ (3.3)
  ◦ $t_i$: type of node $i$

HINside model
Competition
• Competition for node $i$ comes from other nodes of type $t_i$ in the vicinity of node $j$
• $N(u,v) = g(d(l_u, l_v))$ for $u, v \in V$, $u \neq v$, and $N(u,v) = 0$ for $u = v$, for monotonically decreasing $g$, e.g. $g(z) = e^{-z}$
• The authority scores become: $r_i = \sum_{j} \Gamma(t_j, t_i)\, M(j,i) \big( r_j + \sum_{v : t_v = t_i} N(v,j)\, r_v \big)$ (3.4)
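As a small illustration, the competition matrix N can be built as follows with the example decay $g(z) = e^{-z}$ from the slide. Euclidean distance and the function name `competition_matrix` are assumptions made for this sketch.

```python
import math

def competition_matrix(locations, g=lambda z: math.exp(-z)):
    """N[u][v] = g(d(l_u, l_v)) for u != v, and 0 on the diagonal."""
    n = len(locations)
    return [
        [0.0 if u == v else g(math.dist(locations[u], locations[v])) for v in range(n)]
        for u in range(n)
    ]

locations = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
N = competition_matrix(locations)
```

Because g is monotonically decreasing, nearby nodes get entries close to 1 and distant nodes get entries close to 0.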

Closed-form n Authority scores vector r written in closed form (& computed by power iterations) as : L 0 + ( L 0 N 0 � E ) ⇥ ⇤ r = r = H r 2 V 8 define L = M � ( T Γ T 0 ) q T ( i, c ) = 1 if t i = T ( c ) Let T denote (n x m) where n Γ ( (m x m) authority transfer rates (ATR) n ⇢ 1 ⇢ if t u = t v q where E ( u, v ) = 0 otherwise form, E = TT 0 . X n: #nodes m: #types 19

Parameter estimation n HINside’s parameters consist of the m 2 authority transfer rates (ATR) X X (3.4) r i = Γ ( t j , t i ) M ( j, i ) ( r j + N ( v, j ) r v ) v : t v = t i j q r i as a vector-vector product X X X ⇥ ⇤ r i = Γ ( t, t i ) M ( j, i )( r j + N ( v, j ) r v ) v : t v = t i t j : t j = t X (4.8) r i = Γ ( t, t i ) X ( t, i ) = t i ) = Γ 0 ( t i , :) · X (: , i ) = Γ 0 t i · x i of a feature vector x i and r i = f ( x i ) = < w , x i > . representation to be used 21

An alternating optimization scheme: estimate Γ, r, and X in turn
Given: graph G, (partial) lists ranking a subset of nodes of a certain type
Output: Γ
1. Randomly initialize $\Gamma^0$, $k = 0$
2. Compute authority scores $r$ using $\Gamma^0$
3. Repeat
   ◦ $X^k$ ← compute feature vectors using $r$
   ◦ $\Gamma^{k+1}$ ← learn new parameters by learning-to-rank
   ◦ compute authority scores $r$ using $\Gamma^{k+1}$
4. Until convergence
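The control flow of the alternating scheme can be sketched as a generic loop. The three callables stand in for the steps named on the slide (scoring, feature construction, learning-to-rank) and are placeholders, not the authors' implementation; the convergence test on the change in Γ is also an assumption, since the slide only says "until convergence".

```python
def alternating_estimation(gamma0, compute_scores, compute_features, learn_gamma,
                           max_rounds=100, tol=1e-9):
    """Alternate between features X, parameters Gamma, and scores r."""
    gamma = list(gamma0)
    r = compute_scores(gamma)                 # scores under the initial Gamma
    for _ in range(max_rounds):
        X = compute_features(r)               # X^k from the current scores
        new_gamma = list(learn_gamma(X))      # Gamma^{k+1} by learning-to-rank
        r = compute_scores(new_gamma)         # refresh scores under Gamma^{k+1}
        change = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if change < tol:                      # assumed stopping rule
            break
    return gamma, r

# Toy stand-ins (hypothetical) chosen so the loop has a known fixed point:
# gamma_{k+1} = 0.5 * gamma_k + 1, which converges to 2.
gamma, r = alternating_estimation(
    gamma0=[0.0],
    compute_scores=lambda g: [0.5 * g[0]],
    compute_features=lambda scores: scores,
    learn_gamma=lambda X: [X[0] + 1.0],
)
```

Here Γ is flattened to a list of floats so the change between rounds is easy to measure.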

RankSVM formulation
(Alternative: cross-entropy based objective, optimized by gradient descent)
• Given partial ranked lists:
  ◦ create all pairs of nodes $(u, v)$
  ◦ add training instance $((x_u, x_v), 1)$ if $u$ is ranked ahead of $v$, and instance $((x_u, x_v), -1)$ otherwise. As a result, the training data is $\{((x^1_d, x^2_d), y_d)\}_{d=1}^{|D|}$
  ◦ for each type $t$, solve over $D_t$, the pairs whose feature vectors belong to nodes of type $t$ (with the trade-off constant written here as $\lambda$):
    $\min_{\Gamma_t} \; \|\Gamma_t\|_2^2 + \lambda \sum_{d \in D_t} \epsilon_d$
    s.t. $\Gamma_t' (x^1_d - x^2_d)\, y_d \geq 1 - \epsilon_d$, $\forall d \in D_t$
         $\epsilon_d \geq 0$, $\forall d \in D_t$
         $\Gamma_t(c) \geq 0$, $\forall c = 1, \ldots, m$
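The pairwise construction and the non-negativity constraint can be illustrated with the sketch below. It replaces the constrained quadratic program with a projected subgradient method on the equivalent hinge-loss objective, which is a swapped-in technique and not the solver used by the authors. The names `make_pairs` and `train_rank_weights`, the step size, and the toy data are assumptions.

```python
def make_pairs(ranked, features):
    """Ordered pairs (u, v) from a ranked list (best first) -> (x_u - x_v, y)."""
    position = {node: p for p, node in enumerate(ranked)}
    data = []
    for u in ranked:
        for v in ranked:
            if u == v:
                continue
            y = 1 if position[u] < position[v] else -1   # +1 if u is ranked ahead of v
            diff = [a - b for a, b in zip(features[u], features[v])]
            data.append((diff, y))
    return data

def train_rank_weights(data, dim, lam=1.0, steps=500, lr=0.01):
    """Minimize ||g||^2 + lam * sum of hinge losses, subject to g >= 0."""
    gamma = [0.0] * dim
    for _ in range(steps):
        grad = [2.0 * g for g in gamma]                  # gradient of ||g||^2
        for diff, y in data:
            margin = y * sum(g * d for g, d in zip(gamma, diff))
            if margin < 1.0:                             # violated pair: hinge is active
                for c in range(dim):
                    grad[c] -= lam * y * diff[c]
        # Gradient step followed by projection onto the non-negative orthant.
        gamma = [max(0.0, g - lr * gr) for g, gr in zip(gamma, grad)]
    return gamma

# Toy example (made up): three nodes of one type, ranked 0 > 1 > 2.
features = {0: [3.0, 1.0], 1: [2.0, 1.0], 2: [1.0, 1.0]}
data = make_pairs([0, 1, 2], features)
gamma = train_rank_weights(data, dim=2)
learned_scores = {u: sum(g * x for g, x in zip(gamma, features[u])) for u in features}
```

In the toy data only the first feature separates the nodes, so the learned weight concentrates there and the second weight stays at zero.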

Attributed graphs
Attributed graph: each node has 1+ properties (figure labels: skater, teenager, data scientist, adult, telemarketer, doctor)

Communities in rich networks
Attributed graph: each node has 1+ properties

Anomalous subgraphs
Given a set of attributed subgraphs* (e.g. Google+ circles), find poorly-defined ones
* social circles, communities, egonetworks, …

Communities in attributed networks
Given an attributed subgraph*, how to quantify its quality?
* social circles, communities, egonetworks, …

Communities in attributed networks v Given a subgraph, how to quantify its quality? q Structure-only n Internal measures q e.g. average degree 31

Communities in attributed networks v Given a subgraph, how to quantify its quality? q Structure-only n Internal-only q average degree n Boundary-only q cut edges n Internal + Boundary q conductance 32

Communities in attributed networks v Given an attributed subgraph, how to quantify its quality? q Structure-only n Internal-only q average degree n Boundary-only q cut edges n Internal + Boundary q conductance q Structure + Attributes? Scalable Anomaly Ranking of Attributed Neighborhoods Bryan Perozzi and Leman Akoglu SIAM SDM 2016. 33

What's an Anomaly, Anyhow?
Given an attributed subgraph, how to quantify quality? (Figure: example subgraphs ordered from high to low quality.)

Normality (intuition)
Given an attributed subgraph, how to quantify quality?
• Internal: structural density AND attribute coherence
  ◦ neighborhood "focus" (e.g. chess, biking)
• Boundary: structural sparsity, OR external separation
  ◦ "exoneration"

Normality (intuition) n Motivation: [Leskovec+ ‘08] q no good cuts in real-world graphs [McAuley+ ‘14] q social circles overlap n “exoneration” : by (a) null model, (b) attributes separable by edges expected, not surprising different “focus” (b) neighborhood overlap (a) hub effect 38

The measure of Normality
$N = I + E = \sum_{i \in C,\, j \in C} \Big( A_{ij} - \frac{k_i k_j}{2m} \Big)\, s(x_i, x_j \mid w) \;-\; \sum_{i \in C,\, b \in B,\, (i,b) \in \mathcal{E}} \Big( 1 - \min\big(1, \frac{k_i k_b}{2m}\big) \Big)\, s(x_i, x_b \mid w)$ (3.4)
• $A_{ij} - \frac{k_i k_j}{2m}$: null model
• $s(x_i, x_j \mid w)$: similarity under the "focus" vector $w$ (e.g. chess, biking)
• first sum ($I$): internal consistency; second sum ($E$): boundary term
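A direct evaluation of the normality measure on a toy graph is sketched below. The similarity $s(x_i, x_j \mid w)$ is assumed to be the $w$-weighted inner product $\sum_f w_f\, x_{i,f}\, x_{j,f}$, the internal sum runs over all ordered pairs in $C$ as the index set is printed on the slide, and the function name `normality` and the toy data are hypothetical.

```python
def normality(adj, attrs, C, w):
    """N = I + E for node set C, attribute vectors attrs, and focus vector w."""
    C = set(C)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2.0     # number of edges
    k = {u: len(adj[u]) for u in adj}                     # degrees

    def sim(i, j):
        # Assumed similarity: focus-weighted inner product of attribute vectors.
        return sum(wf * a * b for wf, a, b in zip(w, attrs[i], attrs[j]))

    internal = 0.0
    for i in C:
        for j in C:
            a_ij = 1.0 if j in adj[i] else 0.0
            internal += (a_ij - k[i] * k[j] / (2.0 * m)) * sim(i, j)

    external = 0.0
    for i in C:
        for b in adj[i]:
            if b not in C:                                # boundary edge (i, b)
                external -= (1.0 - min(1.0, k[i] * k[b] / (2.0 * m))) * sim(i, b)
    return internal + external

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
w = [1.0, 0.0]                                            # focus on the first attribute

# Case 1: the boundary neighbor does not share the focus attribute.
separated = {0: [1, 0], 1: [1, 0], 2: [1, 0], 3: [0, 1], 4: [0, 1], 5: [0, 1]}
n_separated = normality(adj, separated, {0, 1, 2}, w)

# Case 2: every node shares the focus attribute, so the boundary edge is penalized.
shared = {u: [1, 1] for u in adj}
n_shared = normality(adj, shared, {0, 1, 2}, w)
```

The same subgraph scores higher when its boundary neighbor differs on the focus attribute, matching the "exoneration" intuition.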
