Topic II: Graph Mining
Discrete Topics in Data Mining
Universität des Saarlandes, Saarbrücken
Winter Semester 2012/13

Topic II Intro: Graph Mining
1. Why Graphs?
2. What is Graph Mining
3. Graphs: Definitions
4. Centrality
5. Graph Properties
  5.1. Small World
  5.2. Scale Invariance
  5.3. Clustering Coefficient
6. Random Graph Models
Z&M, Ch. 4
DTDM, WS 12/13, 13 November 2012

Why Graphs?
• IP Networks
• Social Networks
• World Wide Web
• Protein–Protein Interactions
• Co-authorships
• Complexity Classes
Graphs are Everywhere!

Graphs: Definitions
• An undirected graph G is a pair (V, E)
  – V = {v_i} is the set of vertices
  – E = {e_i = {v_i, v_j} : v_i, v_j ∈ V} is the set of edges
• In a directed graph the edges have a direction
  – E = {e_i = (v_i, v_j) : v_i, v_j ∈ V}
• An edge from a vertex to itself is a loop
  – A graph that does not have loops is simple
• The degree of a vertex v, d(v), is the number of edges attached to it, d(v) = |{{v, u} ∈ E : u ∈ V}|
  – In directed graphs vertices have in-degree id(v) and out-degree od(v)
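As a small illustration of the degree definitions, the sketch below counts degrees from an edge list. It assumes a simple graph (no loops) given as a list of vertex pairs; the function names `degrees` and `in_out_degrees` are made up for this example.

```python
from collections import Counter

def degrees(edges):
    """Return d(v) for every vertex that appears in an undirected edge list.

    Assumes a simple graph: no loops and no repeated edges.
    """
    d = Counter()
    for u, v in edges:
        d[u] += 1
        d[v] += 1
    return dict(d)

def in_out_degrees(arcs):
    """Return (id, od) dictionaries for a directed edge list of (tail, head) pairs."""
    indeg, outdeg = Counter(), Counter()
    for u, v in arcs:
        outdeg[u] += 1  # edge leaves u
        indeg[v] += 1   # edge enters v
    return dict(indeg), dict(outdeg)

if __name__ == "__main__":
    print(degrees([(1, 2), (1, 3), (2, 3), (3, 4)]))
    print(in_out_degrees([(1, 2), (2, 3), (1, 3)]))
```

Vertices with no incident edges do not appear in the result; an isolated vertex would have to be added with degree 0 by the caller.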

Subgraphs
• A graph H = (V_H, E_H) is a subgraph of G = (V, E) if
  – V_H ⊆ V
  – E_H ⊆ E
  – The edges in E_H are between vertices in V_H
• If V' ⊆ V is a set of vertices, then G' = (V', E') is the induced subgraph if
  – For all v_i, v_j ∈ V' such that {v_i, v_j} ∈ E, {v_i, v_j} ∈ E'
• Subgraph K = (V_K, E_K) of G is a clique if
  – For all v_i, v_j ∈ V_K, {v_i, v_j} ∈ E_K
  – Cliques are also called complete subgraphs
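The induced-subgraph and clique conditions can be checked directly from an edge list. The following is a minimal sketch assuming an undirected simple graph; `induced_subgraph` and `is_clique` are hypothetical helper names, and edges are stored as frozensets so that {u, v} and {v, u} compare equal.

```python
from itertools import combinations

def induced_subgraph(edges, vertices):
    """Return E': every edge of the graph with both endpoints in `vertices`."""
    keep = set(vertices)
    return {frozenset(e) for e in edges if set(e) <= keep}

def is_clique(edges, vertices):
    """True if every pair of distinct vertices in `vertices` is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set
               for pair in combinations(set(vertices), 2))

if __name__ == "__main__":
    edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
    print(induced_subgraph(edges, {1, 2, 3}))
    print(is_clique(edges, {1, 2, 3}), is_clique(edges, {2, 3, 4}))
```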

Bipartite Graphs
• A graph G = (V, E) is bipartite if V can be partitioned into two sets U and W such that
  – U ∩ W = ∅ and U ∪ W = V (a partition)
  – For all {v_i, v_j} ∈ E, v_i ∈ U and v_j ∈ W
    • No edges within U and no edges within W
• Any subgraph of a bipartite graph is also bipartite
• A biclique is a complete bipartite subgraph K = (U ∪ V, E)
  – For all u ∈ U and v ∈ V, edge {u, v} ∈ E
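One common way to test the bipartite condition is to 2-colour the graph with breadth-first search; this is a standard technique, not something the slides prescribe. The sketch below assumes an undirected graph given as a vertex list and an edge list, and the name `bipartition` is chosen only for this example.

```python
from collections import deque

def bipartition(vertices, edges):
    """Return (U, W) if the graph is bipartite, otherwise None."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    side = {}  # vertex -> 0 (goes to U) or 1 (goes to W)
    for start in vertices:  # loop handles disconnected graphs
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in side:
                    side[w] = 1 - side[u]  # neighbours go to the other side
                    queue.append(w)
                elif side[w] == side[u]:
                    return None  # an edge inside U or inside W
    U = {v for v in vertices if side[v] == 0}
    W = {v for v in vertices if side[v] == 1}
    return U, W

if __name__ == "__main__":
    print(bipartition([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))
    print(bipartition([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))
```

A cycle of even length is bipartite, while a triangle is not, which is what the two calls show.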

Paths and Distances
• A walk in graph G between vertices x and y is an ordered sequence ⟨x = v_0, v_1, v_2, …, v_{t–1}, v_t = y⟩
  – {v_{i–1}, v_i} ∈ E for all i = 1, …, t
  – If x = y, the walk is closed
  – The same vertex can re-appear in the walk many times
• A trail is a walk where edges are distinct
  – {v_{i–1}, v_i} ≠ {v_{j–1}, v_j} for i ≠ j
• A path is a walk where vertices are distinct
  – v_i ≠ v_j for i ≠ j
  – A closed path (only the end vertices coincide, x = y) with t ≥ 3 is a cycle
• The distance between x and y, d(x, y), is the length of the shortest path between them
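In an unweighted graph, the distance d(x, y) can be computed with breadth-first search. The sketch below is one way to do it, assuming the graph is stored as a dictionary mapping each vertex to its neighbours; `bfs_distances` is a name made up for this example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return {v: d(source, v)} for every vertex reachable from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:          # first visit gives the shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

if __name__ == "__main__":
    # The path graph 1 - 2 - 3 - 4
    adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(bfs_distances(adj, 1))
```

Vertices that are not reachable from the source are simply absent from the returned dictionary.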

Connectedness
• Two vertices x and y are connected if there is a path between them
  – A graph is connected if all pairs of its vertices are connected
• A connected component of a graph is a maximal connected subgraph
• A directed graph is strongly connected if there is a directed path between all ordered pairs of its vertices
  – It is weakly connected if it is connected only when considered as an undirected graph
• If a graph is not connected, it is disconnected
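Connected components of an undirected graph can be found by repeatedly exploring from an unvisited vertex. This is a minimal sketch assuming an adjacency dictionary in which every vertex appears as a key; the function names are illustrative only.

```python
def connected_components(adj):
    """Return the vertex sets of the connected components of an undirected graph."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        stack = [start]
        while stack:                      # depth-first exploration
            u = stack.pop()
            for w in adj[u]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        components.append(component)
    return components

def is_connected(adj):
    """A non-empty graph is connected if it has exactly one component."""
    return len(connected_components(adj)) == 1

if __name__ == "__main__":
    adj = {1: [2], 2: [1], 3: [4], 4: [3], 5: []}
    print(connected_components(adj))
    print(is_connected(adj))
```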

Example
[Figure: two example graphs, (a) and (b), each on the vertices v_1, …, v_8]

Adjacency Matrix
• The adjacency matrix of an undirected graph G = (V, E) with |V| = n is the n-by-n symmetric binary matrix A with
  – a_ij = 1 if and only if {v_i, v_j} ∈ E
  – A weighted adjacency matrix has the weights of the edges
• For directed graphs, the adjacency matrix is not necessarily symmetric
• The bi-adjacency matrix of a bipartite graph G = (U ∪ V, E) with |U| = n and |V| = m is the n-by-m binary matrix B with
  – b_ij = 1 if and only if {u_i, v_j} ∈ E
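Both matrices can be built from an edge list in a few lines. The sketch below uses plain nested lists and assumes vertices are numbered from 0, which differs from the 1-based indices on the slide; the function names are chosen for this example.

```python
def adjacency_matrix(n, edges):
    """Symmetric n-by-n binary matrix A with a_ij = 1 iff {v_i, v_j} is an edge."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # undirected: mirror the entry
    return A

def biadjacency_matrix(n, m, edges):
    """n-by-m binary matrix B with b_ij = 1 iff {u_i, v_j} is an edge.

    Each edge is a pair (i, j): i indexes U (rows), j indexes V (columns).
    """
    B = [[0] * m for _ in range(n)]
    for i, j in edges:
        B[i][j] = 1
    return B

if __name__ == "__main__":
    for row in adjacency_matrix(3, [(0, 1), (1, 2)]):
        print(row)
    for row in biadjacency_matrix(2, 3, [(0, 0), (1, 2)]):
        print(row)
```

For a directed graph, dropping the mirrored assignment in `adjacency_matrix` gives the generally non-symmetric matrix mentioned above.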

Topological Attributes
• The weighted degree of a vertex v_i is d(v_i) = ∑_j a_ij
• The average degree of a graph is the average of the degrees of its vertices, ∑_i d(v_i) / n
  – Degree and average degree can be extended to directed graphs
• The average path length of a connected graph is the average of path lengths between all vertices

  $$\sum_i \sum_{j>i} d(v_i, v_j) \Big/ \binom{n}{2} = \frac{2}{n(n-1)} \sum_i \sum_{j>i} d(v_i, v_j)$$
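The two averages can be computed as follows. This is a sketch under the assumptions that the graph is undirected, unweighted, connected, and has at least two vertices; distances come from breadth-first search, and the function names are invented for illustration.

```python
from collections import deque

def average_degree(adj):
    """Sum of the degrees divided by the number of vertices n."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def average_path_length(adj):
    """Average of d(v_i, v_j) over all unordered pairs of distinct vertices.

    Assumes a connected graph with n >= 2.
    """
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    n = len(adj)
    # `total` counts every unordered pair twice, so dividing by n(n-1)
    # equals (2 / (n(n-1))) times the sum over pairs with j > i.
    return total / (n * (n - 1))

if __name__ == "__main__":
    adj = {1: [2], 2: [1, 3], 3: [2]}  # the path 1 - 2 - 3
    print(average_degree(adj), average_path_length(adj))
```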

Eccentricity, Radius & Diameter
• The eccentricity of a vertex v_i, e(v_i), is its maximum distance to any other vertex, max_j {d(v_i, v_j)}
• The radius of a connected graph, r(G), is the minimum eccentricity of any vertex, min_i {e(v_i)}
• The diameter of a connected graph, d(G), is the maximum eccentricity of any vertex, max_i {e(v_i)} = max_{i,j} {d(v_i, v_j)}
  – The effective diameter of a graph is the smallest number that is larger than the eccentricity of a large fraction of the vertices in the graph
    • "Large fraction" is e.g. 90%
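Eccentricity, radius and diameter follow directly from all-pairs distances. The sketch below assumes a connected, undirected, unweighted graph stored as an adjacency dictionary; the helper names are made up for this example, and the effective diameter is left out.

```python
from collections import deque

def eccentricities(adj):
    """Return {v: e(v)}, the maximum BFS distance from v to any other vertex."""
    ecc = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        ecc[source] = max(dist.values())
    return ecc

def radius(adj):
    """Minimum eccentricity over all vertices."""
    return min(eccentricities(adj).values())

def diameter(adj):
    """Maximum eccentricity over all vertices."""
    return max(eccentricities(adj).values())

if __name__ == "__main__":
    adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}  # the path 1 - 2 - 3 - 4
    print(eccentricities(adj), radius(adj), diameter(adj))
```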

Clustering Coefficient
• The clustering coefficient of vertex v_i, C(v_i), tells how clique-like the neighbourhood of v_i is
  – Let n_i be the number of neighbours of v_i and m_i the number of edges between the neighbours of v_i (v_i excluded)

    $$C(v_i) = m_i \Big/ \binom{n_i}{2} = \frac{2 m_i}{n_i (n_i - 1)}$$

  – Well-defined only for v_i with at least two neighbours
    • For others, let C(v_i) = 0
• The clustering coefficient of the graph is the average clustering coefficient of the vertices: C(G) = (1/n) ∑_i C(v_i)
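The formula for C(v_i) translates almost literally into code. The sketch below assumes an undirected simple graph stored as a dictionary from each vertex to the set of its neighbours; the function names are illustrative.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """C(v) = 2 * m / (n * (n - 1)), with n neighbours of v and m edges among them."""
    nbrs = adj[v]
    n_i = len(nbrs)
    if n_i < 2:
        return 0.0  # convention for vertices with fewer than two neighbours
    m_i = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * m_i / (n_i * (n_i - 1))

def graph_clustering_coefficient(adj):
    """C(G): the average of C(v) over all vertices."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

if __name__ == "__main__":
    # A triangle 1-2-3 with a pendant vertex 4 attached to 3
    adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    for v in adj:
        print(v, clustering_coefficient(adj, v))
    print(graph_clustering_coefficient(adj))
```

In this example vertex 3 has three neighbours with only one edge among them, so C(v_3) = 1/3, while the pendant vertex 4 gets 0 by convention.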

Graph Mining
• Graphs can explain relations between objects
• Finding these relations is the task of graph mining
  – The type of the relation depends on the task
• Graph mining is an umbrella term that encompasses many different techniques and problems
  – Frequent subgraph mining
  – Graph clustering
  – Path analysis/building
  – Influence propagation
  – …

Example: Tiling Databases
       A  B  C
  1  ( 1  1  0 )
  2  ( 1  1  1 )
  3  ( 0  1  1 )
• Binary matrices define a bipartite graph
• A tile is a biclique of that graph
• Tiling is the task of finding a minimum number of bicliques to cover all edges of a bipartite graph
  – Or to find k bicliques to cover most of the edges
[Figure: the bipartite graph of the matrix, connecting rows 1, 2, 3 to columns A, B, C]
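To make the link between tiles and bicliques concrete, the sketch below checks whether a set of rows and columns forms a tile (an all-ones submatrix) and counts how many 1s a set of tiles covers. The 3-by-3 matrix is an illustrative input, rows and columns are 0-indexed, and `is_tile` and `covered_ones` are names made up for this example. It only verifies a given tiling and does not search for a minimum one.

```python
def is_tile(B, rows, cols):
    """True if every entry of B in rows x cols is 1, i.e. the biclique is complete."""
    return all(B[i][j] == 1 for i in rows for j in cols)

def covered_ones(B, tiles):
    """Return (number of 1s covered by the tiles, total number of 1s in B)."""
    covered = set()
    for rows, cols in tiles:
        covered |= {(i, j) for i in rows for j in cols}
    ones = {(i, j) for i, row in enumerate(B) for j, x in enumerate(row) if x == 1}
    return len(covered & ones), len(ones)

if __name__ == "__main__":
    # Illustrative binary matrix: rows are transactions, columns are items.
    B = [[1, 1, 0],
         [1, 1, 1],
         [0, 1, 1]]
    tiles = [({0, 1}, {0, 1}), ({1, 2}, {1, 2})]
    print([is_tile(B, r, c) for r, c in tiles])
    print(covered_ones(B, tiles))
```

Here two overlapping 2-by-2 tiles cover every 1 in the matrix, which corresponds to covering all edges of the bipartite graph with two bicliques.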

Example: The Characteristics of the Erdős Graph
• Co-authorship graph of mathematicians
• 401K authors (vertices), 676K co-authorships (edges)
  – Median degree = 1, mean = 3.36, standard deviation = 6.61
• Large connected component of 268K vertices
  – The radius of the component is 12 and the diameter 23
  – Two vertices with eccentricity 12
  – Average distance between two vertices 7.64 (based on a sample)
    • "Eight degrees of separation"
• The clustering coefficient is 0.14
http://www.oakland.edu/enp/

Centrality
• Six degrees of Kevin Bacon
  – "Every actor is related to Kevin Bacon by no more than 6 hops"
  – Kevin Bacon has acted with many, who have acted with many others, who have acted with many others…
• That makes Kevin Bacon a centre of the co-acting graph
  – Although he's not the centre: the average distance to him is 2.994 but to Dennis Hopper it is only 2.802
http://oracleofbacon.org

Degree and Eccentricity Centrality
• Centrality is a function c : V → ℝ that induces a total order in V
  – The higher the centrality of a vertex, the more important it is
• In degree centrality c(v_i) = d(v_i), the degree of the vertex
• In eccentricity centrality the least eccentric vertex is the most central one, c(v_i) = 1/e(v_i)
  – The least eccentric vertex is central
  – The most eccentric vertex is peripheral
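Both centralities can be computed and used to rank vertices as sketched below. The code assumes a connected, undirected, unweighted graph with at least two vertices (so that no eccentricity is zero), stored as an adjacency dictionary; all function names are chosen for this example.

```python
from collections import deque

def eccentricity(adj, source):
    """Maximum BFS distance from `source` to any other vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())

def degree_centrality(adj):
    """c(v) = d(v)."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def eccentricity_centrality(adj):
    """c(v) = 1 / e(v); assumes a connected graph with at least two vertices."""
    return {v: 1 / eccentricity(adj, v) for v in adj}

if __name__ == "__main__":
    # A star: vertex 0 in the middle, leaves 1, 2, 3
    adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    c = eccentricity_centrality(adj)
    print(degree_centrality(adj))
    print(c, max(c, key=c.get))
```

In the star graph both measures agree that the middle vertex is the most central; on other graphs the two rankings can differ.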
