Data Mining: Concepts and Techniques
Chapter 9: Graph Mining and Social Network Analysis
Li Xiong
Slide credits: Jiawei Han and Micheline Kamber
April 2, 2008

Graph Mining and Social Network Analysis
- Graph mining
  - Frequent subgraph mining
- Social network analysis
  - Social network
  - Social network analysis at different levels
  - Link analysis
April 2, 2008, Mining and Searching Graphs in Graph Databases

Graph Mining
- Methods for Mining Frequent Subgraphs
- Applications:
  - Graph Indexing
  - Similarity Search
  - Classification and Clustering
- Summary

Why Graph Mining?
- Graphs are ubiquitous
  - Chemical compounds (cheminformatics)
  - Protein structures, biological pathways/networks (bioinformatics)
  - Program control flow, traffic flow, and workflow analysis
  - XML databases, Web, and social network analysis
- Graph is a general model
  - Trees, lattices, sequences, and items are degenerate graphs
- Diversity of graphs
  - Directed vs. undirected, labeled vs. unlabeled (edges & vertices), weighted, with angles & geometry (topological vs. 2-D/3-D)
- Complexity of algorithms: many problems are of high complexity

Graph, Graph, Everywhere
[Figure panels: aspirin; yeast protein interaction network; co-author network; Internet. Source cited on the slide: H. Jeong et al., Nature 411, 41 (2001)]

Graph Pattern Mining
- Frequent subgraph mining
  - Finding frequent subgraphs within a single graph
  - Finding frequent (sub)graphs in a set of graphs
  - Support (occurrence frequency) no less than a minimum support threshold
- Applications of graph pattern mining
  - Mining biochemical structures, program control flow analysis, XML structures or Web communities
  - Building blocks for graph classification, clustering, compression, comparison, and correlation analysis
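
The support definition above can be made concrete with a small sketch. It assumes undirected graphs with labeled vertices and unlabeled edges, and it tests containment by brute force over all injective vertex mappings, which is only practical for tiny graphs. The helper names (`make_graph`, `contains`, `support`) and the toy database are illustrative and do not come from the slides.

```python
from itertools import permutations

def make_graph(labels, edges):
    """Undirected, vertex-labeled graph: labels maps node -> label."""
    return {"labels": dict(labels), "edges": {frozenset(e) for e in edges}}

def contains(graph, pattern):
    """True if pattern occurs in graph as a (not necessarily induced) subgraph."""
    p_nodes = list(pattern["labels"])
    g_nodes = list(graph["labels"])
    # Brute force: try every injective mapping of pattern nodes to graph nodes.
    for image in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if any(pattern["labels"][p] != graph["labels"][g] for p, g in mapping.items()):
            continue  # vertex labels must match
        if all(frozenset(mapping[v] for v in e) in graph["edges"]
               for e in pattern["edges"]):
            return True  # every pattern edge is present in the graph
    return False

def support(database, pattern):
    """Number of graphs in the database that contain the pattern."""
    return sum(1 for g in database if contains(g, pattern))

# Toy database of three small labeled graphs (made-up data).
db = [
    make_graph({1: "C", 2: "O", 3: "N"}, [(1, 2), (1, 3)]),
    make_graph({1: "C", 2: "O"}, [(1, 2)]),
    make_graph({1: "C", 2: "N", 3: "N"}, [(1, 2), (2, 3)]),
]
c_o = make_graph({"x": "C", "y": "O"}, [("x", "y")])
n_n = make_graph({"x": "N", "y": "N"}, [("x", "y")])

min_support = 2
frequent = [name for name, p in [("C-O", c_o), ("N-N", n_n)]
            if support(db, p) >= min_support]
print(frequent)
```

With a minimum support of 2, the C-O edge is frequent (it occurs in two graphs) while the N-N edge is not (it occurs in one).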

Example: Frequent Subgraph Mining in Chemical Compounds
[Figure: graph dataset of three chemical compounds (A), (B), (C), and the frequent patterns (1) and (2) found with minimum support 2]

Graph Mining Algorithms
- Finding interesting and frequent substructures in a single graph
  - SUBDUE
- Finding frequent patterns in a set of independent graphs
  - Apriori-based approach
  - Pattern-growth approach

SUBDUE (Holder et al., KDD’94)
- Problem
  - Finding “interesting” and repetitive substructures (connected subgraphs) in data represented as a graph
- Basic idea
  - Minimum description length (MDL) principle
  - Beam search algorithm
    - Start with best single vertices
    - Expand best substructures with a new edge
    - Substructures are evaluated based on their ability to compress input graphs

Minimum Description Length (MDL)
- Minimum description length (MDL) principle
  - A formalization of Occam’s Razor
  - Best hypothesis minimizes description length of the data (largest compression)
- Graph substructure discovery based on MDL
  - Description length (DL): represent vertices and adjacency matrix
  - Graph compression: replace substructure instances with pointers
  - Find best substructure S in G that minimizes: DL(S) + DL(G|S)
[Figure: input database (G), substructure (S1), and compressed database (G|S1). Source: Holder et al.]
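
To illustrate how DL(S) + DL(G|S) can rank candidate substructures, here is a deliberately crude sketch. It is not SUBDUE's actual encoding. It assumes a graph costs one fixed-width label per vertex plus a full n × n adjacency matrix, that instances do not overlap, and that each instance collapses into a single pointer vertex carrying one new label. The function names are hypothetical, and the vertex counts follow the shapes example (4 triangles, 4 squares, 1 circle, 1 rectangle).

```python
from math import ceil, log2

def description_length(num_vertices, num_labels):
    """Crude DL in bits: one label per vertex plus a full adjacency matrix."""
    label_bits = ceil(log2(num_labels)) if num_labels > 1 else 1
    return num_vertices * label_bits + num_vertices ** 2

def compressed_cost(n, labels, s, s_labels, instances):
    """DL(S) + DL(G|S), with each instance of S replaced by one pointer vertex.

    n, labels     : vertices and distinct labels in the input graph G
    s, s_labels   : vertices and distinct labels in the substructure S
    instances     : number of non-overlapping instances of S in G (assumed)
    """
    dl_s = description_length(s, s_labels)
    n_compressed = n - instances * (s - 1)   # each instance shrinks to one vertex
    dl_g_given_s = description_length(n_compressed, labels + 1)  # +1 pointer label
    return dl_s + dl_g_given_s

# Input graph: 10 vertices, 4 distinct labels.
uncompressed = description_length(10, 4)
# Candidate A: a 2-vertex substructure (triangle on square) with 4 instances.
cost_pair = compressed_cost(10, 4, s=2, s_labels=2, instances=4)
# Candidate B: the single vertex "triangle" with 4 instances (no size reduction).
cost_single = compressed_cost(10, 4, s=1, s_labels=1, instances=4)
print(uncompressed, cost_pair, cost_single)
```

Under these assumptions the two-vertex substructure lowers the total cost, while the single-vertex candidate does not, because replacing a vertex by a pointer saves nothing.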

Beam Search Algorithm
- Beam search
  - An optimization of best-first search
  - Breadth-first search with a predetermined number of paths kept as candidates (beam width)
- Subgraph discovery based on beam search
  - Start with best single vertices
  - Expand best substructures with a new edge
  - Substructures are evaluated based on their ability to compress input graphs (minimize description length)
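
A minimal sketch of generic beam search, shown on a string analogy rather than on graphs: substrings stand in for substructures, appending one character stands in for adding an edge, and the score approximates the characters saved if every occurrence were replaced by a one-symbol pointer. The function `beam_search`, the toy text, and the scoring rule are assumptions made for illustration only.

```python
def beam_search(start_states, expand, score, beam_width, max_steps):
    """Level-by-level search that keeps only the beam_width best states."""
    def rank(state):
        # Higher score first; ties broken by the state itself for determinism.
        return (-score(state), state)

    beam = sorted(start_states, key=rank)[:beam_width]
    best = beam[0]
    for _ in range(max_steps):
        candidates = {child for state in beam for child in expand(state)}
        if not candidates:
            break  # nothing left to expand
        beam = sorted(candidates, key=rank)[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best

text = "abcabcabd"
alphabet = sorted(set(text))

def expand(s):
    """Grow a substring by one character, keeping only strings that occur."""
    return [s + ch for ch in alphabet if s + ch in text]

def score(s):
    """Characters saved if each non-overlapping occurrence became one symbol."""
    return text.count(s) * (len(s) - 1)

best = beam_search(alphabet, expand, score, beam_width=2, max_steps=3)
print(best, score(best))
```

With a beam width of 2 the search keeps "ab" and "bc" after the first expansion and settles on "abc", which occurs twice and saves the most characters.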

Algorithm
1. Create substructure for each unique vertex label
[Figure: input database (G) in graph form and the initial substructures (S): triangle (4), square (4), circle (1), rectangle (1). Source: Holder et al.]

Algorithm (cont.)
2. Expand best substructures by an edge or edge + neighboring vertex
[Figure: substructures (S) after expansion. Source: Holder et al.]

Algorithm (cont.)
3. Keep best beam-width substructures on queue
4. Terminate when queue is empty or #discovered substructures >= limit
5. Compress graph with hierarchical description
Source: Holder et al., SRL Workshop

Frequent Subgraph Mining Approaches
- Problem: finding frequent subgraphs in a set of graphs
- Apriori-based approach
  - AGM: Inokuchi, et al. (PKDD’00)
  - FSG: Kuramochi and Karypis (ICDM’01)
  - PATH#: Vanetik and Gudes (ICDM’02, ICDM’04)
  - FFSM: Huan, et al. (ICDM’03)
- Pattern growth approach
  - MoFa: Borgelt and Berthold (ICDM’02)
  - gSpan: Yan and Han (ICDM’02)
  - Gaston: Nijssen and Kok (KDD’04)
- Closed pattern mining
  - CLOSEGRAPH: Yan & Han (KDD’03)

Apriori-Based Approach
- Level-wise algorithm: building candidate subgraphs from small frequent subgraphs
[Figure: frequent subgraphs G, G’, G’’ are joined (JOIN) into candidate subgraphs G1, G2, …, Gn with an extra vertex or edge]

Apriori-Based Search
- AGM (Apriori-based Graph Mining), Inokuchi, et al. PKDD’00
  - generates new graphs with one more node
- FSG (Frequent SubGraph mining), Kuramochi and Karypis, ICDM’01
  - generates new graphs with one more edge

Pattern Growth Method
[Figure: a k-edge graph G is extended to (k+1)-edge graphs G1, G2, …, Gn, which are in turn extended to (k+2)-edge graphs, and so on; the same graph may be generated more than once (duplicate graph)]

gSpan (Yan and Han, ICDM’02)
- Depth-first search and right-most extension

Graph Mining
- Methods for Mining Frequent Subgraphs
- Applications:
  - Classification and Clustering
  - Graph Indexing
  - Similarity Search

Using Graph Patterns
- Similarity measures based on graph patterns
  - Feature-based similarity measure
    - Each graph is represented as a feature vector
    - Frequent subgraphs can be used as features
    - Vector distance
  - Structure-based similarity measure
    - Maximal common subgraph
    - Graph edit distance: insertion, deletion, and relabel
- Frequent and discriminative subgraphs are high-quality indexing features
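
The feature-based measure can be sketched as follows. To stay self-contained, the sketch uses the simplest possible patterns as features, namely frequent single edges identified by the unordered pair of their endpoint labels; real systems would use larger frequent subgraphs. Each graph is given simply as a list of such labeled edges, and all names and data here are illustrative assumptions.

```python
from math import sqrt

def edge_patterns(graph):
    """Set of 1-edge patterns: unordered pairs of endpoint labels."""
    return {tuple(sorted(edge)) for edge in graph}

def frequent_features(database, min_support):
    """Patterns that occur in at least min_support graphs, in a fixed order."""
    counts = {}
    for g in database:
        for p in edge_patterns(g):
            counts[p] = counts.get(p, 0) + 1
    return sorted(p for p, c in counts.items() if c >= min_support)

def feature_vector(graph, features):
    """Binary vector: 1 if the graph contains the feature pattern, else 0."""
    present = edge_patterns(graph)
    return [1 if f in present else 0 for f in features]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy graphs given as lists of labeled edges (made-up data).
g_a = [("C", "O"), ("C", "N"), ("C", "C")]
g_b = [("C", "O"), ("C", "C")]
g_c = [("N", "C"), ("N", "N")]
database = [g_a, g_b, g_c]

features = frequent_features(database, min_support=2)
v_a, v_b, v_c = (feature_vector(g, features) for g in database)
print(features)
print(euclidean(v_a, v_b), euclidean(v_a, v_c))
```

Here g_a is closer to g_b than to g_c because they share more frequent patterns; the infrequent N-N edge is dropped from the feature set and does not affect the distance.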

Social Network Analysis
- Social network
- Different levels of social network analysis
- Common measures and methods for social network analysis
- Link analysis

Social Network
- Social network: a social structure consisting of nodes and ties.
  - Nodes are the individual actors within the networks
    - May be different kinds
    - May have attributes, labels or classes
  - Ties are the relationships between the actors
    - May be different kinds
    - Links may have attributes, directed or undirected
- Homogeneous networks
  - Single object type and single link type
  - Single model social networks (e.g., friends)
  - WWW: a collection of linked Web pages
- Heterogeneous networks
  - Multiple object and link types
  - Medical network: patients, doctors, diseases, contacts, treatments
  - Bibliographic network: publications, authors, venues

Small World Phenomenon
- Number of degrees of separation in actual social networks?
- Six degrees of separation: on average, everyone is six "steps" away from any other person on Earth.
- Empirical studies
  - Michael Gurevich, 1961. US population linked by 2 intermediaries
  - Duncan Watts, 2001. Email delivery on the Internet: average number of intermediaries is 6.
  - Leskovec and Horvitz, 2007. Instant messages: average path length is 6.6
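
The "average path length" reported in such studies can be computed on any unweighted network with breadth-first search. The sketch below assumes an undirected graph stored as an adjacency dictionary and averages over connected pairs only; the four-node networks are made-up toy data, not data from the studies above.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_path_length(adj):
    """Mean shortest-path length over all pairs of distinct, connected nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# A chain of four acquaintances: a - b - c - d
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
# The same chain with one extra tie between a and d
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}

print(average_path_length(chain), average_path_length(ring))
```

Adding a single tie between the two ends of the chain lowers the average path length from 10/6 to 8/6, which hints at how a few long-range ties keep real networks "small".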
