Graph Classification - PowerPoint PPT Presentation




SLIDE 1

Graph Classification

SLIDE 2

Classification Outline

  • Introduction, Overview
  • Classification using Graphs
    – Graph classification: Direct Product Kernel
      • Predictive Toxicology example dataset
    – Vertex classification: Laplacian Kernel
      • WEBKB example dataset
  • Related Works
SLIDE 3

Example: Molecular Structures

Task: predict whether molecules are toxic, given a set of known examples.

[Figure: example molecular graphs with vertices labeled A–F, grouped into known examples (toxic vs. non-toxic) and unknown molecules to classify]

SLIDE 4

Solution: Machine Learning

  • Computationally discover and/or predict properties of interest of a set of data
  • Two flavors:
    – Unsupervised: discover discriminating properties among groups of data (example: clustering)
    – Supervised: use data with known properties to categorize data with unknown properties (example: classification)

[Figure: unsupervised flow (data → property discovery, partitioning → clusters) vs. supervised flow (training data → build classification model → predict test data)]

SLIDE 5

Classification

  • Classification: the task of assigning class labels from a discrete class label set Y to input instances in an input space X
  • Example: Y = {toxic, non-toxic}, X = {valid molecular structures}

[Figure: the classification model is trained on the training data, then unknown (test) data are assigned to class labels using the model; the figure highlights a misclassified test instance (test error) and the as-yet unclassified instances]
SLIDE 6

Classification Outline

  • Introduction, Overview
  • Classification using Graphs
    – Graph classification: Direct Product Kernel
      • Predictive Toxicology example dataset
    – Vertex classification: Laplacian Kernel
      • WEBKB example dataset
  • Related Works
SLIDE 7

Classification with Graph Structures

  • Graph classification (between-graph): each full graph is assigned a class label
    – Example: molecular graphs
  • Vertex classification (within-graph): within a single graph, each vertex is assigned a class label
    – Example: webpage (vertex) / hyperlink (edge) graphs

[Figure: a molecular graph (vertices A–E) labeled "Toxic", and an NCSU-domain web graph with vertices labeled Course, Faculty, Student]

SLIDE 8

Relating Graph Structures to Classes?

  • Frequent Subgraph Mining (Chapter 7)

– Associate frequently occurring subgraphs with classes

  • Anomaly Detection (Chapter 11)

– Associate anomalous graph features with classes

  • *Kernel-based methods (Chapter 4)

– Devise a kernel function capturing graph similarity, then use vector-based classification via the kernel trick

SLIDE 9

Relating Graph Structures to Classes?

  • This chapter focuses on kernel-based classification.
  • Two-step process:
    – Devise a kernel that captures the property of interest
    – Apply a kernelized classification algorithm, using the kernel function
  • Two types of graph classification are covered:
    – Classification of graphs: Direct Product Kernel
    – Classification of vertices: Laplacian Kernel
  • See the supplemental slides for support vector machines (SVMs), one of the better-known kernelized classification techniques.

SLIDE 10

Walk-based similarity (Kernels Chapter)

  • Intuition: two graphs are similar if they exhibit similar patterns when performing random walks

[Figure: three example graphs. In the first (vertices A–F), random-walk vertices are heavily distributed towards A, B, D, E. In the second (vertices H–L), random-walk vertices are heavily distributed towards H, I, K with a slight bias towards L; these two graphs are similar. In the third (vertices Q–V), random-walk vertices are evenly distributed; it is not similar to the others.]
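The intuition above can be sketched numerically: power-iterate a random walk on each graph's row-normalized adjacency matrix and compare where the probability mass settles. A minimal Python/numpy sketch using hypothetical graphs (a 4-cycle, where mass spreads evenly, and a star, where it concentrates on the hub):

```python
import numpy as np

def walk_distribution(adj, steps=50):
    """Start a walker uniformly at random, step it `steps` times along the
    row-normalized adjacency matrix, and return the time-averaged
    distribution over vertices (assumes no isolated vertices)."""
    adj = np.asarray(adj, dtype=float)
    P = adj / adj.sum(axis=1, keepdims=True)   # transition matrix
    dist = np.full(adj.shape[0], 1.0 / adj.shape[0])
    total = np.zeros_like(dist)
    for _ in range(steps):
        dist = dist @ P
        total += dist
    return total / steps

# A 4-cycle: every vertex looks alike, so the walk spreads evenly.
cycle = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(walk_distribution(cycle))        # ~[0.25 0.25 0.25 0.25]

# A star: the hub (vertex 0) accumulates half of the probability mass.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(walk_distribution(star))         # hub entry ~0.5
```

Two graphs whose walk distributions have similar shapes would count as "similar" under this intuition.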

SLIDE 11

Classification Outline

  • Introduction, Overview
  • Classification using Graphs
    – Graph classification: Direct Product Kernel
      • Predictive Toxicology example dataset
    – Vertex classification: Laplacian Kernel
      • WEBKB example dataset
  • Related Works
SLIDE 12

Direct Product Graph – Formal Definition

Input graphs: G₁ = (V₁, E₁) and G₂ = (V₂, E₂)

Direct product notation: G× = G₁ × G₂

Direct product vertices: V× = { (v₁, v₂) | v₁ ∈ V₁, v₂ ∈ V₂ }

Direct product edges: E× = { ((u₁, u₂), (v₁, v₂)) | (u₁, v₁) ∈ E₁ and (u₂, v₂) ∈ E₂ }

Intuition
  • Vertex set: each vertex of G₁ is paired with every vertex of G₂
  • Edge set: an edge exists between two product vertices only if both corresponding pairs of vertices in the respective input graphs are connected by an edge
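In matrix terms, the direct product graph's adjacency matrix is exactly the Kronecker product of the two input adjacency matrices. A small sketch (the two input graphs here are hypothetical, not the slides' Type-A/Type-B):

```python
import numpy as np

def direct_product(A1, A2):
    """Adjacency matrix of the direct (tensor) product graph G1 x G2:
    product vertex (u, v) is adjacent to (u', v') exactly when u~u' in G1
    AND v~v' in G2 -- which is the Kronecker product of the matrices."""
    return np.kron(np.asarray(A1), np.asarray(A2))

path = np.array([[0, 1, 0],     # hypothetical graph 1: path A-B-C
                 [1, 0, 1],
                 [0, 1, 0]])
edge = np.array([[0, 1],        # hypothetical graph 2: single edge X-Y
                 [1, 0]])

Ax = direct_product(path, edge)
print(Ax.shape)                 # (6, 6): one product vertex per vertex pair
```

Product vertex (i, j) maps to row/column i * |V₂| + j, so e.g. entry [0, 3] links (A, X) to (B, Y), which exists because A~B and X~Y.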

SLIDE 13

Direct Product Graph - example

[Figure: two example input graphs, Type-A (vertices A–D) and Type-B (vertices A–E)]

SLIDE 14

Direct Product Graph Example

[Figure: the 20×20 adjacency matrix of the direct product of Type-A (vertices A–D) and Type-B (vertices A–E); rows and columns are indexed by the vertex pairs (A,A), (A,B), …, (D,E)]

Intuition: multiply each entry of Type-A's adjacency matrix by the entire adjacency matrix of Type-B (i.e., the Kronecker product).

SLIDE 15

Direct Product Kernel (see Kernel Chapter)

1. Compute the direct product graph G×.
2. Compute the maximum in- and out-degrees of G×: dᵢ and dₒ.
3. Compute the decay constant γ < 1 / min(dᵢ, dₒ).
4. Compute the infinite weighted geometric series of walks (over the adjacency matrix A).
5. Sum over all vertex pairs.

[Figure: direct product graph of Type-A and Type-B]
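The five steps can be sketched directly in Python/numpy. This is an illustrative implementation, not the book's R code; step 4 uses the standard closed form Σₖ γᵏ Aᵏ = (I − γA)⁻¹, which converges because γ is chosen below 1 / min(dᵢ, dₒ):

```python
import numpy as np

def direct_product_kernel(A1, A2, gamma=None):
    """Direct product kernel, following the slide's five steps."""
    Ax = np.kron(np.asarray(A1, float), np.asarray(A2, float))  # 1. product graph
    d_out = Ax.sum(axis=1).max()                                # 2. max out-degree
    d_in = Ax.sum(axis=0).max()                                 #    max in-degree
    if gamma is None:
        gamma = 0.9 / max(min(d_in, d_out), 1.0)                # 3. gamma < 1/min(di, do)
    n = Ax.shape[0]
    W = np.linalg.inv(np.eye(n) - gamma * Ax)                   # 4. sum_k gamma^k Ax^k
    return W.sum()                                              # 5. sum over vertex pairs

# Hypothetical inputs: a 3-vertex path and a single edge.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
edge = np.array([[0, 1], [1, 0]], dtype=float)
print(direct_product_kernel(path, path))   # self-similarity score
print(direct_product_kernel(path, edge))
```

Note the kernel is symmetric in its two arguments, since kron(A, B) and kron(B, A) differ only by a simultaneous row/column permutation.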

SLIDE 16

Kernel Matrix

  • Compute the direct product kernel for all pairs of graphs in the set of known examples.
  • This matrix is used as input to the SVM function to create the classification model.
  • *** Or any other kernelized data mining method!!!

    K = [ k(G₁,G₁)  k(G₁,G₂)  …  k(G₁,Gₙ)
          k(G₂,G₁)  k(G₂,G₂)  …  k(G₂,Gₙ)
            ⋮          ⋮              ⋮
          k(Gₙ,G₁)  k(Gₙ,G₂)  …  k(Gₙ,Gₙ) ]
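Assembling this matrix amounts to evaluating the kernel over all pairs of known graphs. A sketch (the names gram_matrix and edge_count_kernel are illustrative, not from the book's R package; a real run would plug in the direct product kernel):

```python
import numpy as np

def gram_matrix(graphs, kernel):
    """n x n matrix of kernel values over all pairs of known example
    graphs; this is the input a kernelized method (e.g. an SVM) consumes."""
    n = len(graphs)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = kernel(graphs[i], graphs[j])  # kernels are symmetric
    return K

def edge_count_kernel(A1, A2):
    """Toy stand-in kernel: the smaller of the two (directed) edge counts."""
    return float(min(A1.sum(), A2.sum()))

graphs = [np.array([[0, 1], [1, 0]]),
          np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])]
K = gram_matrix(graphs, edge_count_kernel)
print(K)
```

Only the upper triangle is computed; symmetry fills in the rest, halving the number of (potentially expensive) kernel evaluations.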

SLIDE 17

Classification Outline

  • Introduction, Overview
  • Classification using Graphs
    – Graph classification: Direct Product Kernel
      • Predictive Toxicology example dataset
    – Vertex classification: Laplacian Kernel
      • WEBKB example dataset
  • Related Works
SLIDE 18

Predictive Toxicology (PTC) dataset

  • The PTC dataset is a collection of molecules that have been tested positive or negative for toxicity.

    # R code to create the SVM model
    data("PTCData")      # graph data
    data("PTCLabels")    # toxicity information
    # select 5 molecules to build model on
    sTrain = sample(1:length(PTCData), 5)
    PTCDataSmall <- PTCData[sTrain]
    PTCLabelsSmall <- PTCLabels[sTrain]
    # generate kernel matrix
    K = generateKernelMatrix(PTCDataSmall, PTCDataSmall)
    # create SVM model
    model = ksvm(K, PTCLabelsSmall, kernel = "matrix")

[Figure: two example PTC molecule graphs]

SLIDE 19

Classification Outline

  • Introduction, Overview
  • Classification using Graphs
    – Graph classification: Direct Product Kernel
      • Predictive Toxicology example dataset
    – Vertex classification: Laplacian Kernel
      • WEBKB example dataset
  • Related Works
SLIDE 20

Kernels for Vertex Classification

  • von Neumann kernel (Chapter 6)
  • Regularized Laplacian kernel (this chapter):

    K = Σ_{k=0}^∞ γᵏ (−L)ᵏ = (I + γL)⁻¹

    where L = D − M is the graph Laplacian (defined on the following slides)
SLIDE 21

Example: Hypergraphs

  • A hypergraph is a generalization of a graph in which an edge can connect any number of vertices; i.e., each edge is a subset of the vertex set.
  • Example: word-webpage graph
    – Vertex: a webpage
    – Edge: the set of pages containing the same word

SLIDE 22

“Flattening” a Hypergraph

  • Given the hypergraph incidence matrix A, the product A·Aᵀ represents a "similarity matrix"
  • Rows and columns represent vertices
  • The (i, j) entry is the number of hyperedges incident on both vertex i and vertex j
  • Problem: some neighborhood information is lost (vertex 1 and 3 come out just as "similar" as vertex 1 and 2)
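A minimal numerical illustration of the flattening (the incidence matrix here is hypothetical): with A as the vertex-by-hyperedge incidence matrix, A·Aᵀ gives the similarity matrix, and the information loss shows up as vertex pairs with quite different hyperedge memberships receiving the same score:

```python
import numpy as np

# Hypothetical incidence matrix: rows are vertices 1-3, columns are
# hyperedges e1-e3; A[i, e] = 1 when hyperedge e contains vertex i.
A = np.array([[1, 1, 0],    # vertex 1 lies in e1 and e2
              [1, 0, 0],    # vertex 2 lies in e1 only
              [0, 1, 1]])   # vertex 3 lies in e2 and e3

# Flatten: entry (i, j) counts hyperedges incident on both i and j.
M = A @ A.T
print(M)
# Vertices 1&2 and 1&3 both score 1, even though their hyperedge
# memberships differ -- the neighborhood information loss the slide notes.
```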

SLIDE 23

Laplacian Matrix

  • In the mathematical field of graph theory, the Laplacian matrix L is a matrix representation of a graph.
  • L = D − M
  • M: the adjacency matrix of the graph (e.g., A·Aᵀ from hypergraph flattening)
  • D: the degree matrix (a diagonal matrix where each (i, i) entry is vertex i's [weighted] degree)
  • The Laplacian is used in many contexts (e.g., spectral graph theory)
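The definition L = D − M is a one-liner in numpy (the example graph, a 4-vertex path, is hypothetical):

```python
import numpy as np

# Adjacency matrix M of a hypothetical 4-vertex path graph 1-2-3-4.
M = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(M.sum(axis=1))   # degree matrix: (i, i) holds vertex i's degree
L = D - M                    # the Laplacian

print(L)                     # each row of L sums to zero (a handy sanity check)
```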

SLIDE 24

Normalized Laplacian Matrix

  • Normalizing the matrix helps eliminate the bias toward high-degree vertices

    L̃(i, j) ≔  1                          if i = j and deg(i) ≠ 0
               −1 / √(deg(i) · deg(j))    if i ≠ j and i is adjacent to j
               0                          otherwise

[Figure: the original Laplacian L alongside its regularized (normalized) version]
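A sketch of the normalization, assuming the equivalent matrix form D^(-1/2) (D − M) D^(-1/2) for non-isolated vertices (isolated vertices are guarded with a harmless divisor of 1, which still yields zero rows):

```python
import numpy as np

def normalized_laplacian(M):
    """Entrywise: 1 on the diagonal (non-isolated vertices),
    -1/sqrt(deg(i)*deg(j)) for adjacent i != j, 0 otherwise;
    computed here as D^(-1/2) (D - M) D^(-1/2)."""
    deg = M.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1))  # guard isolated vertices
    L = np.diag(deg) - M
    return inv_sqrt[:, None] * L * inv_sqrt[None, :]

# Hypothetical 3-vertex path graph.
M = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
Ln = normalized_laplacian(M)
print(Ln)   # diagonal is all 1s; off-diagonals are -1/sqrt(deg_i * deg_j)
```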
SLIDE 25

Laplacian Kernel

  • Uses the walk-based geometric series, only applied to the regularized (normalized) Laplacian matrix
  • The decay constant is NOT degree-based; instead, it is a tunable parameter γ < 1
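A minimal Python/numpy sketch, assuming the closed form K = Σₖ γᵏ (−L̃)ᵏ = (I + γL̃)⁻¹ over the normalized Laplacian; the example graph is hypothetical:

```python
import numpy as np

def laplacian_kernel(L_norm, gamma=0.5):
    """Walk-based geometric series over the negated normalized Laplacian:
    K = sum_k gamma^k (-L)^k = (I + gamma * L)^-1, with gamma a tunable
    decay parameter (not degree-based)."""
    n = L_norm.shape[0]
    return np.linalg.inv(np.eye(n) + gamma * L_norm)

# Normalized Laplacian of a hypothetical 3-vertex path graph.
M = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
deg = M.sum(axis=1)
inv_sqrt = 1.0 / np.sqrt(deg)
Ln = inv_sqrt[:, None] * (np.diag(deg) - M) * inv_sqrt[None, :]

K = laplacian_kernel(Ln, gamma=0.5)
print(K)   # adjacent vertices come out more similar than non-adjacent ones
```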

SLIDE 26

Classification Outline

  • Introduction, Overview
  • Classification using Graphs
    – Graph classification: Direct Product Kernel
      • Predictive Toxicology example dataset
    – Vertex classification: Laplacian Kernel
      • WEBKB example dataset
  • Related Works
SLIDE 27

WEBKB dataset

  • The WEBKB dataset is a collection of web pages sampled from four university websites.
  • The web pages are assigned to five distinct classes according to their contents: course, faculty, student, project, and staff.
  • The web pages are searched for the most commonly used words; there are 1073 words that occur with a frequency of at least 10.

    # R code to create the SVM model
    data(WEBKB)
    # generate kernel matrix
    K = generateKernelMatrixWithinGraph(WEBKB)
    # create sample set for testing
    holdout <- sample(1:ncol(K), 20)
    # create SVM model
    model = ksvm(K[-holdout, -holdout], y, kernel = "matrix")

[Figure: a word-webpage graph fragment (word 1 … word 4)]

SLIDE 28

Classification Outline

  • Introduction, Overview
  • Classification using Graphs
    – Graph classification: Direct Product Kernel
      • Predictive Toxicology example dataset
    – Vertex classification: Laplacian Kernel
      • WEBKB example dataset
  • Kernel-based vector classification – Support Vector Machines
  • Related Works
SLIDE 29

Related Work – Classification on Graphs

  • Graph mining chapters:
    – Frequent Subgraph Mining (Ch. 7)
    – Anomaly Detection (Ch. 11)
    – Kernel chapter (Ch. 4): discusses in detail alternatives to the direct product and other "walk-based" kernels
  • gBoost: an extension of "boosting" for graphs
    – Progressively collects "informative" frequent patterns to use as features for classification / regression
    – Also considered a frequent subgraph mining technique (similar to gSpan in the Frequent Subgraph chapter)
  • Tree kernels: similarity of graphs that are trees
SLIDE 30

Related Work – Traditional Classification

  • Decision Trees
    – Classification model: a tree of conditionals on variables, where leaves represent class labels
    – Input space is typically a set of discrete variables
  • Bayesian belief networks
    – Produces a directed acyclic graph structure, using Bayesian inference to generate edges
    – Each vertex (a variable/class) is associated with a probability table indicating the likelihood of an event or value occurring, given the values of the determined dependent variables
  • Support Vector Machines
    – Traditionally used in classification of real-valued vector data
    – See the Kernels chapter for kernel functions working on vectors

SLIDE 31

Related Work – Ensemble Classification

  • Ensemble learning: algorithms that build multiple models to enhance stability and reduce selection bias.
  • Some examples:
    – Bagging: generate multiple models from samples of the input set (drawn with replacement); evaluate by averaging / voting across the models.
    – Boosting: generate multiple weak models; weight each model's vote by some measure of its accuracy.
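A toy sketch of bagging (all names here, including the 1-nearest-neighbour base learner and the tiny 1-D dataset, are illustrative):

```python
import random
from collections import Counter

def bagging_predict(train, x, learner, n_models=25):
    """Bagging: fit `learner` on bootstrap resamples of the training set
    (sampling with replacement), then take a majority vote."""
    votes = []
    for _ in range(n_models):
        boot = [random.choice(train) for _ in train]  # bootstrap sample
        model = learner(boot)
        votes.append(model(x))
    return Counter(votes).most_common(1)[0][0]

def fit_1nn(sample):
    """Hypothetical weak base learner: 1-nearest-neighbour on 1-D points."""
    return lambda x: min(sample, key=lambda p: abs(p[0] - x))[1]

train = [(0.0, "a"), (0.1, "a"), (0.9, "b"), (1.0, "b")]
random.seed(0)
print(bagging_predict(train, 0.05, fit_1nn))  # "a" with overwhelming probability
```

Each bootstrap sample sees a slightly different training set, so the vote smooths out the variance of any single model.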

SLIDE 32

Related Work – Evaluating, Comparing Classifiers

  • This is the subject of Chapter 12, Performance Metrics.
  • A very brief, "typical" classification workflow:
    1. Partition the data into training and test sets.
    2. Build the classification model using only the training set.
    3. Evaluate the accuracy of the model using only the test set.
  • Modifications to the basic workflow:
    – Multiple rounds of training and testing (cross-validation)
    – Multiple classification models built (bagging, boosting)
    – More sophisticated sampling (all)
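The three-step workflow can be sketched as follows (the majority-class model builder is a hypothetical stand-in for a real classifier):

```python
import random

def evaluate(data, build_model, test_fraction=0.25, seed=42):
    """The three steps above: hold out a test set, train only on the
    remainder, and measure accuracy only on the holdout."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    n_test = max(1, int(len(data) * test_fraction))
    test, train = data[:n_test], data[n_test:]      # 1. partition
    model = build_model(train)                      # 2. train on training set only
    correct = sum(model(x) == y for x, y in test)   # 3. score on test set only
    return correct / len(test)

def majority_class(train):
    """Hypothetical model builder: always predict the training majority class."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top

data = [(i, "pos" if i % 2 else "neg") for i in range(40)]
print(evaluate(data, majority_class))   # accuracy in [0, 1]
```

Cross-validation repeats the partition/train/test cycle over multiple folds and averages the resulting accuracies.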
