Structured Graph Learning Via Laplacian Spectral Constraints
Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar
The Hong Kong University of Science and Technology (HKUST)
NeurIPS 2019, Vancouver, Canada, 11 December 2019
Outline
1. Graphical modeling
2. Probabilistic graphical model: GMRF
3. Structured graph learning (SGL): motivation, challenges and direction
4. Proposed framework for SGL via Laplacian spectral constraints
5. Algorithm: SGL via Laplacian spectral constraints
6. Experiments
Graphical models
Representing knowledge through graphical models
[Figure: an example graph on nodes x1, ..., x9.]

◮ Nodes correspond to the entities (variables).
◮ Edges encode the relationships between the entities (dependencies between the variables).
Why do we need graphical models?
◮ Graphs are an intuitive way of representing and visualising the relationships between entities.
◮ Graphs allow us to abstract out the conditional independence relationships between the variables from the details of their parametric forms. Thus we can answer questions like "Is x1 dependent on x6 given that we know the value of x8?" just by looking at the graph.
◮ Graphs are widely used in a variety of applications in machine learning, e.g., graph CNNs and graph signal processing.
◮ Graphs offer a language through which different disciplines can seamlessly interact with each other.
◮ Graph-based approaches combining big data and machine learning are driving the current research frontiers.

Graphical Models = Statistics × Graph Theory × Optimization × Engineering
Why do we need graph learning?
Graphical models are about having a graph representation that can encode relationships between entities. In many cases, the relationships between entities are straightforward:
◮ Are two people friends in a social network?
◮ Are two researchers co-authors of a published paper?
In many other cases, relationships are not known and must be learned:
◮ Does one gene regulate the expression of others?
◮ Which drug alters the pharmacologic effect of another drug?
The choice of graph representation affects the subsequent analysis and, eventually, the performance of any graph-based algorithm. The goal is to learn a graph representation of the data with specific properties (e.g., structures).
Schematic of graph learning
◮ Given a data matrix X ∈ R^{n×p} = [x1, x2, ..., xp], each column xi ∈ R^n is assumed to reside on one of the p nodes, and each of the n rows of X is a signal (or feature) on the same graph.
◮ The goal is to obtain a graph representation of the data.

[Figure: schematic mapping the columns x1, ..., x9 of the data matrix to the nodes of a weighted graph with edge weights wij.]

A graph is a simple mathematical structure of the form G = (V, E), where
◮ V = {1, 2, 3, ..., p} is the set of nodes, and
◮ E = {(1, 2), (1, 3), ..., (i, j), ...} is the set of edges between pairs of nodes (i, j).
◮ Weights {w12, w13, ..., wij, ...} encode the strength of the relationships.
Examples
Learning relational dependencies among entities benefits numerous application domains.
Figure 1: Financial Graph
Objective: to infer the inter-dependencies of financial companies. Input: xi contains economic indices (stock price, volume, etc.) of each entity.
Figure 2: Social Graph
Objective: to model behavioral similarity/influence between people. Input: xi contains an individual's online activities (tagging, liking, purchasing).
Types of graphical models
◮ Models encoding direct dependencies: simple and intuitive.
  ◮ Sample-correlation-based graphs.
  ◮ Similarity-function-based graphs (e.g., Gaussian RBF).
◮ Models based on some assumption on the data: X ∼ F(G).
  ◮ Statistical models: F represents a distribution defined by G (e.g., Markov and Bayesian models).
  ◮ Physically-inspired models: F represents a generative model on G (e.g., a diffusion process on graphs).
Gaussian Markov random field (GMRF)
A random vector x = (x1, x2, ..., xp)⊤ is called a GMRF with parameters (0, Θ) if its density follows

$$ p(x) = (2\pi)^{-p/2} \left(\det(\Theta)\right)^{1/2} \exp\left( -\tfrac{1}{2}\, x^\top \Theta x \right). $$

The nonzero pattern of Θ determines a conditional graph G = (V, E):

$$ \Theta_{ij} \neq 0 \iff \{i, j\} \in E \;\;\forall\, i \neq j, \qquad x_i \perp x_j \mid x_{\setminus\{x_i, x_j\}} \iff \Theta_{ij} = 0. $$

◮ For Gaussian distributed data x ∼ N(0, Σ = Θ†), graph learning is simply an inverse covariance (precision) matrix estimation problem [Lauritzen, 1996].
◮ If rank(Θ) < p, then x is called an improper GMRF (IGMRF) [Rue and Held, 2005].
◮ If Θij ≤ 0 ∀ i ≠ j, then x is called an attractive improper GMRF [Slawski and Hein, 2015].
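The Markov property above can be checked numerically. Below is a minimal sketch (illustrative only, assuming a small chain-graph precision matrix built with numpy) that samples from a proper GMRF and verifies that the estimated precision matrix is near zero exactly where the graph has no edge.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 100_000

# Precision matrix of a chain graph 1-2-3-4-5 (diagonally dominant,
# hence positive definite, so this is a proper GMRF).
Theta = np.diag(np.full(p, 2.0)) \
        + np.diag(np.full(p - 1, -0.9), 1) \
        + np.diag(np.full(p - 1, -0.9), -1)

# Sample x ~ N(0, Sigma = Theta^{-1}) and re-estimate the precision.
Sigma = np.linalg.inv(Theta)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Theta_hat = np.linalg.inv(X.T @ X / n)

# Entries (i, j) of non-adjacent nodes (|i - j| > 1) should be near zero.
print(np.round(Theta_hat, 2))
```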
Historical timeline of Markov graphical models
Data: X = {x^(i) ∼ N(0, Σ = Θ†)}_{i=1}^n, with sample covariance matrix

$$ S = \frac{1}{n} \sum_{i=1}^{n} x^{(i)} (x^{(i)})^\top. $$

◮ Covariance selection [Dempster, 1972]: graph estimated from the elements of the inverse sample covariance matrix S⁻¹. Not applicable when the sample covariance is not invertible!
◮ Neighborhood regression [Meinshausen and Bühlmann, 2006]:
$$ \arg\min_{\beta_1} \; \big\| x^{(1)} - X_{\setminus x^{(1)}} \beta_1 \big\|^2 + \alpha \|\beta_1\|_1. $$
◮ ℓ1-regularized MLE [Friedman et al., 2008, Banerjee et al., 2008]:
$$ \underset{\Theta \succ 0}{\text{maximize}} \;\; \log\det(\Theta) - \mathrm{tr}(\Theta S) - \alpha \|\Theta\|_1. $$
◮ Ising model: ℓ1-regularized logistic regression [Ravikumar et al., 2010].
◮ Attractive IGMRF [Slawski and Hein, 2015].
◮ Laplacian structure in Θ [Lake and Tenenbaum, 2010].
◮ ℓ1-regularized MLE with Laplacian structure [Egilmez et al., 2017, Zhao et al., 2019].

Limitation: existing methods are not suitable for learning graphs with specific structures.
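For the ℓ1-regularized MLE above, scikit-learn ships a standard implementation; a minimal sketch follows (the alpha value and problem size are arbitrary choices for illustration).

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p, n = 5, 2000

# Ground-truth sparse precision (same chain structure as before).
Theta = np.diag(np.full(p, 2.0)) \
        + np.diag(np.full(p - 1, -0.9), 1) \
        + np.diag(np.full(p - 1, -0.9), -1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=n)

# alpha plays the role of the sparsity weight in the l1-penalized MLE.
model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_, 2))  # off-chain entries shrink toward 0
```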
Structured graphs
Figure 3: Useful graph structures — (i) multi-component, (ii) regular, (iii) modular, (iv) bipartite, (v) grid, (vi) tree.
Structured graphs: importance
Useful structures:
◮ Multi-component graphs: clustering, classification.
◮ Bipartite graphs: matching and constructing two-channel filter banks.
◮ Multi-component bipartite graphs: co-clustering.
◮ Tree graphs: sampling algorithms.
◮ Modular graphs: social network analysis.
◮ Connected sparse graphs: graph signal processing applications.
Structured graph learning: challenges
Structured graph learning from data:
◮ involves both the estimation of structure (graph connectivity) and parameters (graph weights);
◮ parameter estimation is well explored (e.g., maximum likelihood);
◮ but structure is a combinatorial property, which makes structure estimation very challenging.
Structure learning is NP-hard for a general class of graphical models [Bogdanov et al., 2008].
Structured graph learning: direction
State-of-the-art direction:
◮ The effort has been on characterizing the families of structures for which learning can be made feasible, e.g., maximum weight spanning trees for tree structures [Chow and Liu, 1968], and local separation and walk summability for Erdos-Renyi graphs, power-law graphs, and small-world graphs [Anandkumar et al., 2012].
◮ Existing methods are restricted to particular structures, and it is difficult to extend them to learn other useful structures, e.g., multi-component, bipartite, etc.
◮ A recent method [Hao et al., 2018] for learning multi-component structure follows a two-stage approach: non-optimal and not scalable to large-scale problems.

Proposed direction: Graph (structure) ⟺ Graph matrix (spectrum)
◮ Spectral properties of a graph matrix are one such characterization [Chung, 1997], which is considered in the present work.
◮ Under this framework, structure learning of a large class of graph structures can be expressed as an eigenvalue problem of the graph Laplacian matrix.
Problem statement
To learn structured graphs via Laplacian spectral constraints.
Laplacian matrix
The set of p × p symmetric graph Laplacian matrices:

$$ S_\Theta = \left\{ \Theta \;\middle|\; \Theta_{ij} = \Theta_{ji} \le 0 \text{ for } i \neq j, \;\; \Theta_{ii} = -\sum_{j \neq i} \Theta_{ij} \right\}. $$

Properties of Θ: symmetric, diagonally dominant, positive semi-definite; moreover, the eigenvalues of Θ encode the structural properties of many important graph structures.

Laplacian quadratic energy function:

$$ \mathrm{tr}(S\Theta) \propto \sum_{i < j} -\Theta_{ij} \, \| x_i - x_j \|^2. $$

◮ The trace term is used to quantify the smoothness of graph signals: a smaller tr(SΘ) indicates a smoother signal x.
◮ A graph learned by minimizing the trace term puts more weight on the relationship between xi and xj if they are similar, and vice versa.
◮ If the signals xi and xj are similar, the learned Laplacian weight |Θij| will be large, and vice versa.
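The energy identity behind the trace term is easy to verify numerically: for a single signal x, x⊤Θx = Σ_{i<j} (−Θij)(xi − xj)². A minimal numpy sketch, assuming a random weighted graph:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6

# Random symmetric non-negative weights with zero diagonal.
W = np.triu(rng.uniform(0, 1, (p, p)), k=1)
W = W + W.T
Theta = np.diag(W.sum(axis=1)) - W   # graph Laplacian

x = rng.normal(size=p)
quad = x @ Theta @ x
pairwise = sum(W[i, j] * (x[i] - x[j]) ** 2
               for i in range(p) for j in range(i + 1, p))
print(np.isclose(quad, pairwise))    # True: x' Theta x = sum w_ij (x_i - x_j)^2
```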
Motivating example: structure via Laplacian eigenvalues
Spectral graph theory: Graph (structure) ⟺ Graph matrix (spectrum)

[Figure: a 60-node, 3-component graph and the eigenvalues of its Laplacian matrix; the spectrum shows exactly 3 zero eigenvalues.]

A graph and its Laplacian matrix eigenvalues: k = 3 zero eigenvalues corresponding to k = 3 connected components.
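This connection is simple to reproduce; the toy sketch below (numpy/scipy) builds a block-diagonal Laplacian for a graph with three components and confirms that exactly three eigenvalues are numerically zero.

```python
import numpy as np
from scipy.linalg import block_diag

def path_laplacian(m):
    """Laplacian of a path graph on m nodes with unit weights."""
    W = np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1)
    return np.diag(W.sum(axis=1)) - W

# Three disconnected components => three zero eigenvalues.
Theta = block_diag(path_laplacian(4), path_laplacian(3), path_laplacian(5))
eigvals = np.linalg.eigvalsh(Theta)
print(np.sum(eigvals < 1e-9))  # 3
```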
Proposed framework for structured graph learning
$$ \begin{aligned} \underset{\Theta}{\text{maximize}} \quad & \log \mathrm{gdet}(\Theta) - \mathrm{tr}(\Theta S) - \alpha\, h(\Theta), \\ \text{subject to} \quad & \Theta \in S_\Theta, \;\; \lambda(T(\Theta)) \in S_\lambda, \end{aligned} $$

where
◮ gdet is the generalized determinant, defined as the product of the non-zero eigenvalues,
◮ SΘ encodes the typical constraints of a Laplacian matrix,
◮ λ(T(Θ)) is the vector containing the eigenvalues of the matrix T(Θ),
◮ T(·) is a transformation that allows considering the eigenvalues of different graph matrices, and
◮ Sλ allows including spectral constraints on the eigenvalues; precisely, Sλ facilitates incorporating the spectral properties required for enforcing structure.

The proposed formulation converts the combinatorial structural constraints into analytical spectral constraints.
Structures via Laplacian spectral constraints
For T(Θ) = Θ:
◮ Connected: Sλ = {λ1 = 0, c1 ≤ λ2 ≤ ··· ≤ λp ≤ c2}.
◮ k-component: Sλ = {λ1 = ··· = λk = 0, c1 ≤ λ_{k+1} ≤ ··· ≤ λp ≤ c2}.
◮ d-regular: Sλ = {λ1 = ··· = λk = 0, c1 ≤ λ_{k+1} ≤ ··· ≤ λp ≤ c2} together with Diag(Θ) = dI.
◮ Popular connected structures (e.g., grid, modular, and Erdos-Renyi graphs) can also be learned under the connected spectral constraint.

Note: by properly specifying the transformation T(·) in the proposed formulation, the spectral properties of graph matrices other than the Laplacian (e.g., the adjacency, normalized Laplacian, and signless Laplacian matrices) can also be utilized to learn more non-trivial structures (e.g., bipartite and multi-component bipartite graphs) [Van Mieghem, 2010, Kumar et al., 2019, Chung, 1997].
Problem formulation for Laplacian spectral constraints
$$ \begin{aligned} \underset{\Theta, \lambda, U}{\text{maximize}} \quad & \log \mathrm{gdet}(\Theta) - \mathrm{tr}(\Theta S) - \alpha \|\Theta\|_1, \\ \text{subject to} \quad & \Theta \in S_\Theta, \;\; \Theta = U \mathrm{Diag}(\lambda) U^\top, \;\; \lambda \in S_\lambda, \;\; U^\top U = I, \end{aligned} $$

where λ = [λ1, λ2, ..., λp] is the vector of eigenvalues and U is the matrix of eigenvectors.

The resulting formulation is still complicated and intractable due to:
◮ the Laplacian structural constraints,
◮ the non-convex constraint coupling Θ, U, and λ, and
◮ the non-convex orthogonality constraint on U.

In order to derive a feasible formulation:
◮ we first introduce a linear operator L that transforms the Laplacian structural constraints into simple algebraic constraints, and
◮ we then relax the eigen-decomposition constraint into the objective function.
Linear operator for Θ ∈ SΘ
Recall

$$ S_\Theta = \left\{ \Theta \;\middle|\; \Theta_{ij} = \Theta_{ji} \le 0 \text{ for } i \neq j, \;\; \Theta_{ii} = -\sum_{j \neq i} \Theta_{ij} \right\}. $$

The conditions Θij = Θji ≤ 0 and Θ1 = 0 imply that the target matrix is symmetric with p(p − 1)/2 degrees of freedom. We define a linear operator L : w ∈ R₊^{p(p−1)/2} → Lw ∈ R^{p×p}, which maps a weight vector w to a Laplacian matrix:

$$ [Lw]_{ij} = [Lw]_{ji} \le 0 \;\; \text{for } i \neq j, \qquad [Lw]_{ii} = -\sum_{j \neq i} [Lw]_{ij}. $$

Example of Lw on w = [w1, w2, w3, w4, w5, w6]⊤ (p = 4):

$$ Lw = \begin{bmatrix} \sum_{i=1,2,3} w_i & -w_1 & -w_2 & -w_3 \\ -w_1 & \sum_{i=1,4,5} w_i & -w_4 & -w_5 \\ -w_2 & -w_4 & \sum_{i=2,4,6} w_i & -w_6 \\ -w_3 & -w_5 & -w_6 & \sum_{i=3,5,6} w_i \end{bmatrix}. $$
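A direct implementation of this operator, together with its adjoint L* (which satisfies ⟨Lw, Y⟩ = ⟨w, L*Y⟩ and is needed later for gradients), might look as follows. This is an illustrative numpy sketch using the edge ordering (1,2), (1,3), ..., (p−1,p) from the example above; the helper names Lop/Lstar are my own.

```python
import numpy as np
from itertools import combinations

def Lop(w, p):
    """Map a weight vector w in R^{p(p-1)/2} to a p x p Laplacian."""
    Theta = np.zeros((p, p))
    for k, (i, j) in enumerate(combinations(range(p), 2)):
        Theta[i, j] = Theta[j, i] = -w[k]
    # Diagonal holds the negated row sums of the off-diagonal entries.
    np.fill_diagonal(Theta, -Theta.sum(axis=1))
    return Theta

def Lstar(Y):
    """Adjoint of Lop: <Lop(w), Y> = <w, Lstar(Y)> for all w, Y."""
    p = Y.shape[0]
    return np.array([Y[i, i] + Y[j, j] - Y[i, j] - Y[j, i]
                     for i, j in combinations(range(p), 2)])

# Reproduce the p = 4 example above:
w = np.array([1., 2., 3., 4., 5., 6.])
print(Lop(w, 4))
```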
Problem reformulation
Starting from

$$ \begin{aligned} \underset{\Theta, \lambda, U}{\text{maximize}} \quad & \log \mathrm{gdet}(\Theta) - \mathrm{tr}(\Theta S) - \alpha \|\Theta\|_1, \\ \text{subject to} \quad & \Theta \in S_\Theta, \;\; \Theta = U \mathrm{Diag}(\lambda) U^\top, \;\; \lambda \in S_\lambda, \;\; U^\top U = I, \end{aligned} $$

and using (i) Θ = Lw and (ii) tr(ΘS) + αh(Θ) = tr(ΘK), with K = S + H and H = α(2I − 11⊤), the proposed problem formulation becomes

$$ \begin{aligned} \underset{w, \lambda, U}{\text{maximize}} \quad & \log \mathrm{gdet}(\mathrm{Diag}(\lambda)) - \mathrm{tr}(K L w) - \frac{\beta}{2} \left\| Lw - U \mathrm{Diag}(\lambda) U^\top \right\|_F^2, \\ \text{subject to} \quad & w \ge 0, \;\; \lambda \in S_\lambda, \;\; U^\top U = I. \end{aligned} $$
SGL algorithm for k−component graph learning
◮ Variables: X = (w, λ, U).
◮ Spectral constraint: Sλ = {λ1 = ··· = λk = 0, c1 ≤ λ_{k+1} ≤ ··· ≤ λp ≤ c2}.
◮ Positivity constraint: w ≥ 0.
◮ Orthogonality constraint: U⊤U = I_{p−k}.

We develop a block majorization-minimization (block-MM) type method, which updates each block sequentially while keeping the other blocks fixed [Sun et al., 2016, Razaviyayn et al., 2013].
Update for w
Sub-problem for w:

$$ \underset{w \ge 0}{\text{minimize}} \;\; \mathrm{tr}(K L w) + \frac{\beta}{2} \left\| Lw - U \mathrm{Diag}(\lambda) U^\top \right\|_F^2, $$

which, up to constants, can be rewritten as

$$ \underset{w \ge 0}{\text{minimize}} \;\; f(w) = \frac{1}{2} \|Lw\|_F^2 - c^\top w $$

for an appropriate vector c collecting the linear terms. This problem is a convex quadratic program, but it does not have a closed-form solution due to the non-negativity constraint w ≥ 0. We obtain a closed-form update by using the MM technique [Sun et al., 2016]:

$$ w^{t+1} = \left( w^t - \frac{1}{2p} \nabla f(w^t) \right)^+, \qquad \text{where } (a)^+ = \max(a, 0). $$
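A sketch of this update, reusing the Lop/Lstar helpers from the linear-operator slide. The exact form of c below comes from expanding the Frobenius term and rescaling the objective by β, so treat the scaling as my reading of the slide rather than the paper's exact implementation.

```python
import numpy as np

def update_w(w, U, lam, K, p, beta):
    """One MM step for the w block: w <- (w - grad f(w) / (2p))^+.

    Assumption: f(w) = 0.5 * ||Lop(w)||_F^2 - c' w (objective divided
    by beta), with c = Lstar(U diag(lam) U' - K / beta).
    """
    c = Lstar(U @ np.diag(lam) @ U.T - K / beta)
    grad = Lstar(Lop(w, p)) - c
    return np.maximum(w - grad / (2 * p), 0.0)
```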
Update for U
Sub-problem for U:

$$ \underset{U}{\text{maximize}} \;\; \mathrm{tr}\left( U^\top L w \, U \mathrm{Diag}(\lambda) \right) \quad \text{subject to} \;\; U^\top U = I_{p-k}. $$

This sub-problem is an optimization on the orthogonal Stiefel manifold [Absil et al., 2009, Benidis et al., 2016]. From the KKT optimality conditions, the solution is given by

$$ U^{t+1} = \text{eigenvectors}(L w^{t+1})[k+1 : p], $$

that is, the p − k principal eigenvectors of the matrix Lw^{t+1} in increasing order of eigenvalue magnitude.
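Since numpy's eigh returns eigenpairs in ascending order of eigenvalue, this update is a short sketch:

```python
import numpy as np

def update_U(w, p, k):
    """Eigenvectors of Lop(w) for the p - k smallest non-zero eigenvalues.

    np.linalg.eigh returns eigenpairs in ascending order, so dropping the
    first k columns discards the (near-)zero eigenvalues of the
    k-component Laplacian.
    """
    _, vecs = np.linalg.eigh(Lop(w, p))
    return vecs[:, k:]
```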
Update for λ
Sub-problem for λ:

$$ \underset{\lambda \in S_\lambda}{\text{minimize}} \;\; -\log\det(\mathrm{Diag}(\lambda)) + \frac{\beta}{2} \left\| U^\top (Lw)\, U - \mathrm{Diag}(\lambda) \right\|_F^2, $$

which reduces to

$$ \underset{c_1 \le \lambda_{k+1} \le \cdots \le \lambda_p \le c_2}{\text{minimize}} \;\; -\sum_{i=1}^{p-k} \log \lambda_{k+i} + \frac{\beta}{2} \|\lambda - d\|^2, $$

where d collects the diagonal entries of U⊤(Lw)U. This sub-problem is popularly known as a regularized isotonic regression problem. It is a convex optimization problem, and the solution can be obtained from the KKT optimality conditions. We develop an efficient algorithm with fast convergence to the global optimum in at most p − k iterations [Kumar et al., 2019].

Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar, "A Unified Framework for Structured Graph Learning via Spectral Constraints," arXiv preprint arXiv:1904.09792 (2019).
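Dropping the ordering constraint for a moment, each coordinate of λ minimizes −log λ + (β/2)(λ − dᵢ)², whose positive root is λᵢ = (dᵢ + √(dᵢ² + 4/β))/2. The sketch below uses this per-coordinate solution followed by clipping to [c1, c2] as a simplified stand-in for the paper's exact isotonic-regression solver; it is an assumption-laden approximation, not the published algorithm.

```python
import numpy as np

def update_lam(w, U, p, k, beta, c1, c2):
    """Simplified lambda update (assumption: per-coordinate closed form
    plus clipping, instead of the exact isotonic-regression solver).

    Each coordinate minimizes -log(lam) + beta/2 * (lam - d_i)^2, whose
    positive root is lam_i = (d_i + sqrt(d_i^2 + 4/beta)) / 2.
    """
    d = np.diag(U.T @ Lop(w, p) @ U)          # the p - k relevant targets
    lam = (d + np.sqrt(d ** 2 + 4.0 / beta)) / 2.0
    return np.clip(lam, c1, c2)
```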
Proposed SGL algorithm summary
$$ \begin{aligned} \underset{w, \lambda, U}{\text{maximize}} \quad & \log \mathrm{gdet}(\mathrm{Diag}(\lambda)) - \mathrm{tr}(K L w) - \frac{\beta}{2} \left\| Lw - U \mathrm{Diag}(\lambda) U^\top \right\|_F^2, \\ \text{subject to} \quad & w \ge 0, \;\; \lambda \in S_\lambda, \;\; U^\top U = I_{p-k}. \end{aligned} $$

Proposed algorithm:

1: Input: SCM S, k, c1, c2, β
2: Output: Lw
3: t ← 0
4: while stopping criterion is not met do
5:   w^{t+1} = (w^t − (1/(2p)) ∇f(w^t))^+
6:   U^{t+1} ← eigenvectors(Lw^{t+1}), suitably ordered
7:   update λ^{t+1} (via the isotonic regression method, with at most p − k iterations)
8:   t ← t + 1
9: end while
10: return w^{t+1}
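Putting the three block updates together gives a toy driver. This sketch assumes the update_w, update_U, and update_lam helpers above (with all their stated simplifications) and replaces the stopping criterion with a fixed iteration budget.

```python
import numpy as np

def sgl(S, p, k, beta=0.5, c1=0.1, c2=10.0, n_iter=500, alpha=0.0):
    """Toy SGL driver for k-component graph learning (illustrative only)."""
    K = S + alpha * (2 * np.eye(p) - np.ones((p, p)))   # K = S + H
    w = np.full(p * (p - 1) // 2, 1.0 / p)              # naive init
    lam = np.ones(p - k)
    U = update_U(w, p, k)
    for _ in range(n_iter):                             # fixed budget
        w = update_w(w, U, lam, K, p, beta)
        U = update_U(w, p, k)
        lam = update_lam(w, U, p, k, beta, c1, c2)
    return Lop(w, p)
```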
Convergence and the computational complexity
The worst-case computational complexity of the proposed algorithm is O(p³).

Theorem: every limit point (w⋆, U⋆, λ⋆) of the sequence generated by this algorithm is a KKT point of the optimization problem.
Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar, “Structured graph learning via Laplacian spectral constraints,” in Advances in Neural Information Processing Systems (NeurIPS), 2019.
Synthetic experiment setup
◮ Generate a graph with the desired structure.
◮ Sample weights for the graph edges.
◮ Obtain the true Laplacian Θtrue.
◮ Sample data X = {x^(i) ∈ R^p ∼ N(0, Σ = Θtrue†)}_{i=1}^n.
◮ Compute the sample covariance matrix S = (1/n) Σ_{i=1}^n x^(i)(x^(i))⊤.
◮ Use S and some prior spectral information, if available.
◮ Performance metrics:

$$ \text{Relative Error} = \frac{\| \hat{\Theta}^\star - \Theta_{\text{true}} \|_F}{\| \Theta_{\text{true}} \|_F}, \qquad \text{F-Score} = \frac{2\,\text{tp}}{2\,\text{tp} + \text{fp} + \text{fn}}, $$

where Θ̂⋆ is the final estimate produced by the algorithm, Θtrue is the true reference graph Laplacian matrix, and tp, fp, and fn denote true positives, false positives, and false negatives, respectively.
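Both metrics are straightforward to compute; a small helper sketch follows (the edge-detection threshold tol is an arbitrary assumption).

```python
import numpy as np

def relative_error(Theta_hat, Theta_true):
    return np.linalg.norm(Theta_hat - Theta_true, "fro") / \
           np.linalg.norm(Theta_true, "fro")

def f_score(Theta_hat, Theta_true, tol=1e-4):
    """F-score on the off-diagonal support (edge present vs. absent)."""
    est = np.abs(np.triu(Theta_hat, k=1)) > tol
    ref = np.abs(np.triu(Theta_true, k=1)) > tol
    tp = np.sum(est & ref)
    fp = np.sum(est & ~ref)
    fn = np.sum(~est & ref)
    return 2 * tp / (2 * tp + fp + fn)
```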
Grid graph
[Figure: grid graph recovery — (i) true, (ii) [Egilmez et al., 2017], (iii) SGL with k = 1.]
Noisy multi-component graph
[Figure: noisy multi-component graph — (iv) true, (v) noisy, and (vi) learned Laplacian heatmaps, with the corresponding (vii) true, (viii) noisy, and (ix) learned graphs.]
Model mismatch
[Figure: model mismatch — (x) true graph with k = 7 components, (xi) noisy, (xii) learned with k = 2.]
Popular multi-component structures
Real data: cancer dataset [Weinstein et al., 2013]
[Figure: learned graphs on the cancer dataset — (xxii) CLR [Nie et al., 2016], (xxiii) SGL with k = 5.]
Clustering accuracy (ACC): CLR = 0.9862 and SGL = 0.99875.
Animal dataset [Osherson et al., 1991]
[Figure: graphs learned on the animal dataset (node labels: Elephant, Rhino, ..., Deer) — (xxiv) GGL [Egilmez et al., 2017], (xxv) GLasso [Friedman et al., 2008].]
Animal dataset (contd.)

[Figure: graphs learned on the animal dataset (contd.) — (xxvi) SGL with k = 1 (proposed), (xxvii) SGL with k = 4 (proposed).]
Resources
An R package "spectralGraphTopology" containing code for all the experimental results is available at https://cran.r-project.org/package=spectralGraphTopology

NeurIPS paper: Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar, "Structured graph learning via Laplacian spectral constraints," in Advances in Neural Information Processing Systems (NeurIPS), 2019. https://arxiv.org/pdf/1909.11594.pdf

Extended version: Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar, "A Unified Framework for Structured Graph Learning via Spectral Constraints," 2019. https://arxiv.org/pdf/1904.09792.pdf

Thanks! For more information visit: https://www.danielppalomar.com
References
Absil, P.-A., Mahony, R., and Sepulchre, R. (2009). Optimization Algorithms on Matrix Manifolds. Princeton University Press.

Anandkumar, A., Tan, V. Y., Huang, F., and Willsky, A. S. (2012). High-dimensional Gaussian graphical model selection: Walk summability and local separation criterion. Journal of Machine Learning Research, 13(Aug):2293–2337.

Banerjee, O., Ghaoui, L. E., and d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9(Mar):485–516.

Benidis, K., Sun, Y., Babu, P., and Palomar, D. P. (2016). Orthogonal sparse PCA and covariance estimation via Procrustes reformulation. IEEE Transactions on Signal Processing, 64(23):6211–6226.

Bogdanov, A., Mossel, E., and Vadhan, S. (2008). The complexity of distinguishing Markov random fields. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 331–342. Springer.
Chow, C. and Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467.

Chung, F. R. (1997). Spectral Graph Theory. Number 92. American Mathematical Society.

Dempster, A. P. (1972). Covariance selection. Biometrics, pages 157–175.

Egilmez, H. E., Pavez, E., and Ortega, A. (2017). Graph learning from data under Laplacian and structural constraints. IEEE Journal of Selected Topics in Signal Processing, 11(6):825–841.

Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.
Hao, B., Sun, W. W., Liu, Y., and Cheng, G. (2018). Simultaneous clustering and estimation of heterogeneous graphical models. Journal of Machine Learning Research, 18(217):1–58.

Kumar, S., Ying, J., Cardoso, J. V. d. M., and Palomar, D. (2019). A unified framework for structured graph learning via spectral constraints. arXiv preprint arXiv:1904.09792.

Lake, B. and Tenenbaum, J. (2010). Discovering structure by learning sparse graphs. In Proceedings of the 33rd Annual Cognitive Science Conference.

Lauritzen, S. L. (1996). Graphical Models, volume 17. Clarendon Press.

Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462.
Osherson, D. N., Stern, J., Wilkie, O., Stob, M., and Smith, E. E. (1991). Default probability. Cognitive Science, 15(2):251–269.

Ravikumar, P., Wainwright, M. J., Lafferty, J. D., et al. (2010). High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3):1287–1319.

Razaviyayn, M., Hong, M., and Luo, Z.-Q. (2013). A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization, 23(2):1126–1153.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. CRC Press.

Slawski, M. and Hein, M. (2015). Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields. Linear Algebra and its Applications, 473:145–179.