

SLIDE 1

Learning graphs from data:
A signal processing perspective

Xiaowen Dong
MIT Media Lab

Graph Signal Processing Workshop, Pittsburgh, PA, May 2017

SLIDES 2-7

Introduction

  • What is the problem of graph learning?
  • Given observations on a number of variables and some prior knowledge (distribution, model, etc.)
  • Build/learn a measure of relations between the variables (correlation/covariance, graph topology/operator, or equivalent)

Setting: a data matrix X of M samples on N variables; each sample is a graph signal x : V → R^N.

[Figure: the columns of X are attached to nodes v1-v9 of a graph whose edges carry positive/negative weights.]

SLIDES 8-11

Introduction

  • Why is it important?
  • Learning relations between entities benefits numerous application domains
  • The learned relations can help us predict future observations

Objective: functional connectivity between brain regions. Input: fMRI recordings in these regions.
Objective: behavioral similarity/influence between people. Input: individual history of activities.

How do we build/learn the graph?

image credit: http://blog.myesr.org/mri-reveals-the-human-connectome/ https://www.iconexperience.com

SLIDE 12

Outline

  • A (partial) historical overview
  • A signal processing perspective
  • GSP idea for graph learning
  • Three signal/graph models
  • Perspective

SLIDES 13-17

A (partial) historical overview

  • Simple and intuitive methods
  • Sample correlation
  • Similarity function (e.g., Gaussian RBF), as in the sketch below
  • Learning graphical models
  • Undirected graphical models: Markov random fields (MRF)
  • Directed graphical models: Bayesian networks (BN)
  • Factor graphs
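
As a concrete baseline, here is a minimal sketch of the similarity-function approach; the function name and the `threshold` knob are illustrative, not from the talk:

```python
import numpy as np

def rbf_similarity_graph(X, sigma=1.0, threshold=1e-3):
    """Gaussian RBF graph: W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    X: (M, N) data matrix, one column per variable/node.
    Returns an (N, N) symmetric weight matrix with zero diagonal.
    """
    Z = X.T                                              # one row per node
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                             # no self-loops
    W[W < threshold] = 0.0                               # sparsify tiny weights
    return W
```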

SLIDES 18-22

A (partial) historical overview

  • Learning pairwise MRF

[Figure: a five-node graph v1-v5 carrying the random variables x1-x5.]

  • conditional independence: (i, j) ∉ E ⇔ x_i ⊥ x_j | x \ {x_i, x_j}
  • probability parameterized by Θ:

P(x | Θ) = (1 / Z(Θ)) exp( Σ_{i∈V} θ_ii x_i² + Σ_{(i,j)∈E} θ_ij x_i x_j )

  • Gaussian graphical models with precision Θ:

P(x | Θ) = ( |Θ|^{1/2} / (2π)^{N/2} ) exp( −(1/2) xᵀ Θ x )

  • Learning a sparse Θ:
  • interactions are mostly local
  • computationally more tractable

SLIDES 23-25

A (partial) historical overview

1972: covariance selection (Dempster)

  • Prune the smallest elements in the precision (inverse covariance) matrix
  • Data matrix X with samples drawn from N(0, Θ⁻¹), where Θ is the ground-truth precision; the estimate is S⁻¹, the inverse of the sample covariance
  • Not applicable when the sample covariance is not invertible!
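
A minimal sketch of this pruning idea, assuming M > N so that the sample covariance is invertible; the `keep` fraction is an illustrative knob, not from Dempster's paper:

```python
import numpy as np

def covariance_selection(X, keep=0.2):
    """Invert the sample covariance, then prune the smallest entries."""
    S = np.cov(X, rowvar=False)            # (N, N) sample covariance
    P = np.linalg.inv(S)                   # precision estimate S^{-1}
    off = np.abs(P[~np.eye(len(P), dtype=bool)])
    cutoff = np.quantile(off, 1.0 - keep)  # keep the largest off-diagonals
    P_pruned = np.where(np.abs(P) >= cutoff, P, 0.0)
    np.fill_diagonal(P_pruned, np.diag(P))
    return P_pruned
```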

SLIDES 26-28

A (partial) historical overview

2006: ℓ1-regularized neighborhood regression (Meinshausen & Bühlmann)

  • Learning a graph = learning the neighborhood of each node
  • LASSO regression for node 1, with coefficients β12, ..., β15 linking v1 to the other nodes:

min_{β₁} ||X₁ − X_{\1} β₁||₂² + λ ||β₁||₁
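
A per-node sketch with scikit-learn; the AND rule for symmetrizing the selected neighborhoods is one of the variants considered in the paper, and `lam` is illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, lam=0.1):
    """X: (M, N) data matrix; returns a boolean (N, N) adjacency."""
    M, N = X.shape
    B = np.zeros((N, N))
    for i in range(N):
        rest = np.delete(np.arange(N), i)         # all nodes except i
        B[i, rest] = Lasso(alpha=lam).fit(X[:, rest], X[:, i]).coef_
    return (B != 0) & (B.T != 0)                  # AND rule: mutual selection
```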

SLIDES 29-32

A (partial) historical overview

2008: ℓ1-regularized log-determinant (Banerjee; Friedman)
2011: quadratic approximation of the Gaussian negative log-likelihood (Hsieh)

  • Estimation of a sparse precision matrix
  • The graphical LASSO maximizes the likelihood of the precision matrix Θ:

|Θ|^{M/2} exp( −(1/2) Σ_{m=1}^{M} X(m)ᵀ Θ X(m) )

  • equivalently, with S the sample covariance of the data matrix X, the log-likelihood problem:

max_Θ log det Θ − tr(SΘ) − ρ ||Θ||₁
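
scikit-learn ships a solver for exactly this objective, so a sketch is short; `alpha` plays the role of ρ, and the data here is a placeholder:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

X = np.random.randn(200, 10)              # placeholder: M=200 samples, N=10
model = GraphicalLasso(alpha=0.1).fit(X)  # solves the l1-penalized problem
Theta = model.precision_                  # sparse precision estimate
support = (np.abs(Theta) > 1e-8) & ~np.eye(10, dtype=bool)  # inferred edges
```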

SLIDES 33-35

A (partial) historical overview

2010: ℓ1-regularized logistic regression (Ravikumar)

  • Neighborhood learning for discrete variables
  • ℓ1-regularized logistic regression for node 1, with P_β the logistic function:

max_{β₁} Σ_m log P_β(X_{1m} | X_{\1,m}) − λ ||β₁||₁

[Figure: node v1 with coefficients β12, ..., β15 linking it to v2-v5.]
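
For binary (Ising-type) variables, the same neighborhood idea reads as one ℓ1-penalized logistic regression per node; a sketch, assuming both classes appear at every node, with `lam` illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ising_neighborhood(X, lam=0.1):
    """X: (M, N) matrix of binary observations; returns coefficients."""
    M, N = X.shape
    B = np.zeros((N, N))
    for i in range(N):
        rest = np.delete(np.arange(N), i)
        clf = LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear")
        clf.fit(X[:, rest], X[:, i])      # node i given all other nodes
        B[i, rest] = clf.coef_.ravel()
    return B
```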

SLIDES 36-39

A (partial) historical overview

  • Simple and intuitive methods: sample correlation; similarity function (e.g., Gaussian RBF)
  • Learning graphical models
  • Classical learning approaches lead to both positive and negative relations
  • What about learning a graph topology with non-negative weights?
  • Learning topologies with non-negative weights
  • M-matrices (symmetric, positive definite, non-positive off-diagonal) have been used as precision matrices, leading to attractive GMRFs (Slawski and Hein 2015)
  • The combinatorial graph Laplacian L = Deg − W is an M-matrix and is equivalent to the graph topology

From an arbitrary precision matrix to a graph Laplacian!

SLIDES 40-43

A (partial) historical overview

2010: ℓ1-regularized log-determinant on a generalized Laplacian (Lake)

  • The graph Laplacian L can be the precision, BUT it is singular
  • Lake et al. therefore constrain the precision to a regularized Laplacian:

max_Θ log det Θ − tr(SΘ) − ρ ||Θ||₁   s.t.   Θ = L + (1/σ²) I

[Figure: precision recovered by the graphical LASSO vs. Laplacian recovered by Lake et al.; see also Slawski and Hein (2015).]

SLIDE 44

A (partial) historical overview

2009: quadratic form of a power of L (Daitch), related to locally linear embedding [Roweis00]
2013: quadratic form of a power of L (Hu)

  • Daitch: ||LX||²_F = tr(Xᵀ L² X)
  • Hu: tr(Xᵀ Lˢ X) − β ||W||_F

SLIDES 45-46

A (partial) historical overview

2015-2016: a signal processing (GSP) perspective on graph signals x : V → R^N, due to Dong, Segarra, Pasdeloup, Egilmez, Mei, Kalofolias, Thanou, Baingana, and Chepuri

SLIDES 47-48

A signal processing perspective

  • Existing approaches have limitations
  • Simple correlation or similarity functions are not enough
  • Most classical methods for learning graphical models do not directly lead to topologies with non-negative weights
  • There is no strong emphasis on signal/graph interaction with a spectral/frequency-domain interpretation

  • Opportunity and challenge for graph signal processing
  • GSP tools such as frequency analysis and filtering can contribute to the graph learning problem
  • Filtering-based approaches can provide generative models for signals with complex non-Gaussian behavior

SLIDES 49-54

A signal processing perspective

  • Signal processing is about D c = x: a dictionary D times coefficients c gives the signal x
  • Graph signal processing is about D(G) c = x: the dictionary now depends on the graph G

  • Forward: given G and x, design D to study c
  • Fourier/wavelet atoms → graph Fourier/wavelet coefficients [Coifman06, Narang09, Hammond11, Shuman13, Sandryhaila13]
  • Trained dictionary atoms → graph dictionary coefficients [Zhang12, Thanou14]

  • Backward (graph learning): given x, design D and c to infer G
  • The key is the signal/graph model behind D
  • D is designed around graph operators (adjacency/Laplacian matrices, shift operators)
  • The choice of/assumption on c often determines the signal characteristics

SLIDES 55-57

Model 1: Global smoothness

  • Signal values vary smoothly between all pairs of nodes that are connected
  • Example: temperature at different locations in a flat geographical region
  • Usually quantified by the Laplacian quadratic form:

xᵀ L x = (1/2) Σ_{i,j} W_ij (x(i) − x(j))²

[Figure: two signals on nodes v1-v9, a smooth one with xᵀLx = 1 and a non-smooth one with xᵀLx = 21.]

Similar to previous approaches:

  • Daitch (2009): min_L tr(Xᵀ L² X)
  • Hu (2013): min_L tr(Xᵀ Lˢ X) − β ||W||_F
  • Lake (2010): max_{Θ = L + (1/σ²) I} log det Θ − (1/M) tr(X Xᵀ Θ) − ρ ||Θ||₁
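
A numerical sanity check of the quadratic form on a toy path graph (values are illustrative):

```python
import numpy as np

W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])          # path graph v1 - v2 - v3
L = np.diag(W.sum(axis=1)) - W        # combinatorial Laplacian L = Deg - W
x = np.array([1.0, 1.1, 3.0])         # smooth across (v1,v2), not (v2,v3)

quad = x @ L @ x                      # x^T L x
pairwise = 0.5 * np.sum(W * (x[:, None] - x[None, :]) ** 2)
assert np.isclose(quad, pairwise)     # the two expressions agree
```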

SLIDES 58-62

Model 1: Global smoothness

  • Dong et al. (2015) & Kalofolias (2016)
  • D(G) = χ (the eigenvector matrix of L)
  • Gaussian assumption on c: c ∼ N(0, Λ)
  • Maximum a posteriori (MAP) estimation of c leads to minimization of the Laplacian quadratic form:

min_c ||x − χ c||₂² − log P_c(c)

  • which, over all signals, gives the joint problem:

min_{L,Y} ||X − Y||²_F + α tr(Yᵀ L Y) + β ||L||²_F
(data fidelity + smoothness on Y + regularization)

[Figure: a noisy observation x and its smooth denoised version y on nodes v1-v9.]

Learning enforces the signal property (global smoothness)!
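
With L fixed, the Y-step of this objective has a closed form; a sketch under the convention that X holds one graph signal per column (the L-step, a constrained quadratic program over valid Laplacians, is solved with a QP solver in the papers and is omitted here):

```python
import numpy as np

def y_step(X, L, alpha=1.0):
    """Minimize ||X - Y||_F^2 + alpha * tr(Y^T L Y) over Y.

    Setting the gradient to zero gives (I + alpha L) Y = X, i.e. a
    low-pass graph filter applied to the observed signals.
    """
    N = L.shape[0]
    return np.linalg.solve(np.eye(N) + alpha * L, X)  # X: (N, M)
```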

SLIDES 63-67

Model 1: Global smoothness

  • Egilmez et al. (2016)
  • Solve for Θ as three different types of graph Laplacian matrices:

min_Θ tr(ΘK) − log det Θ   s.t.   K = S − (α/2)(11ᵀ − I)

  • generalized Laplacian: Θ = L + V = Deg − W + V
  • diagonally dominant generalized Laplacian: Θ = L + V = Deg − W + V (V ≥ 0)
  • combinatorial Laplacian: Θ = L = Deg − W

Generalizes the graphical LASSO and Lake. Adding priors on the edge weights leads to an interpretation as MAP estimation.

SLIDES 68-72

Model 1: Global smoothness

  • Chepuri et al. (2016)
  • An edge selection mechanism based on the same smoothness measure

[Figure: edges on nodes v1-v9 are selected one at a time under the smoothness criterion.]

Similar in spirit to Dempster. Good for learning unweighted graphs. An explicit handle on the selected edges is desirable in some applications.
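
A greedy sketch of the idea; the per-edge smoothness score and the top-k rule are a plausible reading of the mechanism, not the authors' exact algorithm, and `k` is the explicit edge budget:

```python
import numpy as np

def select_edges(X, k):
    """X: (N, M), one graph signal per column; returns k selected edges."""
    N = X.shape[0]
    # cost of edge (i, j): sum_m (x_m(i) - x_m(j))^2
    scores = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    iu = np.triu_indices(N, k=1)              # candidate edges with i < j
    order = np.argsort(scores[iu])[:k]        # k smoothest candidates
    return list(zip(iu[0][order], iu[1][order]))
```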

SLIDES 73-76

Model 2: Diffusion process

  • Signals are the outcome of diffusion processes on the graph (more of a local smoothness than a global one!)
  • Example: movement of people/vehicles in a transportation network
  • Characterized by diffusion operators

[Figure: an initial-stage signal on nodes v1-v9, its observation after heat diffusion, and its observation after a general graph shift operator (e.g., A).]
SLIDES 77-80

Model 2: Diffusion process

  • Pasdeloup et al. (2015, 2016)
  • D(G) = T^{k(m)} = W_norm^{k(m)}, and the {c_m} are i.i.d. samples with independent entries
  • Two-step approach:
  • Estimate the eigenvector matrix from the sample covariance (if the covariance is unknown), which is a polynomial of W_norm:

Σ = E[ Σ_{m=1}^{M} X(m) X(m)ᵀ ] = Σ_{m=1}^{M} W_norm^{2k(m)}

  • Optimize for the eigenvalues given the constraints on W_norm (mainly non-negativity of the off-diagonal of W_norm and the eigenvalue range) and some priors (e.g., sparsity)

More of a "graph-centric" learning framework: the cost function is on graph components instead of signals.
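
A sketch of the first step under these assumptions: since the covariance is a polynomial of W_norm, its eigenvectors are also eigenvectors of W_norm. The data is a placeholder; the second step needs a constrained solver and is only indicated:

```python
import numpy as np

X = np.random.randn(9, 5000)         # placeholder: N=9 nodes, M=5000 signals
Sigma = X @ X.T / X.shape[1]         # sample covariance of the observations
eigvals, V = np.linalg.eigh(Sigma)   # columns of V estimate the eigenvectors
# Second step (not shown): optimize the eigenvalues of W_norm under its
# structural constraints (non-negative off-diagonal, eigenvalue range, priors).
```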

SLIDES 81-84

Model 2: Diffusion process

  • Segarra et al. (2016)
  • c is a white signal
  • D(G) = H(S_G) = Σ_{l=0}^{L−1} h_l S_G^l (diffusion defined by a graph shift operator S_G that can be arbitrary, but is practically W or L)
  • Two-step approach:
  • Estimate the eigenvector matrix (the "spectral templates" v_n) from Σ = H Hᵀ
  • Select eigenvalues that satisfy the constraints on S_G:

min_{S_G, λ} ||S_G||₀   s.t.   S_G = Σ_{n=1}^{N} λ_n v_n v_nᵀ

Similar in spirit to Pasdeloup: the same stationarity assumption, but a different inference framework due to a different D. Can handle noisy or incomplete information on the spectral templates.

SLIDES 85-89

Model 2: Diffusion process

  • Thanou et al. (2016)
  • D(G) = e^{−τL} (localization in the vertex domain)
  • Sparsity assumption on c
  • Each signal is a combination of several heat diffusion processes at times τ_s:

min_{L,C,τ} ||X − D(L) C||²_F + α Σ_{m=1}^{M} ||c_m||₁ + β ||L||²_F
s.t. D = [e^{−τ₁L}, ..., e^{−τ_S L}]
(data fidelity + sparsity on c + regularization)

Still a diffusion-based model, but more "signal-centric": no assumption on eigenvectors/stationarity, only on the signal structure and sparsity. Can be extended to the general polynomial case (Maretic et al. 2017).
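
A sketch of the generative side with L and the τ_s held fixed; the alternating updates of L and τ from the paper are omitted, and `alpha` is an illustrative sparsity level:

```python
import numpy as np
from scipy.linalg import expm
from sklearn.linear_model import Lasso

def heat_dictionary(L, taus):
    """Stack the heat kernels: D = [exp(-tau_1 L), ..., exp(-tau_S L)]."""
    return np.hstack([expm(-t * L) for t in taus])   # (N, N*S)

def sparse_code(D, x, alpha=0.05):
    """Sparse coefficients c such that x is approximately D c."""
    return Lasso(alpha=alpha, fit_intercept=False).fit(D, x).coef_
```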

SLIDE 90

Model 3: Time-varying observations

  • Signals are time-varying observations that are causal outcomes of current or past values (mixed degree of smoothness depending on previous states)
  • Example: evolution of individual behavior due to the influence of different friends at different timestamps
  • Characterized by an autoregressive model or a structural equation model (SEM)

SLIDES 91-95

Model 3: Time-varying observations

  • Mei and Moura (2015)
  • D_s(G) = P_s(W): a polynomial of W of degree s; define c_s as x[t − s]
  • Autoregressive model: x[t] = Σ_{s=1}^{S} P_s(W) x[t − s]
  • Fit W and the polynomial coefficients a jointly:

min_{W,a} (1/2) Σ_{k=S+1}^{K} ||x[k] − Σ_{s=1}^{S} P_s(W) x[k − s]||₂² + λ₁ ||vec(W)||₁ + λ₂ ||a||₁
(data fidelity + sparsity on W + sparsity on a)

The polynomial design is similar in spirit to Pasdeloup and Segarra. Good for inferring causal relations between signals. Kernelized (nonlinear) version: Shen et al. (2016).
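
A first-order special case as a sketch: with S = 1 and P₁(W) = W, each row of W comes from one ℓ1-penalized regression of a node's present value on all past values. The joint fit of the higher-order polynomial coefficients a is omitted, and `lam` is illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_var1_graph(X, lam=0.05):
    """X: (K, N), one time sample per row; returns an estimate of W."""
    past, present = X[:-1], X[1:]          # pairs (x[k-1], x[k])
    N = X.shape[1]
    W = np.zeros((N, N))
    for i in range(N):
        W[i] = Lasso(alpha=lam).fit(past, present[:, i]).coef_
    return W
```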

SLIDES 96-99

Model 3: Time-varying observations

  • Baingana and Giannakis (2016)
  • D(G) = W_{s(t)}: the graph at time t (topologies switch at each time between S discrete states); define c as x itself
  • Structural equation model, internal (neighbors) plus external terms:

x[t] = W_{s(t)} x[t] + B_{s(t)} y[t]

  • Solve for all states of W:

min_{W_{s(t)}, B_{s(t)}} (1/2) Σ_{t=1}^{T} ||x[t] − W_{s(t)} x[t] − B_{s(t)} y[t]||²_F + Σ_{s=1}^{S} λ_s ||W_{s(t)}||₁
(data fidelity + sparsity on W)

Good for inferring causal relations between signals as well as dynamic topologies.

SLIDE 100

Comparison of different methods

Method            | Signal model                      | Assumption                        | Learning output                | Edge direction | Inference
Dong (2015)       | Global smoothness                 | Gaussian                          | Laplacian                      | Undirected     | Signal-centric
Kalofolias (2016) | Global smoothness                 | Gaussian                          | Adjacency                      | Undirected     | Signal-centric
Egilmez (2016)    | Global smoothness                 | Gaussian                          | Generalized Laplacian          | Undirected     | Signal-centric
Chepuri (2016)    | Global smoothness                 | Gaussian                          | Adjacency                      | Undirected     | Graph-centric
Pasdeloup (2015)  | Diffusion by adjacency            | Stationary                        | Normalized adjacency/Laplacian | Undirected     | Graph-centric
Segarra (2016)    | Diffusion by graph shift operator | Stationary                        | Graph shift operator           | Undirected     | Graph-centric
Thanou (2016)     | Heat diffusion                    | Sparsity                          | Laplacian                      | Undirected     | Signal-centric
Mei (2015)        | Time-varying                      | Dependent on previous states      | Adjacency                      | Directed       | Signal-centric
Baingana (2016)   | Time-varying                      | Dependent on current int/ext info | Time-varying adjacency         | Directed       | Signal-centric

SLIDES 101-106

Perspective

GSP for graph learning: from observed graph signals back to the graph.

Learning input
  • missing observations
  • partial observations, e.g., by sampling

Signal/graph model
  • beyond smoothness: localization in the vertex-frequency domain, bandlimited signals (Sardellitti 2017)

Theoretical considerations
  • performance guarantees (Rabbat 2017)
  • computational efficiency

Learning output
  • directed graphs (Shen 2017)
  • time-varying graphs (Kalofolias 2017)
  • multi-layer graphs
  • subgraphs or "ego-networks"
  • intermediate graph representations

Learning objective
  • for what SP applications? e.g., classification (Yankelevsky 2016), coding and compression (Rotondo 2015, Fracastoro 2016)
  • for traditional graph-based learning, e.g., clustering, dimensionality reduction, ranking

SLIDE 107

Graph learning at GSPW 2017

[Figure: program pointers to the graph learning talks on Thursday June 1st and Friday June 2nd.]

SLIDES 108-111

References

  • B. Baingana and G. B. Giannakis. Tracking switching network topologies from propagating graph signals. In Graph Signal Processing Workshop, 2016.
  • B. Baingana and G. B. Giannakis. Tracking switched dynamic network topologies from information cascades. IEEE Transactions on Signal Processing, 65(4):985-997, 2017.
  • O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 9:485-516, 2008.
  • S. P. Chepuri, S. Liu, G. Leus, and A. O. Hero III. Learning sparse graphs under smoothness prior. arXiv:1609.03448, 2016.
  • S. I. Daitch, J. A. Kelner, and D. A. Spielman. Fitting a graph to vector data. In Proceedings of the International Conference on Machine Learning, 201-208, 2009.
  • A. P. Dempster. Covariance selection. Biometrics, 28(1):157-175, 1972.
  • X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst. Laplacian matrix learning for smooth graph signal representation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 3736-3740, 2015.
  • X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst. Learning Laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal Processing, 64(23):6160-6173, 2016.
  • H. E. Egilmez, E. Pavez, and A. Ortega. Graph learning from data under structural and Laplacian constraints. arXiv:1611.05181, 2016.
  • G. Fracastoro, D. Thanou, and P. Frossard. Graph transform learning for image compression. In Proceedings of the Picture Coding Symposium, 2016.
  • J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.
  • D. K. Hammond, P. Vandergheynst, and R. Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129-150, 2011.
  • C.-J. Hsieh, I. S. Dhillon, P. K. Ravikumar, and M. A. Sustik. Sparse inverse covariance matrix estimation using quadratic approximation. In Advances in Neural Information Processing Systems 24, 2330-2338, 2011.
  • C. Hu, L. Cheng, J. Sepulcre, G. E. Fakhri, Y. M. Lu, and Q. Li. A graph theoretical regression model for brain connectivity learning of Alzheimer's disease. In Proceedings of the IEEE International Symposium on Biomedical Imaging, 616-619, 2013.
  • V. Kalofolias. How to learn a graph from smooth signals. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 920-929, 2016.
  • V. Kalofolias, A. Loukas, D. Thanou, and P. Frossard. Learning time varying graphs. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2826-2830, 2017.
  • B. Lake and J. Tenenbaum. Discovering structure by learning sparse graph. In Proceedings of the Annual Cognitive Science Conference, 2010.
  • H. P. Maretic, D. Thanou, and P. Frossard. Graph learning under sparsity priors. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 6523-6527, 2017.
  • J. Mei and J. M. F. Moura. Signal processing on graphs: Estimating the structure of a graph. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 5495-5499, 2015.
  • N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3):1436-1462, 2006.
  • S. K. Narang and A. Ortega. Lifting based wavelet transforms on graphs. In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 441-444, 2009.
  • B. Pasdeloup, M. Rabbat, V. Gripon, D. Pastor, and G. Mercier. Graph reconstruction from the observation of diffused signals. In Proceedings of the Annual Allerton Conference, 1386-1390, 2015.
  • B. Pasdeloup, V. Gripon, G. Mercier, D. Pastor, and M. G. Rabbat. Characterization and inference of weighted graph topologies from observations of diffused signals. arXiv:1605.02569, 2016.
  • P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using l1-regularized logistic regression. Annals of Statistics, 38(3):1287-1319, 2010.
  • I. Rotondo, G. Cheung, A. Ortega, and H. E. Egilmez. Designing sparse graphs via structure tensor for block transform coding of images. In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 571-574, 2015.
  • A. Sandryhaila and J. M. F. Moura. Discrete signal processing on graphs. IEEE Transactions on Signal Processing, 61(7):1644-1656, 2013.
  • S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro. Network topology inference from spectral templates. arXiv:1608.03008, 2016.
  • Y. Shen, B. Baingana, and G. B. Giannakis. Nonlinear structural vector autoregressive models for inferring effective brain network connectivity. arXiv:1610.06551, 2016.
  • Y. Shen, B. Baingana, and G. B. Giannakis. Kernel-based structural equation models for topology identification of directed networks. IEEE Transactions on Signal Processing, 65(10):2503-2516, 2017.
  • D. I Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83-98, 2013.
  • M. Slawski and M. Hein. Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields. Linear Algebra and its Applications, 473:145-179, 2015.
  • D. Thanou, D. I Shuman, and P. Frossard. Learning parametric dictionaries for signals on graphs. IEEE Transactions on Signal Processing, 62(15):3849-3862, 2014.
  • D. Thanou, X. Dong, D. Kressner, and P. Frossard. Learning heat diffusion graphs. arXiv:1611.01456, 2016.
  • Y. Yankelevsky and M. Elad. Dual graph regularized dictionary learning. IEEE Transactions on Signal and Information Processing over Networks, 2(4):611-624, 2016.
  • X. Zhang, X. Dong, and P. Frossard. Learning of structured graph dictionaries. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 3373-3376, 2012.