Stay on path: PCA along graph paths

Megasthenis Asteris, Anastasios Kyrillidis, Alexandros Dimakis (Electrical and Computer Engineering)
Han-Gyol Yi, Bharath Chandrasekaran (Communication Sciences and Disorders)
PCA: direction of maximum variance

Given n observations (datapoints) y_1, ..., y_n of p variables, find a new variable (feature) that captures most of the variance:

    maximize  xᵀ Σ̂ x   subject to  ‖x‖₂ = 1,

where Σ̂ = (1/n) · Σ_{i=1}^n y_i y_iᵀ is the empirical covariance matrix.
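As a concrete sketch (NumPy, synthetic data of my own choosing, not from the slides), the empirical covariance and its leading eigenvector can be computed as:

```python
import numpy as np

rng = np.random.default_rng(0)

# n observations of p variables; the data is deliberately
# stretched along the first coordinate axis
n, p = 500, 5
Y = rng.normal(size=(n, p)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])

# Empirical covariance: Sigma_hat = (1/n) * sum_i y_i y_i^T
Sigma_hat = (Y.T @ Y) / n

# The first principal component is the leading eigenvector (||x||_2 = 1);
# np.linalg.eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)
x = eigvecs[:, -1]
```

Here `x` ends up dominated by the first coordinate, the direction of largest variance.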
Sparse PCA: sparse direction of maximum variance

    maximize  xᵀ Σ̂ x   subject to  ‖x‖₂ = 1,  ‖x‖₀ = k.

The cardinality constraint makes the problem NP-hard.
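To make the combinatorial nature of the constraint concrete, here is a hedged brute-force sketch (mine, not the paper's algorithm) that exhausts all k-subsets of coordinates; its cost grows exponentially in p, which is exactly why sparse PCA needs approximation methods:

```python
import itertools
import numpy as np

def sparse_pca_bruteforce(Sigma, k):
    """Maximize x^T Sigma x s.t. ||x||_2 = 1, ||x||_0 <= k, by trying
    every k-subset of coordinates (C(p, k) subproblems, so this is
    only feasible for tiny p)."""
    p = Sigma.shape[0]
    best_val, best_x = -np.inf, None
    for support in itertools.combinations(range(p), k):
        idx = list(support)
        # restricted to a fixed support, the problem is an ordinary
        # leading-eigenvector computation on the k x k submatrix
        vals, vecs = np.linalg.eigh(Sigma[np.ix_(idx, idx)])
        if vals[-1] > best_val:
            best_val = vals[-1]
            best_x = np.zeros(p)
            best_x[idx] = vecs[:, -1]
    return best_x, best_val
```

For example, with a diagonal covariance the optimum simply picks the largest diagonal entry inside the best support.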
Why sparsity?

[ Statistician ] Recovery of the "true" PC in high dimensions, where # observations ≪ # variables: better sample complexity.
[ Engineer ] The extracted feature is more interpretable; it depends on only a few of the original variables.

More structure...? Real signals often carry structure beyond plain sparsity: e.g., wavelets of natural images, block structures, periodic neuronal spikes, ...
[Baraniuk et al., 2008; Kyrillidis et al., 2014; Friedman et al., 2010; ...]
Graph Path PCA

Impose structure through a directed acyclic graph G on the p variables x_1, ..., x_p, with a designated source S and target T: the active variables (the support of x) must form an S→T path in G.
Example: a stock universe divided in sectors, with the constraint of picking 1 stock per sector (e.g., Chase, BofA, or UBS from BANKS; Chevron or Shell from ENERGY). Arranging the sectors as consecutive layers between S and T, every valid selection corresponds to an S→T path.
The (p,k,d)-layer graph

A source vertex S, a target vertex T, and k layers of (p−2)/k vertices each, numbered 2, 3, ..., p−1 between S = 1 and T = p; every vertex has in- and out-degree d, with edges only between consecutive layers.
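One way such a layer graph could be realized in code (the cyclic-shift wiring below, and connecting S to the whole first layer, are illustrative assumptions; the paper's exact construction may differ):

```python
def layer_graph(p, k, d):
    """Build adjacency lists for a (p,k,d)-layer DAG: source S = 0,
    target T = p-1, and k layers of m = (p-2)/k vertices, each vertex
    pointing to d vertices of the next layer via cyclic shifts."""
    assert (p - 2) % k == 0, "p - 2 must be divisible by k"
    m = (p - 2) // k
    assert d <= m, "out-degree cannot exceed layer width"
    S, T = 0, p - 1
    layers = [[1 + l * m + j for j in range(m)] for l in range(k)]
    adj = {v: [] for v in range(p)}
    adj[S] = list(layers[0])          # S feeds the first layer
    for l in range(k - 1):
        for j, v in enumerate(layers[l]):
            # vertex j connects to d consecutive vertices (mod m)
            adj[v] = [layers[l + 1][(j + t) % m] for t in range(d)]
    for v in layers[-1]:
        adj[v] = [T]                  # last layer drains into T
    return adj
```

Every S→T path then visits exactly one vertex per layer, i.e., has k internal vertices.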
Spike along a path

Samples: y_i = √β · u_i · x* + z_i, where z_i is i.i.d. Gaussian noise and the signal x* is supported on a path of G; equivalently, the y_i are i.i.d. samples from N(0, β · x*x*ᵀ + I).

[ Theorem 1 ]  Let G be a (known) (p,k,d)-layer graph and x* an (unknown) signal supported on an S→T path of G. Observe a sequence y_1, ..., y_n of i.i.d. samples from N(0, β · x*x*ᵀ + I). Then

    n = O( log(p/k) + k·log d )

samples suffice for recovery of x*, versus Ω( k·log(p/k) ) for sparse PCA.

[ Theorem 2 ]  That many samples are also necessary.
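A small sampler for this spiked model, following the slide's generative form (the specific β, n, and x* below are arbitrary choices for illustration):

```python
import numpy as np

def sample_spiked(x_star, beta, n, rng):
    """Draw n i.i.d. samples y_i = sqrt(beta) * u_i * x_star + z_i,
    with scalar u_i ~ N(0, 1) and z_i ~ N(0, I). For unit x_star this
    is equivalent to y_i ~ N(0, beta * x_star x_star^T + I)."""
    p = x_star.size
    u = rng.normal(size=(n, 1))       # one spike coefficient per sample
    Z = rng.normal(size=(n, p))       # isotropic Gaussian noise
    return np.sqrt(beta) * u * x_star + Z
```

Sanity check: projecting the samples onto x* should show variance ≈ β + 1, while orthogonal directions have variance ≈ 1.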
A Power Method-based approach

Power iteration with a projection step:

    Input: Σ̂, initial x_0; i ← 0
    Repeat: w_i ← Σ̂ x_i;  x_{i+1} ← projection of w_i onto X(G);  i ← i + 1
    Output: x̂ ← x_{i+1}
The projection step

Project a p-dimensional w onto the feasible set:

    arg min_{x ∈ X(G)} ‖x − w‖₂.

Due to the constraints and Cauchy-Schwarz, this reduces to a longest (weighted) path problem on G with special weights; and since G is acyclic, that problem is solved efficiently by dynamic programming.
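A minimal sketch of this idea, under the assumption that the "special weights" are the squared entries w_j² on vertices (so the projection keeps the S→T path with maximum captured energy and normalizes w on that support); the function names and wiring are mine, not the paper's:

```python
import numpy as np

def project_path(w, adj, S, T, order):
    """Project w onto unit vectors supported on an S->T path of a DAG:
    find the path maximizing sum of w_j^2 over its vertices by dynamic
    programming in topological order, then normalize w on that support."""
    best = {S: w[S] ** 2}     # best path weight ending at each vertex
    pred = {S: None}          # predecessor pointers to recover the path
    for v in order:           # 'order' must be a topological order from S
        if v not in best:
            continue          # unreachable from S
        for u in adj.get(v, []):
            cand = best[v] + w[u] ** 2
            if cand > best.get(u, -np.inf):
                best[u] = cand
                pred[u] = v
    # walk back from T to recover the optimal path
    path, v = [], T
    while v is not None:
        path.append(v)
        v = pred[v]
    x = np.zeros_like(w, dtype=float)
    x[path] = w[path]
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 0 else x

def graph_power_method(Sigma, adj, S, T, order, iters=50, seed=0):
    """Power iteration with a projection onto X(G) after every step."""
    rng = np.random.default_rng(seed)
    x = project_path(rng.normal(size=Sigma.shape[0]), adj, S, T, order)
    for _ in range(iters):
        x = project_path(Sigma @ x, adj, S, T, order)
    return x
```

On a toy DAG with two S→T paths, the projection picks the path carrying most of w's energy and zeroes everything else.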
Synthetic experiments

Data generated according to the (p,k,d)-layer graph model (p = 1000, k = 50, d = 10; 100 Monte Carlo iterations).

[Figure: estimation error ‖x̂x̂ᵀ − x*x*ᵀ‖_F versus number of samples n (1000 to 5000), comparing the Graph Power Method against Low-D Sampling.]
Application: neuroimaging data*

ROIs extracted based on the Harvard-Oxford Atlas [Desikan et al., 2006]; the graph is built from distances between centers of mass of ROIs. The method identified core neural components.
*[Human Connectome Project, WU-Minn Consortium]

[ Future ]