

  1. Co-manifold learning with missing data
  Gal Mishne, Eric C. Chi and Ronald R. Coifman
  Department of Mathematics, Yale University; Department of Statistics, North Carolina State University
  Gal Mishne (Yale), Co-Manifold Learning, June 12, 2019, 1 / 14

  2. The Biclustering Problem
  Task: given a data matrix X ∈ R^{n×p}, find subgroups of rows and columns that go together.
  Text mining: similar documents share a small set of highly correlated words.
  Collaborative filtering: like-minded customers share similar preferences for a subset of products.
  Cancer genomics: subtypes of cancerous tumors share similar molecular profiles over a subset of genes.

  3. Cancer Genomics
  Lung cancer is heterogeneous at the molecular level. Which genes are driving lung cancer? These genes are potential drug targets, so we collect gene expression data.

  4. Simple Solution: Clustered Dendrogram
  Each dendrogram is constructed independently of the multiscale structure in the other dimension.

  5. From Co-clustering to Co-Manifold Learning
  "I would add that in many real-world applications there is no 'true' fixed number of biclusters, i.e. the truth is a bit more continuous..." –Anonymous Referee 2
  [Figure: clustered dendrogram alongside scatter plots of the new row and new column coordinate systems (Intrinsic Coordinate 1 vs. Intrinsic Coordinate 2).]

  6. What if data matrices are not completely observed?
  Missing data scenario. Complete data: X ∈ R^{n×p}. Suppose we only get to observe the entries indexed by Θ ⊂ {1, ..., n} × {1, ..., p}, possibly by design: it may be too expensive to collect or measure all np entries.
  Goal: recover row and column coordinate systems, not necessarily to complete the missing data.
  P_Θ(X)[i, j] = X[i, j] if (i, j) ∈ Θ, and 0 otherwise.
  [Figure: toy example with X[i, j] = ‖y_i − z_j‖², where the row points y lie on a helix and the column points z lie on a 2-D plane.]
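The projection P_Θ and the helix/plane toy matrix above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the point coordinates are random stand-ins, not the slide's actual helix and plane.

```python
import numpy as np

def project_observed(X, mask):
    """P_Theta(X): keep observed entries, zero out the rest.

    mask is a boolean n x p array, True where entry (i, j) is observed.
    """
    return np.where(mask, X, 0.0)

# Toy matrix built as on the slide: X[i, j] = ||y_i - z_j||^2.
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 3))   # row points (stand-in for the helix)
Z = rng.normal(size=(4, 3))   # column points (stand-in for the 2-D plane)
X = ((Y[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)

mask = rng.random(X.shape) > 0.3   # observe roughly 70% of the entries
X_obs = project_observed(X, mask)
```

Note that P_Θ only zeroes entries; downstream steps must still treat those zeros as "missing", not as measured values.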

  7. Co-Manifold Learning
  Solve the co-clustering-missing problem at multiple row and column scales.
  Build multiscale row and column metrics.
  Calculate non-linear embeddings.

  8. Step 1: Co-clustering an Incomplete Data Matrix
  min_U F(U) = (1/2) ‖P_Θ(X − U)‖_F² + γ_c Σ_{i<j} Ω(‖U_{·i} − U_{·j}‖₂) + γ_r Σ_{k<l} Ω(‖U_{k·} − U_{l·}‖₂)
  Here Θ indexes the observed entries and Ω(·) is a folded concave penalty ⇒ less bias towards 0.
  [Figure: folded concave penalty Ω plotted over [−2, 2].]
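The slide does not name the specific folded concave penalty Ω; the minimax concave penalty (MCP) is one standard example, sketched here to show why such penalties shrink large differences with less bias than a convex ℓ1-type penalty would.

```python
import numpy as np

def mcp(t, lam=1.0, theta=2.0):
    """Minimax concave penalty (MCP): one example of a folded concave
    penalty. Near zero it grows like lam*|t|, but it flattens to the
    constant theta*lam^2/2 for |t| >= lam*theta, so large row/column
    differences are barely penalized (less bias towards 0)."""
    t = np.abs(t)
    return np.where(t <= lam * theta,
                    lam * t - t ** 2 / (2 * theta),
                    0.5 * theta * lam ** 2)

# Small differences are penalized almost linearly; large ones saturate.
print(mcp(np.array([0.1, 1.0, 5.0])))
```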

  9. Step 1: Majorization-Minimization (MM)
  G(U | V) = (1/2) ‖X̃ − U‖_F² + γ_c Σ_{i<j} w̃_{c,ij} ‖U_{·i} − U_{·j}‖₂ + γ_r Σ_{k<l} w̃_{r,kl} ‖U_{k·} − U_{l·}‖₂ + c
  where X̃ = P_Θ(X) + P_{Θᶜ}(V), w̃_{c,ij} = Ω′(‖V_{·i} − V_{·j}‖₂), and w̃_{r,kl} = Ω′(‖V_{k·} − V_{l·}‖₂).
  Each subproblem can be solved with convex biclustering [Chi et al. 2017].
  [Figure: the concave penalty and its linear majorizer touching at the current iterate.]

  10. Step 1: Majorization-Minimization (MM)
  Majorization: G(U | V) as above satisfies F(U) = G(U | U) and F(U) ≤ G(U | V) for all U.
  MM: solve a sequence of convex biclustering problems, U^{t+1} = argmin_U G(U | U^t).
  Proposition: under suitable regularity conditions, the sequence U^t generated by Algorithm 1 has at least one limit point, and all limit points are d-stationary points of the problem of minimizing F(U).
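The surrogate G arises by linearizing the concave penalty at the current iterate, Ω(t) ≤ Ω(s) + Ω′(s)(t − s), which is exactly where the weights w̃ = Ω′(·) come from. Below is a small numerical check of this majorization, again using MCP as a hypothetical choice of Ω (the slides do not name the penalty).

```python
import numpy as np

def mcp(t, lam=1.0, theta=2.0):
    t = np.abs(t)
    return np.where(t <= lam * theta,
                    lam * t - t ** 2 / (2 * theta),
                    0.5 * theta * lam ** 2)

def mcp_deriv(s, lam=1.0, theta=2.0):
    """Omega'(s) for s >= 0: these values play the role of the MM weights."""
    s = np.abs(s)
    return np.where(s <= lam * theta, lam - s / theta, 0.0)

s = 0.7                          # current pairwise difference at iterate V
t = np.linspace(0.0, 5.0, 501)   # candidate differences for the next iterate
tangent = mcp(s) + mcp_deriv(s) * (t - s)

# Concavity of Omega on [0, inf) makes the tangent a global majorizer,
# tight at t = s -- so minimizing G decreases F at every MM step.
assert np.all(tangent >= mcp(t) - 1e-12)
```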

  11. Step 1: Smoothing Rows and Columns at Different Scales

  12. Co-Manifold Learning
  Solve the co-clustering-missing problem at multiple row and column scales.
  Build multiscale row and column metrics.
  Calculate non-linear embeddings.

  13. Step 2: Multiscale Metric
  Intuition: if a pair of rows is close over multiple scales, their distance should be small; if a pair of rows is far apart over multiple scales, their distance should be big.
  Step 1: fill in X over multiple (γ_r, γ_c) scales: X̃^{(r,c)} = P_Θ(X) + P_{Θᶜ}(U(γ_r, γ_c)).
  Step 2: take a weighted combination of pairwise distances over all scales:
  d(X_{i·}, X_{j·}) = Σ_{r,c} (γ_r γ_c)^α ‖X̃^{(r,c)}_{i·} − X̃^{(r,c)}_{j·}‖₂
  The exponent α is tunable to emphasize local versus global structure.
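Step 2 can be sketched directly from the formula. Here `completions` stands for the filled-in matrices X̃^(r,c) from Step 1 and `gamma_products` for the corresponding γ_r·γ_c values; both names are ours, not the authors'.

```python
import numpy as np

def pairwise_row_dists(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def multiscale_row_metric(completions, gamma_products, alpha=0.5):
    """d(X_i, X_j) = sum over scales of (gamma_r*gamma_c)^alpha * ||...||_2.

    Larger alpha up-weights coarse (heavily smoothed) scales, emphasizing
    global structure; smaller alpha emphasizes local structure.
    """
    return sum((g ** alpha) * pairwise_row_dists(Xt)
               for g, Xt in zip(gamma_products, completions))

# Two toy "scales" of the same 4 x 3 matrix.
rng = np.random.default_rng(1)
scales = [rng.normal(size=(4, 3)) for _ in range(2)]
D = multiscale_row_metric(scales, gamma_products=[0.1, 1.0])
```

The column metric is the same computation applied to the transposed completions.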

  14. Co-Manifold Learning
  Solve the co-clustering-missing problem at multiple row and column scales.
  Build multiscale row and column metrics.
  Calculate non-linear embeddings.

  15. Step 3: Spectral Embedding
  Example: Diffusion Maps (Coifman & Lafon, 2006).
  Construct an affinity matrix A[i, j] = exp{−d²(X_{i·}, X_{j·}) / σ²}.
  Compute the row-stochastic matrix P = D⁻¹A, where D[i, i] = Σ_j A[i, j].
  Eigendecomposition of P: keep the first d eigenvalues and eigenvectors.
  The mapping Ψ embeds the rows into the Euclidean space R^d:
  Ψ: X_{i·} → (λ₁ ψ₁(i), λ₂ ψ₂(i), ..., λ_d ψ_d(i))^T
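A minimal diffusion-map sketch from a precomputed row-distance matrix. It diagonalizes the symmetric conjugate of P (which shares P's eigenvalues) and maps the eigenvectors back; this is a common implementation route, not necessarily the authors' code.

```python
import numpy as np

def diffusion_map(dist, sigma, d=2):
    """Embed points given their pairwise distance matrix `dist`.

    A[i, j] = exp(-dist[i, j]^2 / sigma^2); P = D^-1 A is row-stochastic.
    We diagonalize the symmetric S = D^-1/2 A D^-1/2, which has the same
    eigenvalues as P, then recover P's right eigenvectors via D^-1/2.
    """
    A = np.exp(-dist ** 2 / sigma ** 2)
    deg = A.sum(axis=1)
    S = A / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)          # ascending order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(deg)[:, None]      # right eigenvectors of P
    # Skip the trivial first pair (eigenvalue 1, constant eigenvector).
    return vals[1:d + 1] * psi[:, 1:d + 1]

rng = np.random.default_rng(2)
pts = rng.normal(size=(30, 3))
dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
emb = diffusion_map(dist, sigma=np.median(dist))
```

In the co-manifold setting, `dist` would be the multiscale row (or column) metric from Step 2.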

  16. Some Examples
  [Figure: example embeddings for linear vs. nonlinear and uncoupled vs. coupled data.]

  17. Some Examples: Quantitative Evaluation via Clustering
  ARI (adjusted Rand index) as a function of the percentage of missing values (10-90%) on the Lung500 and Linkage data, comparing Co-manifold, DM-missing, NLPCA, FRPCAG (γ = 1) and FRPCAG (γ = 100).
  [Figure: ARI curves for both datasets.]
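The ARI scores in the plot can be reproduced in spirit with scikit-learn: cluster an embedding, then compare the cluster assignment to ground-truth labels. Toy data here, not Lung500.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Two well-separated groups standing in for known row classes.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                 rng.normal(3.0, 0.1, size=(20, 2))])
true_labels = np.repeat([0, 1], 20)

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
score = adjusted_rand_score(true_labels, pred)
print(score)  # 1.0 = perfect recovery, up to label permutation
```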
