 
Graph Refinement for Clustering
Zhenyue Zhang, Zhejiang University
Joint work with Limin Li, Jiayun Mao, Zheng Zhai
MLA 2017 · Beijing Jiaotong University
Graphs
The roles of graphs:
• feature selection, dimensionality reduction
• clustering
• smart messaging, as in Allo (Google)
• many other applications ...
Many graph-based methods suffer from graph noise because of
• incorrect connections or weights
• missing information
• noisy data, if graphs are constructed from data points
• unsuitable measurement used for graph construction
• conflicting information from multi-view data sets, view distortion
• different magnitude, neighborhoods, distribution, and noise process
• different view-specific graphs
Graph modification
• data cleaning
• graph approximation in a special form
• graph fusion for multi-view learning
• graph coarsening
We will talk about three issues in graph modification:
• Uniform feature selection/projection for multi-view data
  • UMCD for multiple dissimilarity matrices/kernels
  • UCA for multiple similarity matrices
• Uniform neighborhood graph from multi-view data
  • Construct a uniform sparse graph for all views
  • Modify view-specific graphs for multi-view learning methods
• Graph refinement
  • Improve methods in manifold learning (LLE, LE, LPP), subspace learning (SSC, LRR), multi-view learning (CRCS, MKkC)
Part I: Uniform Feature Selection
Multi-view observations (column vectors)
$x_i^v \in X_v \subset \mathbb{R}^{d_v}$, $i = 1, \dots, n$, $v = 1, \dots, m$
• Webpages: contents or hyperlinks of contents
• Multiple-language environment: documents in multiple languages
• Images: pixels or text captions (labels)
• Publications: contents (key words), and citations
• Gene representations: gene sequences, expressions in different cellular environments, or the status of its somatic mutation in different tumors
• ....
View distortion
Questions:
1. Can we simulate view distortion in terms of latent "uniform true features"?
2. How can we approximately retrieve the features from multiple noisy graphs?
Given observed vectors $x_i^v \in X_v \subset \mathbb{R}^{d_v}$ in view $v$, we model the view distortion as a nonlinear mapping of the noisy features,
$$x_i^v = f_v(y_i, \epsilon_i^v), \quad i = 1, \dots, n,$$
• $f_v$: view-specific distortion function
• $\{y_i\}$: uniform latent features in a low-dimensional space
• $\{\epsilon_i^v\}$: view-specific noise vectors
A simple form
$$x_i^v = g_v(G_v y_i + \epsilon_i^v), \quad y_i \in \mathbb{R}^d, \qquad \text{or} \qquad x_i^v = (\phi_v \circ g_v)(G_v y_i + \epsilon_i^v)$$
$g_v$: a nonlinear mapping, $G_v$: an affine transformation.
Figure: Left: intact 3D samples $\{y_j\}$. Right three: the views $\{x_j^v\}$ with $x_j^v = \exp(G_v y_j + \epsilon_j^v)$, $G_v = D Q_v^T$.
Two models
Model I: UMDS for multiple squared dissimilarity matrices $\{D_v\}$:
$$\min_{Y Y^T = I,\ \{W_v\}} \sum_v \frac{\| A_v - Y^T W_v Y \|_F^2}{\| A_v \|_F^2} \qquad (1)$$
The input matrices $\{A_v\}$ could be
• $A_v = -\tfrac{1}{2} H D_v H$ for a squared dissimilarity matrix $D_v$ in view $v$, where $H = I - \tfrac{1}{n} e e^T$
• $A_v = H K_v H$ for a kernel $K_v$
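The two input choices above can be sketched in NumPy. This is only an illustration: the function names `double_center` and `center_kernel` and the random test points are assumptions, not part of the talk. The demo uses the standard fact that, for squared Euclidean distances, $-\tfrac{1}{2} H D H$ equals the Gram matrix of the centered points.

```python
import numpy as np

def double_center(D_sq):
    """Return A = -1/2 * H @ D_sq @ H with H = I - (1/n) e e^T."""
    n = D_sq.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return -0.5 * H @ D_sq @ H

def center_kernel(K):
    """Return A = H @ K @ H for a kernel matrix K."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

# Illustrative data (assumption): 5 points in R^3, one point per row.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
diff = X[:, None, :] - X[None, :, :]
D_sq = np.sum(diff ** 2, axis=2)      # squared Euclidean distances

A = double_center(D_sq)
Xc = X - X.mean(axis=0)               # centered points
A_kernel = center_kernel(X @ X.T)     # linear kernel, centered
```

Here `A` and `A_kernel` both coincide with `Xc @ Xc.T`, which is why either a squared dissimilarity matrix or a kernel can serve as the input $A_v$.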
Model II: UCA for multiple similarity matrices $\{S_v\}$:
$$\max_{U^T U = I} \sum_v \tau_v^{-2} \| U^T S_v U \|_F^2 \qquad (2)$$
where $S_v$ is a view-specific similarity matrix in view $v$.
Basic idea:
• $S_v = \tau_v (S + B_v)$, $B_v$: view-deviation
• Factorization: $S = U D U^T$, $B_v = (U, U_\perp) \begin{pmatrix} B_{11}^v & B_{12}^v \\ B_{21}^v & B_{22}^v \end{pmatrix} (U, U_\perp)^T$
• minimize the deviation blocks $B_{12}^v$, $B_{21}^v$, and $B_{22}^v$.
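One way to read the basic idea is sketched below. It assumes that $(U, U_\perp)$ is an orthogonal matrix and that the scalings $\tau_v$ are fixed. Projecting $S_v = \tau_v (S + B_v)$ onto the two subspaces, and using the invariance of the Frobenius norm under orthogonal transformations, gives

```latex
\begin{aligned}
U^T S_v U &= \tau_v \,(D + B_{11}^v), &
U^T S_v U_\perp &= \tau_v B_{12}^v, \\
U_\perp^T S_v U &= \tau_v B_{21}^v, &
U_\perp^T S_v U_\perp &= \tau_v B_{22}^v, \\[4pt]
\tau_v^{-2} \| S_v \|_F^2
  &= \tau_v^{-2} \| U^T S_v U \|_F^2
   + \| B_{12}^v \|_F^2 + \| B_{21}^v \|_F^2 + \| B_{22}^v \|_F^2 .
\end{aligned}
```

The left-hand side of the last identity does not depend on $U$. Under these assumptions, maximizing $\sum_v \tau_v^{-2} \| U^T S_v U \|_F^2$ is therefore the same as minimizing the total size of the deviation blocks $B_{12}^v$, $B_{21}^v$, and $B_{22}^v$.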
Equivalences
The uniform model:
$$\max_{U^T U = I} \sum_v \| U^T A_v U \|_F^2 \qquad (3)$$
KKT condition (first-order necessary condition): $U$ solves the nonlinear eigenvalue problem
$$C(U)\, U = U \Lambda, \quad U^T U = I, \qquad \text{with } C(U) = \sum_v A_v U U^T A_v .$$
It can be solved by
• Eigen-subspace iteration (small scale), or
• Subspace extension (large scale)
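The slides do not spell out the eigen-subspace iteration. The sketch below makes one plausible reading explicit: a self-consistent iteration that rebuilds $C(U)$ from the current $U$ and then replaces $U$ with the eigenvectors of the $d$ largest eigenvalues. The function names, the stopping rule, and the random positive semidefinite test matrices are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def objective(U, As):
    """f(U) = sum_v ||U^T A_v U||_F^2."""
    return sum(np.linalg.norm(U.T @ A @ U, "fro") ** 2 for A in As)

def eigen_subspace_iteration(As, d, n_iter=100, tol=1e-10, seed=0):
    """Self-consistent iteration for C(U) U = U Lambda (illustrative sketch)."""
    n = As[0].shape[0]
    rng = np.random.default_rng(seed)
    # Random orthonormal starting basis (assumption: no special initialization).
    U, _ = np.linalg.qr(rng.standard_normal((n, d)))
    history = [objective(U, As)]
    for _ in range(n_iter):
        C = sum(A @ U @ U.T @ A for A in As)   # C(U) = sum_v A_v U U^T A_v
        _, V = np.linalg.eigh(C)               # eigenvalues in ascending order
        U = V[:, -d:]                          # eigenvectors of the d largest
        history.append(objective(U, As))
        if history[-1] - history[-2] <= tol * max(1.0, history[-1]):
            break
    return U, history

# Illustrative inputs (assumption): three random symmetric PSD matrices.
rng = np.random.default_rng(1)
As = []
for _ in range(3):
    B = rng.standard_normal((8, 8))
    As.append(B @ B.T)

U, history = eigen_subspace_iteration(As, d=2)
```

For symmetric positive semidefinite $A_v$, as in this demo, each step cannot decrease the objective, so `history` is non-decreasing. Indefinite inputs may need a safeguard that this sketch does not include.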
Convergence:
(a) $\{f(U_\ell)\}$ is convergent.
(b) Any accumulation point $U_*$ satisfies $C_* U_* = U_* \Lambda_*$.
(c) If $\lambda_d(C_*) > \lambda_{d+1}(C_*)$, then $\{P_{\ell+1} - P_\ell\}$ tends to zero.
(d) $P_\ell \to P_*$ if $\{P_\ell\}$ has an isolated accumulation point $P_*$.
Synthetic data
$$x_j^v = \exp\left( D Q_v^T y_j + \epsilon_j^v \right), \quad y_j \in \mathbb{R}^3,$$
• Each $Q_v$ has two orthonormal columns
• $D = \mathrm{diag}(1, s)$ with $s \in (0, 1)$: measuring singularity
• $\epsilon_j^v \sim N(0, \sigma)$, $\sigma$: noise level
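The synthetic model above can be sketched as follows. Several details are assumptions made only for illustration: the latent points are drawn uniformly from the unit cube, each $Q_v$ comes from a QR factorization of a Gaussian matrix, $\sigma$ is treated as a standard deviation, and the helper name `make_views` is hypothetical.

```python
import numpy as np

def make_views(Y, m, s, sigma, seed=0):
    """Generate m views x_j^v = exp(D Q_v^T y_j + eps_j^v) from latent Y (3 x n)."""
    rng = np.random.default_rng(seed)
    D = np.diag([1.0, s])                      # D = diag(1, s)
    views = []
    for _ in range(m):
        # Q_v: 3 x 2 matrix with two orthonormal columns.
        Q, _ = np.linalg.qr(rng.standard_normal((3, 2)))
        # Gaussian noise; sigma used as standard deviation (assumption).
        eps = sigma * rng.standard_normal((2, Y.shape[1]))
        views.append(np.exp(D @ Q.T @ Y + eps))  # elementwise exponential
    return views

# Illustrative latent samples (assumption): 100 points uniform in the unit cube.
rng = np.random.default_rng(42)
Y = rng.uniform(0.0, 1.0, size=(3, 100))
views = make_views(Y, m=3, s=0.8, sigma=0.1)
```

Each entry of `views` is a $2 \times n$ array with positive entries, one column per sample. Varying `s` and `sigma` sweeps the two parameters, singularity and noise level, that define the model.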
Figure: MDS vs. UMDS on the synthetic data, for d = 4 (top row) and d = 6 (bottom row). Panels in each row: σ = 0.084211 and σ = 0.13684 (horizontal axis: s from 0 to 0.8); s = 0 and s = 0.8 (horizontal axis: σ from 0 to 0.2). Vertical axis range: 0.9 to 1.
Real-world data
• News stories in six topics from BBC, Reuters, and Guardian
• Reuters Multilingual data: documents over 6 categories written in English, French, German, Spanish, and Italian
• UCIDigit: handwritten digits in Fourier coefficient, profile correlation, and average of local pixels in a 2 x 3 window
• Webpages on Cornell, Texas, Washington, Wisconsin in content or link, across course, project, student, faculty or staff
• BBCnews on business, entertainment, politics, sport, tech; BBCsports on athletics, cricket, football, rugby, or tennis
• Cora: research papers (absence/presence or link) in 7 classes