
Stochastic Iterative Hard Thresholding for Graph-Structured Sparsity Optimization. Baojian Zhou¹, Feng Chen¹, and Yiming Ying². ¹Department of Computer Science, ²Department of Mathematics and Statistics, University at Albany, NY, USA.


  1. Stochastic Iterative Hard Thresholding for Graph-Structured Sparsity Optimization. Baojian Zhou¹, Feng Chen¹, and Yiming Ying². ¹Department of Computer Science, ²Department of Mathematics and Statistics, University at Albany, NY, USA. 06/13/2019, Poster #92. 1 / 7

  2. Motivation and problem setup.
  Graph structure information, used as a prior, often yields better classification and regression performance and stronger interpretability.
  Current limitations: existing methods only focus on a specific loss; they require expensive full-gradient calculation; they cannot handle complex structure.
  Our goals: propose an algorithm for general losses under the stochastic setting; provide a convergence analysis; demonstrate real-world applications.
  Structured sparse learning. Given the structured sparsity model M(M) = {w : supp(w) ∈ M}, the structured sparse learning problem is formulated as
      min_{w ∈ M(M)} F(w) := (1/n) Σ_{i=1}^n f_i(w),
  where F(w) is a convex loss such as least squares or logistic loss, and M(M) models structured sparsity such as connected subgraphs, dense subgraphs, and subgraphs isomorphic to a query graph (illustrated on an example graph G with nodes w_1, ..., w_6). 2 / 7
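For the plain s-sparse model M = {S : |S| ≤ s}, the Euclidean projection onto M(M) is exact hard thresholding: keep the s largest-magnitude entries. (The graph-structured models above require approximate projections instead; this sketch only illustrates the simplest special case.)

```python
import numpy as np

def project_s_sparse(w, s):
    """Exact Euclidean projection onto M(M) = {w : |supp(w)| <= s}:
    keep the s largest-magnitude entries, zero out the rest."""
    w = np.asarray(w, dtype=float)
    out = np.zeros_like(w)
    if s > 0:
        keep = np.argsort(np.abs(w))[-s:]   # indices of the s largest |w_i|
        out[keep] = w[keep]
    return out

w = np.array([0.5, -3.0, 0.1, 2.0, -0.2])
print(project_s_sparse(w, 2))  # -> [ 0. -3.  0.  2. -0.]
```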

  3. Algorithm. Inspired by two recent works, Hegde et al. (2016) and Nguyen et al. (2017).
  Algorithm 1 GraphStoIHT
  1: Input: η_t, F(·), M_H, M_T
  2: Initialize: w⁰ and t = 0
  3: for t = 0, 1, 2, ... do
  4:    Choose ξ_t from [n] with probability p_{ξ_t}
  5:    b^t = P(∇f_{ξ_t}(w^t), M_H)
  6:    w^{t+1} = P(w^t − η_t b^t, M_T)
  7: end for
  8: Return w^{t+1}
  The structure prior is the Weighted Graph Model, e.g. M = {S : |S| ≤ 3, S is connected} (Hegde et al., 2015a).
  Orthogonal projection operator P(·, M) : R^p → R^p, defined as
      P(w, M) = argmin_{w' ∈ M(M)} ‖w − w'‖².
  Two differences from StoIHT: it projects the gradient ∇f_{ξ_t}(·) onto M(M_H), and it projects the proxy onto M(M_T).
  Why the projection b^t = P(∇f_{ξ_t}(w^t), M_H)? Both projections solve the same projection problem (over an s-sparse set or the Weighted Graph Model); intuitively, sparsity lives in both the primal and the dual space; and projecting the gradient removes some noisy directions at the first stage. 3 / 7
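A minimal runnable sketch of the Algorithm 1 loop on a noiseless least-squares problem. Assumptions to keep it self-contained: f_i(w) = 0.5·(x_iᵀw − y_i)² sampled uniformly, and both head and tail projections P(·, M_H), P(·, M_T) replaced by exact s-sparse hard thresholding (the paper instead uses approximate Weighted Graph Model projections).

```python
import numpy as np

def hard_threshold(w, s):
    """Stand-in for P(., M_H) and P(., M_T): exact projection onto the
    s-sparse set. The graph-structured case uses approximate WGM projections."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-s:]
    out[idx] = w[idx]
    return out

def graph_sto_iht(X, y, s, eta=0.05, epochs=30, seed=0):
    """Sketch of Algorithm 1 with uniform sampling p_{xi_t} = 1/n."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(epochs):
        for _ in range(n):
            i = rng.integers(n)                    # choose xi_t from [n]
            grad = (X[i] @ w - y[i]) * X[i]        # stochastic gradient of f_i
            b = hard_threshold(grad, s)            # b^t = P(grad, M_H)
            w = hard_threshold(w - eta * b, s)     # w^{t+1} = P(w^t - eta*b^t, M_T)
    return w

rng = np.random.default_rng(1)
p, s, n = 10, 2, 200
w_star = np.zeros(p)
w_star[:2] = [2.0, -3.0]
X = rng.normal(size=(n, p))
y = X @ w_star                                     # noiseless observations
w_hat = graph_sto_iht(X, y, s)
print(np.linalg.norm(w_hat - w_star))              # prints the recovery error
```

In the noiseless setting the error should shrink toward zero; with the WGM head/tail projections the support estimate would additionally be constrained to connected subgraphs.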

  4. Convergence analysis. Two assumptions in M(M), stated via the Bregman divergence B_f(w, w') = f(w) − f(w') − ⟨∇f(w'), w − w'⟩:
  • F(w) satisfies α-restricted strong convexity (RSC): B_F(w, w') ≥ (α/2)‖w − w'‖²;
  • each f_i(w) satisfies β-restricted strong smoothness (RSS): B_{f_i}(w, w') ≤ (β/2)‖w − w'‖².
  Efficient approximate projections:
  • P(·, M_H) with approximation factor c_H;
  • P(·, M_T) with approximation factor c_T.
  Theorem 1 (Linear convergence). Let w⁰ be the starting point and choose η_t = η. Then the iterate w^{t+1} of Algorithm 1 satisfies
      E_{ξ_[t]} ‖w^{t+1} − w*‖ ≤ κ^{t+1} ‖w⁰ − w*‖ + σ/(1 − κ),
  where κ = (1 + c_T)(√(αβη² − 2αη + 1) + β₀√(1 − α₀²)/α₀) with α₀ = c_H − √(αβτ² − 2ατ + 1) and β₀ = (1 + c_H)τβ, σ collects the stochastic error and is a constant multiple of E_{ξ_t}‖∇_I f_{ξ_t}(w*)‖ (the restricted gradient at the optimum) determined by α₀, β₀, η, and c_T, and the step sizes satisfy η, τ ∈ (0, 2/β). 4 / 7
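For the least-squares loss f(w) = (1/2m)‖Xw − y‖², the Bregman divergence is B_f(w, w') = (1/2m)‖X(w − w')‖², so the (unrestricted) strong-convexity and smoothness constants are the extreme eigenvalues of XᵀX/m. A numeric sanity check of the RSC/RSS sandwich, assuming full-space (rather than restricted) constants for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 200, 5
X = rng.normal(size=(m, p))
y = rng.normal(size=m)

def f(w):
    return 0.5 / m * np.linalg.norm(X @ w - y) ** 2

def grad(w):
    return X.T @ (X @ w - y) / m

def bregman(w, w2):
    # B_f(w, w') = f(w) - f(w') - <grad f(w'), w - w'>
    return f(w) - f(w2) - grad(w2) @ (w - w2)

H = X.T @ X / m
alpha, beta = np.linalg.eigvalsh(H)[[0, -1]]   # smallest / largest eigenvalue

w, w2 = rng.normal(size=p), rng.normal(size=p)
B = bregman(w, w2)
d2 = np.linalg.norm(w - w2) ** 2
print(alpha / 2 * d2 <= B <= beta / 2 * d2)    # -> True: RSC and RSS hold
```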

  5. Applications of Theorem 1.
  Graph linear regression. Observations y = Xw* + ε with X ∈ R^{m×p} and ε ~ N(0, I_m). Consider the least-squares objective
      min_{supp(w) ∈ M(M)} F(w) := (1/n) Σ_{i=1}^n (n/2m)‖X_{B_i} w − y_{B_i}‖²,
  where the blocks B_1, ..., B_n partition the m rows. The contraction factors κ of GraphIHT and GraphStoIHT are explicit functions of the RIP-type constant δ, both of the form (1 + c_T)·g(δ). The condition κ < 1 requires δ ≤ 0.0527 for GraphIHT and δ ≤ 0.0142 for GraphStoIHT.
  Graph logistic regression. Labels follow P(y_i = 1 | x_i) = (1 + e^{−⟨w*, x_i⟩})^{−1} with x_i ∈ R^p and y_i ∈ {+1, −1}. Consider the ℓ2-regularized logistic objective
      min_{supp(w) ∈ M(M)} F(w) := (1/n) Σ_{i=1}^n (n/m) Σ_j h(w, i_j) + (λ/2)‖w‖²,
  where h(w, i_j) = log(1 + exp(−y_{i_j}·⟨x_{i_j}, w⟩)). If the x_i are normalized, then F(w) satisfies λ-RSC and each f_i(w) satisfies (λ + n(1 + ν)θ_max/(4m))-RSS with probability at least 1 − p·exp(−θ_max ν/4), where θ_max = λ_max(Σ_{j=1}^{m/n} E[x_{i_j} x_{i_j}ᵀ]) and ν ≥ 1. The condition κ < 1 then amounts to λ/(λ + n(1 + ν)θ_max/(4m)) ≥ 243/250. 5 / 7
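The block-sum objective above is just the full least-squares loss split across the blocks: when B_1, ..., B_n partition the m rows, (1/n) Σ_i (n/2m)‖X_{B_i}w − y_{B_i}‖² = (1/2m)‖Xw − y‖². A quick check, assuming equal-size blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, n = 12, 4, 3                           # m rows split into n equal blocks
X = rng.normal(size=(m, p))
y = rng.normal(size=m)
w = rng.normal(size=p)

blocks = np.array_split(np.arange(m), n)     # B_1, ..., B_n partition [m]

# F(w) = (1/n) * sum_i (n/2m) * ||X_{B_i} w - y_{B_i}||^2
F = sum((n / (2 * m)) * np.linalg.norm(X[B] @ w - y[B]) ** 2 for B in blocks) / n

full = np.linalg.norm(X @ w - y) ** 2 / (2 * m)
print(np.isclose(F, full))                   # -> True
```

The n/(2m) scaling is what makes each f_i an unbiased piece of the full loss under uniform block sampling.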

  6. Experiments.
  Simulation dataset. Each entry X_ij ~ N(0, 1)/√m; supp(w*) is generated by a random walk (so it induces a connected subgraph); entries of w* are drawn from N(0, 1); the structure prior is the Weighted Graph Model (Hegde et al., 2015b). Baselines: NIHT, IHT, StoIHT, CoSaMP, GraphIHT, GraphCoSaMP.
  [Figures: probability of recovery vs. oversampling ratio m/s on the three benchmark instances (BackGround, Angio, Text), and estimation error ‖x − x̂‖ vs. epoch and iteration for GraphStoIHT with block sizes b = 1 to 180 and step sizes η = 0.1 to 1.6.]
  Breast cancer dataset. 295 samples with 78 positives (metastatic) and 217 negatives (non-metastatic), provided in (Van De Vijver et al., 2002). A PPI network with 637 pathways is provided in (Jacob et al., 2009). We restrict our analysis to 3,243 genes (nodes) with 19,938 edges; these cancer-related genes form a connected subgraph.
  Algorithm      | Cancer-related genes                     | ‖w^t‖₀ | AUC
  GraphStoIHT    | BRCA2, CCND2, CDKN1A, ATM, AR, TOP2A     | 51.7   | 0.715
  GraphIHT       | ATM, CDKN1A, BRCA2, AR, TOP2A            | 55.2   | 0.714
  ℓ1-Path        | BRCA1, CDKN1A, ATM, DSC2                 | 61.2   | 0.675
  StoIHT         | MKI67, NAT1, AR, TOP2A                   | 59.6   | 0.708
  ℓ1/ℓ2-Edge     | CCND3, ATM, CDH3                         | 51.4   | 0.705
  ℓ1-Edge        | CCND3, AR, CDH3                          | 39.9   | 0.698
  ℓ1/ℓ2-Path     | BRCA1, CDKN1A                            | 147.6  | 0.705
  IHT            | NAT1, TOP2A                              | 67.9   | 0.707
  6 / 7
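The random-walk construction of supp(w*) guarantees the support induces a connected subgraph. A minimal sketch on an assumed 2-D grid graph (the paper's benchmark graphs differ; the grid is only for illustration):

```python
import numpy as np

def random_walk_support(side, s, seed=0):
    """Pick s nodes of a side x side grid graph by a simple random walk,
    so the chosen support induces a connected subgraph."""
    rng = np.random.default_rng(seed)
    node = (int(rng.integers(side)), int(rng.integers(side)))
    support = {node}
    while len(support) < s:
        r, c = node
        nbrs = [(r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < side and 0 <= c + dc < side]
        node = nbrs[rng.integers(len(nbrs))]   # step to a uniform neighbor
        support.add(node)
    return sorted(r * side + c for r, c in support)   # flatten to node ids

supp = random_walk_support(side=8, s=6)
print(len(supp))  # -> 6
```

Entries of w* on the chosen support would then be drawn from N(0, 1), as in the simulation setup.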

  7. See you at Poster #92. Thank you! 7 / 7

