SNeCT: Integrative cancer data analysis via large scale network constrained Tucker decomposition

Dongjin Choi and Lee Sael



  1. SNeCT: Integrative cancer data analysis via large scale network constrained Tucker decomposition
     - Dongjin Choi and Lee Sael

  2. Motivation
     - Q: How can we characterize cancer patients?
     - A: The Cancer Genome Atlas (TCGA) Pan-Cancer data provide rich data across 12 tumor types.
     - Sources: Mary Goldman, UCSC Cancer Browser Workshop (2015); John N. Weinstein et al., Nat Genet 45(10), 1113-1120 (2013), doi:10.1038/ng.2764

  3. Motivation
     - How can we provide integrated analysis for multi-dimensional data?
     - Pan-Cancer12 data consist of multi-platform data: gene expression, DNA methylation, copy number variation, and mutation.
     - Source: Mary Goldman, UCSC Cancer Browser Workshop (2015)

  4. Motivation
     - How can we build a combined model exploiting gene networks?
     - Gene association networks provide gene similarity information (for example, common pathways).
     - Source: John N. Weinstein et al., Nat Genet 45(10), 1113-1120 (2013), doi:10.1038/ng.2764

  5. Overview: Introduction, Problem definition, Proposed method, Experiments, Conclusion

  6. Tensor
     - A tensor is a multi-dimensional array.
     - Pan-Cancer12 data are represented as a 3-D tensor.
     - [Figure: 3-D data tensor with axes Patients, Genes, and Observations]

  7. Tensor Factorization
     - Given a tensor, decompose it into a core tensor and factor matrices whose product approximates the original tensor.
     - [Figure: diagrams of CP decomposition and Tucker decomposition (HOSVD)]

  8. Overview: Introduction, Problem definition, Proposed method, Experiments, Conclusion

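  9. Tucker Decomposition
     - Tucker decomposition (Tucker, 1966) is a widely used tensor factorization method.
     - Given a tensor, Tucker decomposition factorizes it into the product of a core tensor and orthogonal factor matrices:

       $$\mathcal{Y} \approx \tilde{\mathcal{Y}} = \mathcal{H} \times_1 \mathbf{B} \times_2 \mathbf{C} \times_3 \mathbf{D} \quad \text{s.t.} \quad \mathbf{B}^\top \mathbf{B} = \mathbf{C}^\top \mathbf{C} = \mathbf{D}^\top \mathbf{D} = \mathbf{I}$$

     - Elementwise:

       $$y_{jkl} \approx \mathcal{H} \times_1 \mathbf{b}_j \times_2 \mathbf{c}_k \times_3 \mathbf{d}_l$$

       where $\mathbf{b}_j$ is the $j$-th row of $\mathbf{B}$, $\mathbf{c}_k$ is the $k$-th row of $\mathbf{C}$, and $\mathbf{d}_l$ is the $l$-th row of $\mathbf{D}$.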
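The elementwise form above can be sketched in a few lines of plain Python. This is only an illustration: the function name `predict_entry`, the nested-list layout of the core, and the toy values are assumptions and do not come from the slides.

```python
def predict_entry(core, b_j, c_k, d_l):
    """Tucker estimate of one entry: core x1 b_j x2 c_k x3 d_l."""
    total = 0.0
    for q, b in enumerate(b_j):
        for r, c in enumerate(c_k):
            for s, d in enumerate(d_l):
                # Each core element weights one product of factor entries.
                total += core[q][r][s] * b * c * d
    return total


# Toy rank-[2, 2, 1] core; the values are illustrative assumptions.
core = [[[1.0], [0.0]],
        [[0.0], [2.0]]]
b_j = [1.0, 0.5]  # j-th row of B
c_k = [2.0, 1.0]  # k-th row of C
d_l = [3.0]       # l-th row of D

y_hat = predict_entry(core, b_j, c_k, d_l)
print(y_hat)  # 1*1*2*3 + 2*0.5*1*3 = 9.0
```

Only the two non-zero core elements contribute here, which shows how the core tensor controls the interactions between the columns of the factor matrices.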

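  10. Tucker Decomposition (cont.)
      - Formal problem definition
      - Given a 3-D tensor $\mathcal{Y} \in \mathbb{R}^{J \times K \times L}$ with observable entries $\{y_{jkl} \mid (j,k,l) \in \Omega_{\mathcal{Y}}\}$, the rank-$[Q, R, S]$ factorization of $\mathcal{Y}$ is to find the core tensor $\mathcal{H}$ and factor matrices $\{\mathbf{B}, \mathbf{C}, \mathbf{D}\}$ that minimize the following loss function, where $\mu$ weights the regularization term $S(\mathcal{H}, \mathbf{B}, \mathbf{C}, \mathbf{D})$:

        $$g(\mathcal{H}, \mathbf{B}, \mathbf{C}, \mathbf{D}) = \frac{1}{2}\left\|\mathcal{Y} - \tilde{\mathcal{Y}}\right\|_F^2 + \frac{\mu}{2} S(\mathcal{H}, \mathbf{B}, \mathbf{C}, \mathbf{D})$$

        $$= \frac{1}{2}\sum_{(j,k,l) \in \Omega_{\mathcal{Y}}} \left(y_{jkl} - \mathcal{H} \times_1 \mathbf{b}_j \times_2 \mathbf{c}_k \times_3 \mathbf{d}_l\right)^2 + \frac{\mu}{2} S(\mathcal{H}, \mathbf{B}, \mathbf{C}, \mathbf{D})$$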
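A minimal sketch of this loss, evaluated only over the observed entries, might look as follows. The dictionary layout `{(j, k, l): value}` and the function names are hypothetical. The regularizer is assumed here to be the plain sum of squared Frobenius norms of the core and factor matrices, because this slide does not spell out the regularization term.

```python
def predict_entry(core, b_j, c_k, d_l):
    """Tucker estimate of one entry: core x1 b_j x2 c_k x3 d_l."""
    return sum(core[q][r][s] * b_j[q] * c_k[r] * d_l[s]
               for q in range(len(b_j))
               for r in range(len(c_k))
               for s in range(len(d_l)))


def squared_norm(values):
    return sum(v * v for v in values)


def tucker_loss(observed, core, B, C, D, mu):
    """0.5 * squared error over observed entries + 0.5 * mu * regularizer."""
    error = 0.0
    for (j, k, l), y in observed.items():
        residual = y - predict_entry(core, B[j], C[k], D[l])
        error += residual * residual
    # Assumption: regularizer = squared Frobenius norms of core and factors.
    reg = sum(squared_norm(row) for slab in core for row in slab)
    reg += sum(squared_norm(row) for matrix in (B, C, D) for row in matrix)
    return 0.5 * error + 0.5 * mu * reg


# Toy rank-[1, 1, 1] model with two observed entries (illustrative values).
core = [[[1.0]]]
B, C, D = [[1.0], [2.0]], [[1.0]], [[1.0]]
observed = {(0, 0, 0): 1.5, (1, 0, 0): 2.0}

loss = tucker_loss(observed, core, B, C, D, mu=0.1)
print(loss)
```

Missing entries never enter the sum, which is what lets the factorization handle incomplete multi-platform data.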

  11. Overview: Introduction, Problem definition, Proposed method, Experiments, Conclusion

  12. Scheme of SNeCT
      - [Figure: SNeCT pipeline. Input: patient-gene data tensor and gene bionetwork. Lock-free parallel SGD factorizes the tensor while making related gene factors similar, and extracts patient profiles. Downstream analyses: stratification (patient clustering), prediction (top-k search for query patient data), and personalized subtype analysis.]

  13. Proposed methods
      - SNeCT (Scalable Network Constrained Tucker decomposition) enables integrative tensor factorization and analysis for tensor data with a network constraint.
      - Method 1
        - Formulate an SGD-amenable objective function
        - Iterative SGD update with a lock-free parallel scheme
      - Method 2
        - Personalized subtype analysis

  14. Proposed methods
      - Formulate an SGD-amenable objective function
      - Given the gene similarity matrix $\mathbf{Z} \in \mathbb{R}^{K \times K}$ with observable entries $\{z_{no} \mid (n,o) \in \Omega_{\mathbf{Z}}\}$, the network constraint is formulated so that similar genes have similar factors:

        $$h(\mathbf{C}, \mathbf{Z}) = \frac{1}{2}\sum_{(n,o) \in \Omega_{\mathbf{Z}}} \sum_{m=1}^{R} z_{no}\left(c_{nm} - c_{om}\right)^2 = \frac{1}{2}\sum_{(n,o) \in \Omega_{\mathbf{Z}}} z_{no}\left\|\mathbf{c}_n - \mathbf{c}_o\right\|_F^2$$
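As a rough illustration of the network constraint, the penalty can be computed directly from the rows of the gene factor matrix. The function name `network_penalty`, the dictionary layout `{(n, o): similarity}`, and the toy numbers are assumptions made for this sketch.

```python
def network_penalty(C, similarities):
    """h(C, Z) = 0.5 * sum over observed (n, o) of z_no * ||c_n - c_o||^2."""
    total = 0.0
    for (n, o), z in similarities.items():
        squared_distance = sum((a - b) ** 2 for a, b in zip(C[n], C[o]))
        total += z * squared_distance
    return 0.5 * total


# Three gene factor rows (illustrative values).
C = [[1.0, 0.0],
     [1.0, 0.2],
     [-1.0, 3.0]]
# Genes 0 and 1 are strongly similar; genes 0 and 2 only weakly.
similarities = {(0, 1): 0.9, (0, 2): 0.1}

penalty = network_penalty(C, similarities)
print(penalty)
```

A high similarity weight makes any distance between the two factor rows expensive, so minimizing the penalty pulls the factors of related genes together.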

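  15. Proposed methods
      - Formulate an SGD-amenable objective function

        $$g(\mathcal{H}, \mathbf{B}, \mathbf{C}, \mathbf{D}) = \frac{1}{2}\sum_{(j,k,l) \in \Omega_{\mathcal{Y}}} \left(y_{jkl} - \tilde{y}_{jkl}\right)^2 + \frac{\mu}{2} S(\mathcal{H}, \mathbf{B}, \mathbf{C}, \mathbf{D})$$

        $$= \sum_{(j,k,l) \in \Omega_{\mathcal{Y}}} \left[\frac{1}{2}\left(y_{jkl} - \tilde{y}_{jkl}\right)^2 + \frac{\mu}{2}\left(\frac{\|\mathcal{H}\|_F^2}{|\Omega_{\mathcal{Y}}|} + \frac{\|\mathbf{b}_j\|_F^2}{|\Omega_{\mathcal{Y}}^{j}|} + \frac{\|\mathbf{c}_k\|_F^2}{|\Omega_{\mathcal{Y}}^{k}|} + \frac{\|\mathbf{d}_l\|_F^2}{|\Omega_{\mathcal{Y}}^{l}|}\right)\right]$$

        $$h(\mathbf{C}, \mathbf{Z}) = \frac{1}{2}\sum_{(n,o) \in \Omega_{\mathbf{Z}}} z_{no}\left\|\mathbf{c}_n - \mathbf{c}_o\right\|_F^2$$

        Here $\Omega_{\mathcal{Y}}^{j}$ denotes the observed entries whose first index is $j$, and $\Omega_{\mathcal{Y}}^{k}$ and $\Omega_{\mathcal{Y}}^{l}$ are defined likewise for the second and third indices.
      - Integrate into a single objective function:

        $$g_{\mathrm{opt}} = g + \mu_h h$$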

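  16. Proposed methods
      - Calculate the gradients of $g_{\mathrm{opt}}$ with respect to the core tensor and factor matrices for a given data point $y_{\beta=(jkl)}$ or $z_{\gamma=(no)}$:

        $$\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{b}_j}\right|_{\beta} = -\left(y_\beta - \tilde{y}_\beta\right)\left(\mathcal{H} \times_2 \mathbf{c}_k \times_3 \mathbf{d}_l\right) + \frac{\mu}{|\Omega_{\mathcal{Y}}^{j}|}\mathbf{b}_j$$

        $$\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathcal{H}}\right|_{\beta} = -\left(y_\beta - \tilde{y}_\beta\right) \times_1 \mathbf{b}_j^\top \times_2 \mathbf{c}_k^\top \times_3 \mathbf{d}_l^\top + \frac{\mu}{|\Omega_{\mathcal{Y}}|}\mathcal{H}$$

        $$\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_n}\right|_{\gamma} = \mu_h z_\gamma \left(\mathbf{c}_n - \mathbf{c}_o\right)$$

      - $\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_k}\right|_{\beta}$, $\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{d}_l}\right|_{\beta}$, and $\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_o}\right|_{\gamma}$ are calculated symmetrically.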
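The per-sample gradients can be sketched in plain Python as below. The function names and argument layout are hypothetical. `n_obs` and `n_obs_j` stand for the number of observed entries overall and the number whose first index is j, which is how the normalizing counts in the gradient formulas are read here.

```python
def predict_entry(core, b_j, c_k, d_l):
    return sum(core[q][r][s] * b_j[q] * c_k[r] * d_l[s]
               for q in range(len(b_j))
               for r in range(len(c_k))
               for s in range(len(d_l)))


def entry_gradients(y, core, b_j, c_k, d_l, mu, n_obs, n_obs_j):
    """Gradients w.r.t. b_j and the core for one observed tensor entry."""
    Q, R, S = len(b_j), len(c_k), len(d_l)
    residual = y - predict_entry(core, b_j, c_k, d_l)
    # core x2 c_k x3 d_l: a vector of length Q.
    contracted = [sum(core[q][r][s] * c_k[r] * d_l[s]
                      for r in range(R) for s in range(S))
                  for q in range(Q)]
    grad_b = [-residual * contracted[q] + mu / n_obs_j * b_j[q]
              for q in range(Q)]
    # Outer product of the three rows, plus the core regularization term.
    grad_core = [[[-residual * b_j[q] * c_k[r] * d_l[s]
                   + mu / n_obs * core[q][r][s]
                   for s in range(S)] for r in range(R)] for q in range(Q)]
    return grad_b, grad_core


def network_gradient(z, c_n, c_o, mu_h):
    """Gradient w.r.t. c_n for one observed similarity entry z_no."""
    return [mu_h * z * (a - b) for a, b in zip(c_n, c_o)]


# Toy rank-[1, 1, 1] check: prediction is 1*2*3*1 = 6, residual is 5 - 6 = -1.
grad_b, grad_core = entry_gradients(
    5.0, [[[1.0]]], [2.0], [3.0], [1.0], mu=0.5, n_obs=10, n_obs_j=5)
grad_c_n = network_gradient(0.5, [1.0, 2.0], [0.0, 4.0], mu_h=2.0)
print(grad_b, grad_core, grad_c_n)
```

The gradients for the other factor rows follow the same pattern with the roles of the rows exchanged, and the gradient for `c_o` is the negative of `grad_c_n`.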

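  17. Proposed methods
      - Parallel update with the calculated gradients

        SNeCT($\mathcal{Y}$, $\mathbf{Z}$, $\mu$, $\mu_h$, $\theta$)    ($\theta$: learning rate)
          Initialize $\mathcal{H}, \mathbf{B}, \mathbf{C}, \mathbf{D}$ randomly
          repeat
            for all $y_{(jkl)=\beta} \in \mathcal{Y}$ and $z_{(no)=\gamma} \in \mathbf{Z}$, in random order, in parallel
              if $y_{jkl} \in \mathcal{Y}$ is picked then
                $\mathbf{b}_j \leftarrow \mathbf{b}_j - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{b}_j}\right|_{\beta}$, $\mathbf{c}_k \leftarrow \mathbf{c}_k - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_k}\right|_{\beta}$, $\mathbf{d}_l \leftarrow \mathbf{d}_l - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{d}_l}\right|_{\beta}$
                $\mathcal{H} \leftarrow \mathcal{H} - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathcal{H}}\right|_{\beta}$
              else if $z_{no} \in \mathbf{Z}$ is picked then
                $\mathbf{c}_n \leftarrow \mathbf{c}_n - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_n}\right|_{\gamma}$, $\mathbf{c}_o \leftarrow \mathbf{c}_o - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_o}\right|_{\gamma}$
              end if
            end for
          until convergence condition satisfied
          Orthogonalize $\mathbf{B}, \mathbf{C}, \mathbf{D}$ by QR decomposition
          return $\mathcal{H}, \mathbf{B}, \mathbf{C}, \mathbf{D}$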
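The update loop can be sketched as a single-threaded Python function. This is a simplified illustration, not the authors' implementation: it runs serially instead of using the lock-free parallel scheme, replaces the convergence test with a fixed number of epochs, and omits the final QR orthogonalization. The names `snect_serial` and `squared_error` and the toy data are assumptions.

```python
import random


def predict(core, b, c, d):
    return sum(core[q][r][s] * b[q] * c[r] * d[s]
               for q in range(len(b))
               for r in range(len(c))
               for s in range(len(d)))


def squared_error(observed, core, B, C, D):
    return sum((y - predict(core, B[j], C[k], D[l])) ** 2
               for (j, k, l), y in observed.items())


def snect_serial(observed, similar, shape, ranks, mu, mu_h, theta, epochs, seed=0):
    rng = random.Random(seed)
    (J, K, L), (Q, R, S) = shape, ranks

    def rand(rows, cols):
        return [[rng.random() for _ in range(cols)] for _ in range(rows)]

    B, C, D = rand(J, Q), rand(K, R), rand(L, S)
    core = [rand(R, S) for _ in range(Q)]
    # Number of observed entries per index in each mode.
    counts = [{}, {}, {}]
    for index in observed:
        for mode, i in enumerate(index):
            counts[mode][i] = counts[mode].get(i, 0) + 1
    samples = ([("y", idx, y) for idx, y in observed.items()]
               + [("z", idx, z) for idx, z in similar.items()])
    for _ in range(epochs):
        rng.shuffle(samples)  # visit tensor and network entries in random order
        for kind, idx, value in samples:
            if kind == "y":
                j, k, l = idx
                b, c, d = B[j][:], C[k][:], D[l][:]  # values before the update
                resid = value - predict(core, b, c, d)
                for q in range(Q):
                    dy = sum(core[q][r][s] * c[r] * d[s]
                             for r in range(R) for s in range(S))
                    B[j][q] -= theta * (-resid * dy + mu / counts[0][j] * b[q])
                for r in range(R):
                    dy = sum(core[q][r][s] * b[q] * d[s]
                             for q in range(Q) for s in range(S))
                    C[k][r] -= theta * (-resid * dy + mu / counts[1][k] * c[r])
                for s in range(S):
                    dy = sum(core[q][r][s] * b[q] * c[r]
                             for q in range(Q) for r in range(R))
                    D[l][s] -= theta * (-resid * dy + mu / counts[2][l] * d[s])
                for q in range(Q):
                    for r in range(R):
                        for s in range(S):
                            grad = (-resid * b[q] * c[r] * d[s]
                                    + mu / len(observed) * core[q][r][s])
                            core[q][r][s] -= theta * grad
            else:
                n, o = idx
                for r in range(R):
                    step = theta * mu_h * value * (C[n][r] - C[o][r])
                    C[n][r] -= step  # pull c_n toward c_o
                    C[o][r] += step  # and c_o toward c_n


    return core, B, C, D


# Toy 3 x 3 x 3 tensor, fully observed, with one gene-similarity entry.
observed = {(j, k, l): (j + 1) * (k + 1) * (l + 1) / 10
            for j in range(3) for k in range(3) for l in range(3)}
similar = {(0, 1): 1.0}
args = (observed, similar, (3, 3, 3), (2, 2, 2), 0.01, 0.1, 0.02)
initial = squared_error(observed, *snect_serial(*args, epochs=0))
final = squared_error(observed, *snect_serial(*args, epochs=300))
print(initial, final)
```

Running with `epochs=0` returns the random initialization, so comparing the two printed errors shows how far the SGD updates reduce the reconstruction error on the toy data.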

  18. Overview: Introduction, Problem definition, Proposed method, Experiments, Conclusion

  19. Experimental Settings
      - Factorize the data tensor with rank-[78,48,5]
      - Stratification
        - Cluster analysis
        - Survival analysis
      - Prediction
        - Top-k similarity search on clinical features
      - Personalized subtype analysis
      - Performance
        - Compare speed and convergence rate with a competitor
        - Competitor: Narita et al. 2012

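  20. Stratification: Cluster Analysis

| Tumor type | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BLCA | 16 | 32 | 2 | 19 | 0 | 22 | 3 | 0 | 0 | 0 | 32 | 0 | 0 | 126 |
| BRCA | 17 | 3 | 600 | 172 | 1 | 70 | 0 | 0 | 0 | 0 | 26 | 0 | 0 | 889 |
| COAD | 4 | 0 | 2 | 2 | 0 | 91 | 317 | 0 | 0 | 0 | 1 | 2 | 0 | 419 |
| GBM | 4 | 1 | 1 | 2 | 3 | 7 | 0 | 0 | 248 | 0 | 1 | 0 | 0 | 267 |
| HNSC | 0 | 242 | 1 | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 60 | 0 | 0 | 310 |
| KIRC | 14 | 1 | 1 | 0 | 471 | 4 | 0 | 0 | 1 | 0 | 6 | 0 | 0 | 498 |
| LAML | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 188 | 0 | 0 | 0 | 197 |
| LUAD | 302 | 2 | 2 | 7 | 1 | 12 | 0 | 0 | 0 | 0 | 29 | 0 | 0 | 457 |
| LUSC | 26 | 32 | 0 | 29 | 0 | 7 | 0 | 0 | 0 | 0 | 246 | 0 | 0 | 340 |
| OV | 0 | 0 | 1 | 3 | 0 | 1 | 1 | 348 | 0 | 0 | 0 | 0 | 131 | 485 |
| READ | 1 | 1 | 0 | 5 | 0 | 9 | 145 | 0 | 0 | 0 | 1 | 1 | 0 | 163 |
| UCEC | 3 | 1 | 3 | 117 | 1 | 348 | 1 | 0 | 0 | 0 | 10 | 13 | 2 | 499 |
| Total | 387 | 315 | 613 | 362 | 477 | 581 | 467 | 348 | 249 | 188 | 412 | 17 | 134 | 4550 |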

  21. Stratification: Survival Analysis
      - Survival curves for clustered patients
      - Log-rank statistics: 1151, 1185, 409
