SNeCT: Integrative cancer data analysis via large scale network constrained Tucker decomposition
Dongjin Choi and Lee Sael
Motivation
Q: How can we characterize cancer patients?
A: The Cancer Genome Atlas (TCGA) Pan-Cancer dataset provides rich data across 12 tumor types
[Figure: the 12 tumor types. Sources: Mary Goldman, UCSC Cancer Browser Workshop (2015); John N. Weinstein et al., Nat Genet 45(10), 1113-1120 (2013), doi:10.1038/ng.2764]
Motivation
How can we provide integrated analysis for multi-dimensional data?
Pan-Cancer12 data consist of multi-platform data: gene expression, DNA methylation, copy number variation, and mutation
[Figure source: Mary Goldman, UCSC Cancer Browser Workshop (2015)]
Motivation
How can we build a combined model that exploits gene networks?
Gene association networks provide gene similarity information
[Figure: common pathways. Source: John N. Weinstein et al., Nat Genet 45(10), 1113-1120 (2013), doi:10.1038/ng.2764]
Overview
Introduction Problem definition Proposed method Experiments Conclusion
Tensor
A tensor is a multi-dimensional array
Pan-Cancer12 data are represented as a 3-D tensor
[Figure: 3-D data tensor with labels Patients, Genes, and Observations; example entries 0.12, -0.3, 0.82]
Tensor Factorization
Given a tensor, decompose it into a core tensor and factor matrices whose product approximates the original tensor
[Figure: CP Decomposition and Tucker Decomposition (HOSVD), each approximating the tensor by factor matrices]
Tucker Decomposition
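Tucker decomposition (Tucker, 1966)
Widely used tensor factorization method
Given a tensor, Tucker decomposition factorizes it into the product of a core tensor and orthogonal factor matrices:
$$\mathcal{Y} \approx \tilde{\mathcal{Y}} = \mathcal{G} \times_1 \mathbf{B} \times_2 \mathbf{C} \times_3 \mathbf{D} \quad \text{s.t.} \quad \mathbf{B}^\top\mathbf{B} = \mathbf{C}^\top\mathbf{C} = \mathbf{D}^\top\mathbf{D} = \mathbf{I}$$
Elementwise, $y_{jkl} \approx \mathcal{G} \times_1 \mathbf{b}_j \times_2 \mathbf{c}_k \times_3 \mathbf{d}_l$
$\mathcal{G}$: core tensor
$\mathbf{b}_j$: $j$-th row of $\mathbf{B}$
$\mathbf{c}_k$: $k$-th row of $\mathbf{C}$
$\mathbf{d}_l$: $l$-th row of $\mathbf{D}$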
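The mode products in the Tucker model can be written as a single tensor contraction. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the toy sizes, and the symbol `G` for the core tensor are assumptions made for illustration.

```python
import numpy as np

def tucker_reconstruct(G, B, C, D):
    """Return the (J, K, L) tensor G x1 B x2 C x3 D."""
    # G: (Q, R, S) core; B: (J, Q), C: (K, R), D: (L, S) factor matrices
    return np.einsum("qrs,jq,kr,ls->jkl", G, B, C, D)

def tucker_entry(G, b_j, c_k, d_l):
    """Return the scalar G x1 b_j x2 c_k x3 d_l for one index (j, k, l)."""
    return np.einsum("qrs,q,r,s->", G, b_j, c_k, d_l)

rng = np.random.default_rng(0)
J, K, L = 6, 5, 4            # toy tensor dimensions (assumed)
Q, R, S = 3, 2, 2            # toy rank of each mode (assumed)
G = rng.normal(size=(Q, R, S))
# Reduced QR gives factor matrices with orthonormal columns (B^T B = I, ...)
B, _ = np.linalg.qr(rng.normal(size=(J, Q)))
C, _ = np.linalg.qr(rng.normal(size=(K, R)))
D, _ = np.linalg.qr(rng.normal(size=(L, S)))

Y_hat = tucker_reconstruct(G, B, C, D)
```

The elementwise form uses only one row of each factor matrix, which is what makes per-entry stochastic updates possible.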
Tucker Decomposition (cont.)
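Formal problem definition
Given a 3-D tensor $\mathcal{Y} \in \mathbb{R}^{J \times K \times L}$ with observable entries $\{y_{jkl} \mid (j,k,l) \in \Omega_{\mathcal{Y}}\}$, the rank-$[Q, R, S]$ factorization of $\mathcal{Y}$ finds the core tensor $\mathcal{G}$ and factor matrices $\{\mathbf{B}, \mathbf{C}, \mathbf{D}\}$ that minimize the following loss function, where $S(\cdot)$ is the regularization term:
$$g(\mathcal{G}, \mathbf{B}, \mathbf{C}, \mathbf{D}) = \frac{1}{2}\left\|\mathcal{Y} - \tilde{\mathcal{Y}}\right\|_F^2 + \frac{\mu}{2} S(\mathcal{G}, \mathbf{B}, \mathbf{C}, \mathbf{D}) = \frac{1}{2}\sum_{(j,k,l) \in \Omega_{\mathcal{Y}}} \left(y_{jkl} - \mathcal{G} \times_1 \mathbf{b}_j \times_2 \mathbf{c}_k \times_3 \mathbf{d}_l\right)^2 + \frac{\mu}{2} S(\mathcal{G}, \mathbf{B}, \mathbf{C}, \mathbf{D})$$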
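The loss only sums over observed entries, so it can be evaluated without building the dense tensor. The sketch below is illustrative and rests on assumptions: observed entries are stored as an index array `idx` with values `vals`, and the regularization term is taken to be the sum of squared Frobenius norms of the core tensor and factor matrices. The name `tucker_loss` is hypothetical.

```python
import numpy as np

def tucker_loss(G, B, C, D, idx, vals, mu):
    """0.5 * squared error on observed entries + (mu / 2) * regularizer."""
    j, k, l = idx.T                      # idx: (n_obs, 3) integer index array
    # Prediction for each observed (j, k, l) only
    pred = np.einsum("qrs,nq,nr,ns->n", G, B[j], C[k], D[l])
    # Assumed regularizer: sum of squared Frobenius norms
    reg = sum(np.sum(M ** 2) for M in (G, B, C, D))
    return 0.5 * np.sum((vals - pred) ** 2) + 0.5 * mu * reg

# Toy usage: three observed entries of an otherwise unobserved tensor
rng = np.random.default_rng(0)
G = rng.normal(size=(2, 2, 2))
B, C, D = rng.normal(size=(4, 2)), rng.normal(size=(3, 2)), rng.normal(size=(2, 2))
idx = np.array([[0, 0, 0], [1, 2, 1], [3, 1, 0]])
vals = np.array([0.12, -0.3, 0.82])
loss = tucker_loss(G, B, C, D, idx, vals, mu=0.1)
```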
Scheme of SNeCT
[Figure: scheme of SNeCT. Panels: Input (patient-gene tensor and gene Bionetwork); Lock-Free Parallel SGD (make related factors similar; yields the core tensor and factor matrices 𝑩, 𝑪, 𝑫); Extract patients profile; Stratification (patients clustering, C1, C2); Prediction (query patient data, factor 𝒃𝒓, top-k search); Personalized Subtype Analysis (core tensor ×1 𝒃𝒋 = 𝒯)]
Proposed methods
SNeCT enables integrative tensor factorization and analysis of tensor data under a network constraint
SNeCT = Scalable Network Constrained Tucker decomposition
Method 1
  Formulate an SGD-amenable objective function
  Iterative SGD update with a lock-free parallel scheme
Method 2
  Personalized subtype analysis
Proposed methods
Formulate SGD-amenable objective function
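Given the gene similarity matrix $\mathbf{Z} \in \mathbb{R}^{K \times K}$ with observable entries $\{z_{no} \mid (n, o) \in \Omega_{\mathbf{Z}}\}$, the network constraint is formulated so that similar genes have similar factors:
$$g_{\mathrm{net}}(\mathbf{C}, \mathbf{Z}) = \frac{1}{2}\sum_{m=1}^{R}\sum_{(n,o) \in \Omega_{\mathbf{Z}}} z_{no}\left(c_{nm} - c_{om}\right)^2 = \frac{1}{2}\sum_{(n,o) \in \Omega_{\mathbf{Z}}} z_{no}\left\|\mathbf{c}_n - \mathbf{c}_o\right\|_F^2$$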
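The network constraint is a weighted sum of squared distances between the factor rows of linked genes. A minimal sketch follows, assuming the observed similarities are stored as an edge list `edges` with weights `weights`; the function name `network_penalty` is hypothetical.

```python
import numpy as np

def network_penalty(C, edges, weights):
    """0.5 * sum over observed (n, o) of z_no * ||c_n - c_o||^2."""
    n, o = edges.T                        # edges: (n_edges, 2) gene index pairs
    diff = C[n] - C[o]                    # factor-row difference for each pair
    return 0.5 * np.sum(weights * np.sum(diff ** 2, axis=1))

# Toy usage: genes 0 and 1 share identical factors, gene 2 differs
C = np.array([[1.0, 2.0], [1.0, 2.0], [0.0, 0.0]])
edges = np.array([[0, 1], [0, 2]])
weights = np.array([0.9, 0.5])
penalty = network_penalty(C, edges, weights)
```

Identical factor rows contribute nothing, so the penalty grows only when strongly linked genes are placed far apart in factor space.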
Proposed methods
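Formulate SGD-amenable objective function
Integrate into a single objective function:
$$g_{\mathrm{opt}} = g + \mu_{\mathrm{net}}\, g_{\mathrm{net}}$$
$$g(\mathcal{G}, \mathbf{B}, \mathbf{C}, \mathbf{D}) = \frac{1}{2}\sum_{(j,k,l) \in \Omega_{\mathcal{Y}}} \left(y_{jkl} - \tilde{y}_{jkl}\right)^2 + \frac{\mu}{2} S(\mathcal{G}, \mathbf{B}, \mathbf{C}, \mathbf{D})$$
$$= \frac{1}{2}\sum_{(j,k,l) \in \Omega_{\mathcal{Y}}} \left[\left(y_{jkl} - \tilde{y}_{jkl}\right)^2 + \frac{\mu}{|\Omega_{\mathcal{Y}}|}\left\|\mathcal{G}\right\|_F^2 + \mu\left(\frac{\|\mathbf{b}_j\|_F^2}{|\Omega_{\mathcal{Y}}^{j}|} + \frac{\|\mathbf{c}_k\|_F^2}{|\Omega_{\mathcal{Y}}^{k}|} + \frac{\|\mathbf{d}_l\|_F^2}{|\Omega_{\mathcal{Y}}^{l}|}\right)\right]$$
$$g_{\mathrm{net}}(\mathbf{C}, \mathbf{Z}) = \frac{1}{2}\sum_{(n,o) \in \Omega_{\mathbf{Z}}} z_{no}\left\|\mathbf{c}_n - \mathbf{c}_o\right\|_F^2$$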
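A short check of why the regularizer can be spread over the observed entries. It assumes that $\Omega_{\mathcal{Y}}^{j}$ denotes the set of observed entries whose first index is $j$, that every row $j$ has at least one observed entry, and that the regularization term is the sum of squared Frobenius norms of the core tensor and the factor matrices; these readings are assumptions, not statements from the slides.

```latex
\sum_{(j,k,l)\in\Omega_{\mathcal{Y}}} \frac{\|\mathbf{b}_j\|_F^2}{|\Omega_{\mathcal{Y}}^{j}|}
  = \sum_{j=1}^{J} \; \sum_{(j,k,l)\in\Omega_{\mathcal{Y}}^{j}} \frac{\|\mathbf{b}_j\|_F^2}{|\Omega_{\mathcal{Y}}^{j}|}
  = \sum_{j=1}^{J} \|\mathbf{b}_j\|_F^2
  = \|\mathbf{B}\|_F^2,
\qquad
\sum_{(j,k,l)\in\Omega_{\mathcal{Y}}} \frac{\|\mathcal{G}\|_F^2}{|\Omega_{\mathcal{Y}}|} = \|\mathcal{G}\|_F^2 .
```

The same argument applies to the other two factor matrices, so summing the per-entry terms gives back the full regularizer. Each observed entry then carries its own share of the penalty, which is what lets a single data point define a complete stochastic gradient.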
Proposed methods
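Calculate gradients of $g_{\mathrm{opt}}$ with respect to the core tensor and factor matrices for a given data point $y_\beta$ with $\beta = (j,k,l)$, or $z_\gamma$ with $\gamma = (n,o)$:
$$\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{b}_j}\right|_{\beta} = -\left(y_\beta - \tilde{y}_\beta\right)\left(\mathcal{G} \times_2 \mathbf{c}_k \times_3 \mathbf{d}_l\right) + \frac{\mu}{|\Omega_{\mathcal{Y}}^{j}|}\mathbf{b}_j$$
$$\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathcal{G}}\right|_{\beta} = -\left(y_\beta - \tilde{y}_\beta\right) \times_1 \mathbf{b}_j^\top \times_2 \mathbf{c}_k^\top \times_3 \mathbf{d}_l^\top + \frac{\mu}{|\Omega_{\mathcal{Y}}|}\mathcal{G}$$
$$\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_n}\right|_{\gamma} = \mu_{\mathrm{net}}\, z_\gamma \left(\mathbf{c}_n - \mathbf{c}_o\right)$$
$\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_k}\right|_{\beta}$, $\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{d}_l}\right|_{\beta}$, and $\left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_o}\right|_{\gamma}$ are calculated symmetrically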
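The per-entry gradients translate directly into a few tensor contractions. This is a sketch under assumptions: `n_obs`, `n_j`, `n_k`, `n_l` stand for the counts of observed entries overall and per index, `mu_net` is the network weight, and the function names are hypothetical.

```python
import numpy as np

def grads_tensor_entry(G, b, c, d, y, mu, n_obs, n_j, n_k, n_l):
    """Gradients of the per-entry objective for one observed value y_jkl."""
    err = y - np.einsum("qrs,q,r,s->", G, b, c, d)        # y - y_tilde
    g_b = -err * np.einsum("qrs,r,s->q", G, c, d) + mu / n_j * b
    g_c = -err * np.einsum("qrs,q,s->r", G, b, d) + mu / n_k * c
    g_d = -err * np.einsum("qrs,q,r->s", G, b, c) + mu / n_l * d
    # Core gradient: residual times the outer product of the three rows
    g_G = -err * np.einsum("q,r,s->qrs", b, c, d) + mu / n_obs * G
    return g_b, g_c, g_d, g_G

def grads_network_entry(c_n, c_o, z, mu_net):
    """Gradients for one observed similarity z_no; equal and opposite."""
    g_n = mu_net * z * (c_n - c_o)
    return g_n, -g_n
```

Each update touches one row per factor matrix plus the core tensor, so two data points with disjoint indices conflict only on the core.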
Proposed methods
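Parallel update with calculated gradients
SNeCT($\mathcal{Y}$, $\mathbf{Z}$, $\mu$, $\mu_{\mathrm{net}}$, $\theta$) ($\theta$: learning rate)
Initialize $\mathcal{G}, \mathbf{B}, \mathbf{C}, \mathbf{D}$ randomly
repeat
  for all $y_\beta \in \mathcal{Y}$ with $\beta = (j,k,l)$ and all $z_\gamma \in \mathbf{Z}$ with $\gamma = (n,o)$, in random order, in parallel
    if $y_{jkl} \in \mathcal{Y}$ is picked then
      $\mathbf{b}_j \leftarrow \mathbf{b}_j - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{b}_j}\right|_{\beta}$, $\mathbf{c}_k \leftarrow \mathbf{c}_k - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_k}\right|_{\beta}$, $\mathbf{d}_l \leftarrow \mathbf{d}_l - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{d}_l}\right|_{\beta}$
      $\mathcal{G} \leftarrow \mathcal{G} - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathcal{G}}\right|_{\beta}$
    else if $z_{no} \in \mathbf{Z}$ is picked then
      $\mathbf{c}_n \leftarrow \mathbf{c}_n - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_n}\right|_{\gamma}$, $\mathbf{c}_o \leftarrow \mathbf{c}_o - \theta \left.\frac{\partial g_{\mathrm{opt}}}{\partial \mathbf{c}_o}\right|_{\gamma}$
    end if
  end for
until convergence condition satisfied
Orthogonalize $\mathbf{B}, \mathbf{C}, \mathbf{D}$ by QR decomposition
return $\mathcal{G}, \mathbf{B}, \mathbf{C}, \mathbf{D}$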
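The update loop can be sketched serially in NumPy. This is a single-threaded illustration, not the lock-free parallel scheme of the slides, and it makes several assumptions: a fixed number of epochs replaces the convergence test, the coordinate format (`idx`, `vals`, `edges`, `weights`) and all function names are hypothetical, and the triangular QR factors are folded into the core tensor so that the reconstruction is unchanged (the slides only state that the factor matrices are orthogonalized by QR).

```python
import numpy as np

def orthogonalize(G, B, C, D):
    """QR-orthogonalize B, C, D; fold the R factors into the core tensor."""
    B, Rb = np.linalg.qr(B)       # needs at least as many rows as columns
    C, Rc = np.linalg.qr(C)
    D, Rd = np.linalg.qr(D)
    return np.einsum("qrs,aq,br,cs->abc", G, Rb, Rc, Rd), B, C, D

def snect_sgd_sketch(idx, vals, edges, weights, dims, ranks,
                     mu=0.01, mu_net=0.1, theta=0.01, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    (J, K, L), (Q, R, S) = dims, ranks
    G = rng.normal(scale=0.5, size=(Q, R, S))
    B, C, D = (rng.normal(scale=0.5, size=s) for s in ((J, Q), (K, R), (L, S)))
    # Observed-entry counts per index; the floor of 1 avoids division by zero
    n_j = np.maximum(np.bincount(idx[:, 0], minlength=J), 1)
    n_k = np.maximum(np.bincount(idx[:, 1], minlength=K), 1)
    n_l = np.maximum(np.bincount(idx[:, 2], minlength=L), 1)
    n_obs, n_edge = len(vals), len(weights)
    for _ in range(epochs):
        # Tensor entries and network entries are visited in one shuffled stream
        for t in rng.permutation(n_obs + n_edge):
            if t < n_obs:
                j, k, l = idx[t]
                b, c, d = B[j].copy(), C[k].copy(), D[l].copy()
                err = vals[t] - np.einsum("qrs,q,r,s->", G, b, c, d)
                g_b = -err * np.einsum("qrs,r,s->q", G, c, d) + mu / n_j[j] * b
                g_c = -err * np.einsum("qrs,q,s->r", G, b, d) + mu / n_k[k] * c
                g_d = -err * np.einsum("qrs,q,r->s", G, b, c) + mu / n_l[l] * d
                g_G = -err * np.einsum("q,r,s->qrs", b, c, d) + mu / n_obs * G
                B[j] -= theta * g_b
                C[k] -= theta * g_c
                D[l] -= theta * g_d
                G -= theta * g_G
            else:
                n, o = edges[t - n_obs]
                step = theta * mu_net * weights[t - n_obs] * (C[n] - C[o])
                C[n] -= step      # pull the two gene factors together
                C[o] += step
    return orthogonalize(G, B, C, D)
```

In a lock-free parallel variant, several workers would run the inner loop body concurrently on shared arrays without locking; the sketch above keeps only the update logic.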
Experimental Settings
Factorize the data tensor with rank-[78,48,5]
Stratification
  Cluster analysis
  Survival analysis
Prediction
  Top-k similarity search on clinical features
Personalized subtype analysis
Performance
  Compare speed and convergence rate with a competitor
  Competitor: Narita et al. 2012
Stratification – Cluster Analysis
Patients per tumor type in clusters C1 to C13 (nonzero counts listed in cluster order; empty cells are not shown)
BLCA: 16, 32, 2, 19, 22, 3, 32 (total 126)
BRCA: 17, 3, 600, 172, 1, 70, 26 (total 889)
COAD: 4, 2, 2, 91, 317, 1, 2 (total 419)
GBM: 4, 1, 1, 2, 3, 7, 248, 1 (total 267)
HNSC: 242, 1, 6, 1, 60 (total 310)
KIRC: 14, 1, 1, 471, 4, 1, 6 (total 498)
LAML: 9, 188 (total 197)
LUAD: 302, 2, 2, 7, 1, 12, 29 (total 457)
LUSC: 26, 32, 29, 7, 246 (total 340)
OV: 1, 3, 1, 1, 348, 131 (total 485)
READ: 1, 1, 5, 9, 145, 1, 1 (total 163)
UCEC: 3, 1, 3, 117, 1, 348, 1, 10, 13, 2 (total 499)
Cluster totals (C1 to C13): 387, 315, 613, 362, 477, 581, 467, 348, 249, 188, 412, 17, 134 (total 4550)
Stratification – Survival Analysis
Survival curves for clustered patients
log-rank statistics: 409 1151 1185
Prediction – Top-k similarity search
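When a new query patient $r$ arrives with data $\mathcal{Y}_r$, calculate the factor $\mathbf{b}_r$ satisfying the following equation:
$$\mathbf{b}_r = \arg\min_{\mathbf{b}} \left\|\mathcal{Y}_r - \mathcal{G} \times_1 \mathbf{b} \times_2 \mathbf{C} \times_3 \mathbf{D}\right\|$$
Find the top-k patients most similar to $r$ and compare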
| Cohort | Clinical feature | Top 1 | Top 5 | Top 10 | Top R |
|---|---|---|---|---|---|
| BRCA | Estrogen receptor status | 0.72 | 0.85 | 0.86 | 0.81 |
| COAD | BRAF gene analysis result | 1.00 | 0.80 | 0.70 | 0.92 |
| GBM | Histological type | 0.96 | 0.94 | 0.94 | 0.78 |
| HNSC | HPV status by p16 testing | 0.78 | 0.78 | 0.77 | 0.73 |
| KIRC | Histological type | 1.00 | 0.99 | 0.99 | 0.73 |
| LAML | CALGB cytogenetics risk cat. | 0.85 | 0.84 | 0.81 | 0.65 |
| OV | Neoplasm histologic grade | 0.79 | 0.75 | 0.76 | 0.77 |
| READ | BRAF gene analysis result | 1.00 | 1.00 | 1.00 | 1.00 |
| UCEC | Menopause status | 0.71 | 0.76 | 0.76 | 0.77 |
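With the core tensor and the gene and platform factor matrices held fixed, the query factor is the solution of an ordinary linear least-squares problem. The sketch below assumes a fully observed genes-by-platforms slice for the query patient and ranks existing patients by Euclidean distance between factor rows; the slides do not state the similarity measure, so that choice and the function names are assumptions.

```python
import numpy as np

def query_factor(Y_r, G, C, D):
    """Least-squares b minimizing ||Y_r - G x1 b x2 C x3 D|| for a (K, L) slice."""
    M = np.einsum("qrs,kr,ls->qkl", G, C, D)        # (Q, K, L): G x2 C x3 D
    A = M.reshape(M.shape[0], -1).T                  # (K*L, Q) design matrix
    b_r, *_ = np.linalg.lstsq(A, Y_r.ravel(), rcond=None)
    return b_r

def top_k_similar(b_r, B, k):
    """Indices of the k rows of B closest to b_r (Euclidean distance, assumed)."""
    dist = np.linalg.norm(B - b_r, axis=1)
    return np.argsort(dist)[:k]

# Toy usage: query with the slice of an existing patient
rng = np.random.default_rng(0)
G = rng.normal(size=(3, 2, 2))
B, C, D = rng.normal(size=(6, 3)), rng.normal(size=(5, 2)), rng.normal(size=(4, 2))
Y = np.einsum("qrs,jq,kr,ls->jkl", G, B, C, D)
b_r = query_factor(Y[2], G, C, D)
nearest = top_k_similar(b_r, B, k=3)
```

The clinical features of the returned patients can then be compared with those of the query patient.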
Personalized subtype analysis
To provide a personalized interpretation for patient $j$, calculate $\mathcal{T} = \mathcal{G} \times_1 \mathbf{b}_j \in \mathbb{R}^{R \times S}$
Norms of rows represent gene subtype influence
Norms of columns represent platform subtype influence
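The personalized matrix is a single mode-1 contraction of the core tensor with the patient's factor row. A minimal sketch, with `G` for the core tensor and a hypothetical function name:

```python
import numpy as np

def personalized_subtype(G, b_j):
    """Return T = G x1 b_j with its row norms and column norms."""
    T = np.einsum("qrs,q->rs", G, b_j)               # (R, S) matrix
    gene_influence = np.linalg.norm(T, axis=1)       # one norm per row
    platform_influence = np.linalg.norm(T, axis=0)   # one norm per column
    return T, gene_influence, platform_influence

# Toy usage with a hand-built core tensor of shape (2, 2, 3)
G = np.zeros((2, 2, 3))
G[0] = [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
T, gene_influence, platform_influence = personalized_subtype(G, np.array([1.0, 0.0]))
```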
Performance
Comparison with another network-constrained tensor factorization method: Narita et al. 2012
A. Speed: iteration time, measured on sampled data
B. Accuracy: test RMSE
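Test RMSE can be computed on held-out observed entries without forming the dense reconstruction. A minimal sketch, assuming held-out entries are given as an index array `idx` with values `vals`; the function name is hypothetical.

```python
import numpy as np

def heldout_rmse(G, B, C, D, idx, vals):
    """Root mean squared error of the Tucker model on held-out entries."""
    j, k, l = idx.T                      # idx: (n_test, 3) integer index array
    pred = np.einsum("qrs,nq,nr,ns->n", G, B[j], C[k], D[l])
    return np.sqrt(np.mean((vals - pred) ** 2))
```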
Conclusion
SNeCT
Parallel algorithm for network constrained tensor factorization
Solves Tucker decomposition through a parallel SGD update scheme
Incorporates a common-pathway gene network into the Pan-Cancer12 tensor
Uses the patient factor matrix for cluster analysis and survival analysis
Proposes a personalized subtype analysis scenario
Thank you!