Personalized Regression Enables Sample-Specific Pan-Cancer Analysis
Benjamin J. Lengerich, Bryon Aragam, Eric P. Xing
{blengeri, naragam, epxing}@cs.cmu.edu
@ben_lengerich, @itsrainingdata
Cancer is Complex
Ben Lengerich | ISMB 2018
Do different mutations lead to different cancers? Tumors are typically grouped by clinical covariate.
Can we instead fit a model to each patient? “How does this tumor’s model differ from the cohort’s?”
[Diagram: samples mapped to model parameters.]
[Diagram: modeling approaches on two axes, Simple Effects ↔ Complicated Effects and Universal Effects ↔ Personal Effects, spanning Deep Learning, Mixed Effects, Mixtures, Sample-Specific, and Varying-Coefficient models. Examples: “This tumor is due to a mutation in gene TP53”; “Self-driving cars”.]
Pan-cancer analysis spans rare and common cancer types; if a model handles covariates well, tissue type can simply be treated as another covariate.

[Figure: Number of Samples by Tissue Type in TCGA.]
[Table: comparison of approaches on three criteria (Sample-Specific Models? Unknown Covariate Effects? General Framework?): Varying-Coefficient [1], Known Structure [2,3,4], Sample-Specific Network Estimation [5,6], and Personalized Regression.]
Overparameterized, but not hopeless!
[Diagram: each sample i receives its own model parameters β^(1), β^(2), …, β^(N).]
$$\sum_{i=1}^{N} \mathcal{L}^{(i)}(\beta^{(i)}; d_\beta, d_U), \qquad \mathcal{L}^{(i)}(\beta^{(i)}; d_\beta, d_U) \propto \underbrace{f(X^{(i)}, Y^{(i)}, \beta^{(i)})}_{\text{Prediction Loss}} + \underbrace{\rho_\lambda(\beta^{(i)})}_{\text{Regularization}} + \underbrace{\varrho^{(i)}_{\gamma}(d_\beta, d_U)}_{\text{Distance-Matching}}$$
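As a hypothetical sketch of this objective (a squared-error prediction loss f and an ℓ1 regularizer ρ_λ are assumptions here; `dist_match` stands in for the distance-matching term ϱ):

```python
import numpy as np

def personalized_objective(X, Y, B, lam, gamma, dist_match):
    """Sum of per-sample losses: prediction + regularization + distance-matching.

    X: (N, P) features; Y: (N,) targets; B: (N, P) sample-specific parameters.
    dist_match(i) returns the (unscaled) distance-matching penalty for sample i.
    """
    total = 0.0
    for i in range(len(Y)):
        pred_loss = (X[i] @ B[i] - Y[i]) ** 2   # f(X^(i), Y^(i), beta^(i))
        reg = lam * np.abs(B[i]).sum()          # rho_lambda(beta^(i)), here l1
        total += pred_loss + reg + gamma * dist_match(i)
    return total
```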
We encourage the distance between personalized parameters to be similar to the distance between sample covariates:
$$\varrho^{(i)}_{\gamma}(d_\beta, d_U) = \gamma \sum_{j \neq i} \Big( \underbrace{d_\beta(\beta^{(i)}, \beta^{(j)})}_{\text{parameter distance}} - \underbrace{d_U(U^{(i)}, U^{(j)})}_{\text{covariate distance}} \Big)^2$$

This penalty matches pairwise distances between all samples.
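A minimal plain-Python sketch of this penalty (names are hypothetical; the two distance functions are passed in as callables):

```python
def distance_matching_penalty(i, B, U, d_beta, d_u, gamma):
    """gamma * sum over j != i of
    (d_beta(beta_i, beta_j) - d_u(u_i, u_j))^2,
    pushing parameter distances to match covariate distances."""
    total = 0.0
    for j in range(len(B)):
        if j != i:
            diff = d_beta(B[i], B[j]) - d_u(U[i], U[j])
            total += diff ** 2
    return gamma * total
```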
We use learnable component-wise distance metrics:
$$d_\beta(x, y) = \big[\,|x_1 - y_1|, \ldots, |x_P - y_P|\,\big]\,\phi_\beta^{T}$$
$$d_U(x, y) = \big[\,d_{U_1}(x_1, y_1), \ldots, d_{U_K}(x_K, y_K)\,\big]\,\phi_U^{T}$$
The learned weights φ_β, φ_U can be inspected to understand contributions to personalization.
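A sketch of these metrics (hypothetical names; `phi_beta` and `phi_u` are the learned weight vectors, and the per-covariate distances are passed in as callables):

```python
import numpy as np

def d_beta(x, y, phi_beta):
    """Parameter distance: weighted sum of elementwise absolute differences."""
    return np.abs(x - y) @ phi_beta

def d_u(x, y, phi_u, component_dists):
    """Covariate distance: weighted sum of per-covariate distances d_{U_k}."""
    parts = [d_k(xk, yk) for d_k, xk, yk in zip(component_dists, x, y)]
    return np.array(parts) @ phi_u
```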
The result is an accurate predictive model that also captures complex universal effects.
Lasso Regularization (… control)
Clinical covariates include: Days to Collection; Pct. Normal Cells; Pct. Tumor Nuclei; Pct. Lymphocyte Infiltration; Pct. Monocyte Infiltration; Year of Birth; Gender; Race. Encoding these as binary vectors expands dimensionality 5X!
Selects Fewer Genes Per Sample:
Uses each Gene in Fewer Samples:
Red lines indicate the number of variables selected by tissue-specific models; most genes are selected in fewer than 500 samples.
Many methods effectively identify common oncogenes; few methods effectively identify rare oncogenes.
[Figure: samples × genes matrix; red line = oncogene.]
Top-ranked genes include ASIC1, GRM3, and SLC8A3, which code for ion-transport processes, and are enriched for the biological process term “Modulation of Synaptic Transmission” (p < 0.05, FDR-corrected). Ion transport has been seen in vivo as an important system in thyroid cancer [1] and in vitro in leukemic cells [2], but has only recently been proposed as a marker across different cancer types [3].

1. Filetti et al., European Journal of Endocrinology, 1999.
2. Morgan et al., Cancer Research, 1986.
3. Scafoglio et al., PNAS, 2015.
Enriched categories: Extracellular Processes - Antigen; Cellular Metabolism; Extracellular Processes - Membrane.
Distance-matching regularization effectively learns sample-specific models. Personalized models reveal structure in transcriptomic data that is overlooked by traditional analyses.
Future applications: Election Modeling, Stock Prediction.
Code available at: github.com/blengerich/personalized_regression
Collaborators: … @cs.cmu.edu
Travel to ISMB generously supported by ISCB; research supported by NIH.
$$d_\beta(\beta^{(i)}, \beta^{(j)}) = \big[\,d_{\beta_1}(\beta^{(i)}_1, \beta^{(j)}_1), \ldots, d_{\beta_P}(\beta^{(i)}_P, \beta^{(j)}_P)\,\big]$$
$$d_U(U^{(i)}, U^{(j)}) = \big[\,d_{U_1}(U^{(i)}_1, U^{(j)}_1), \ldots, d_{U_K}(U^{(i)}_K, U^{(j)}_K)\,\big]$$
$$\varrho^{(i)}_{\gamma}(d_\beta, d_U) = \gamma \sum_{j \neq i} \Big( \underbrace{d_\beta(\beta^{(i)}, \beta^{(j)})}_{\text{parameter distance}} - \underbrace{d_U(U^{(i)}, U^{(j)})}_{\text{covariate distance}} \Big)^2$$
$$\varrho^{(i)}_{\gamma}(d_\beta, d_U) = \gamma \sum_{j \neq i} \Big( \underbrace{d_\beta(\beta^{(i)}, \beta^{(j)})}_{\text{parameter distance}} - \underbrace{d_U(U^{(i)}, U^{(j)})}_{\text{covariate distance}} \Big)^2 + \psi_\alpha(d_\beta) + \psi_\upsilon(d_U)$$
$$\varrho^{(i)}_{\gamma}(d_\beta, d_U) = \gamma \sum_{j \neq i} \Big( \underbrace{d_\beta(\beta^{(i)}, \beta^{(j)})}_{\text{parameter distance}} - \underbrace{d_U(U^{(i)}, U^{(j)})}_{\text{covariate distance}} \Big)^2 + \psi_\alpha(d_\beta) + \psi_\upsilon(d_U)$$

with $\psi_\alpha(d_\beta) = \alpha\|\phi_\beta\|^2$ and $\psi_\upsilon(d_U) = \upsilon\|\phi_U\|^2$ regularizing the learned metric weights.
Optimization: each sample’s parameters “fine-tune” away from the central population solution (block coordinate descent), while distance-matching regularization ensures the personalized models respect covariate structure.
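A toy sketch of this scheme (hypothetical names; a squared-error loss is assumed, and a simple pull toward the population solution stands in for the full distance-matching penalty):

```python
import numpy as np

def fit_personalized(X, Y, steps=100, lr=0.01, gamma=0.1):
    """Fit a central population model, then block coordinate descent:
    each sample's copy of the parameters "fine-tunes" away from it."""
    N, _ = X.shape
    beta_pop = np.linalg.lstsq(X, Y, rcond=None)[0]  # central population solution
    B = np.tile(beta_pop, (N, 1))                    # start every copy there
    for _ in range(steps):
        for i in range(N):                           # update one block (sample) at a time
            resid = X[i] @ B[i] - Y[i]
            grad = resid * X[i] + gamma * (B[i] - beta_pop)
            B[i] -= lr * grad
    return B
```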
For new samples, we have already learned distance metrics to use for predictions: identify the closest neighbors in covariate space and use their sample-specific models.
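One hypothetical sketch of this test-time procedure (numeric covariates with weighted absolute-difference distances, and averaging the k nearest neighbors' models, are assumptions here):

```python
import numpy as np

def predict_new_sample(u_new, x_new, U_train, B_train, phi_u, k=5):
    """Use the learned covariate metric to find the k nearest training
    samples, then predict with the average of their personalized models."""
    dists = np.abs(U_train - u_new) @ phi_u   # d_U from u_new to each training sample
    nearest = np.argsort(dists)[:k]
    beta_new = B_train[nearest].mean(axis=0)  # borrow the neighbors' models
    return x_new @ beta_new
```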
[Figure: Recovery error ‖β̂ − β‖₂ / ‖β̂_pop − β‖₂ (lower is better) versus number of samples (25 to 500) for Population, Mixture, VC, and Personalized estimators.]
Personalized regression recovers sample-specific parameters better than competing methods. Further regularization and hyperparameter tuning will likely alleviate overfitting.
Instead, it identifies a variety of sample-specific patterns which do not fit into a small number of mixtures.
Enrichment Analysis of Complete Rankings: