Cross-organism prediction of drug hepatotoxicity by sparse group factor analysis
Tommi Suvitaival Juuso A. Parkkinen Seppo Virtanen Samuel Kaski
July 19-20, 2013 – CAMDA
Starting point
◮ High-dimensional gene-expression data from 3 types of organisms
◮ Sparse pathological data
[Figure: gene-expression views (1: human in vitro, 2: rat in vitro, 3: rat in vivo) over treatments, next to the rat in vivo pathology matrix (found / not found) over treatments. Finding types: Necrosis; Increased mitosis; Cellular infiltration; Change, eosinophilic; Microgranuloma; Hypertrophy; Single cell necrosis; Swelling; Vacuolization, cytoplasmic; Deposit, glycogen; DEAD; Degeneration, granular, eosinophilic; Edema; Proliferation; Change, basophilic; Anisonucleosis; Cellular infiltration, mononuclear cell; Proliferation, Kupffer cell; Nodule, hepatodiaphragmatic; Degeneration, acidophilic, eosinophilic; Atypia, nuclear; Deposit, lipid; Change, acidophilic; Vacuolization, nuclear; Degeneration, hydropic; Hematopoiesis, extramedullary; Mineralization; Fibrosis; Ground glass appearance.]
study with in vitro assay?
injury in humans using toxicogenomics data from animals?
[Figure: Observed data ≈ latent variables × factor loadings. Views 1 to 3 (human in vitro, rat in vitro, rat in vivo) share the treatments as samples. Factor loadings are real numbers or zero, so a component is active in A) all views, B) a subset of views, or C) a single view.]
Shared components
◮ associations between views
◮ cross-view prediction
GFA: $[X^{(1)}, X^{(2)}, X^{(3)}] \approx Z\,[W^{(1)}, W^{(2)}, W^{(3)}]$
GFA with sparsity: the same factorization, with sparse $Z$ and sparse $W^{(m)}$
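The factorization can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: the dimensions, the noise level and the `active` pattern are hypothetical, chosen only to show components that are active in all views, in a subset of views, and in a single view.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 20, 3          # treatments (samples) and components; hypothetical sizes
dims = [5, 4, 6]      # hypothetical number of variables in each of the 3 views

# Rows: components, columns: views.
# Component 0 is active in all views, 1 in a subset, 2 in a single view.
active = np.array([[1, 1, 1],
                   [1, 1, 0],
                   [0, 0, 1]], dtype=bool)

Z = rng.standard_normal((N, K))                      # latent variables
W = [rng.standard_normal((K, d)) * active[:, [m]]    # loadings, zeroed when inactive
     for m, d in enumerate(dims)]
X = [Z @ W_m + 0.1 * rng.standard_normal((N, W_m.shape[1])) for W_m in W]

X_all = np.hstack(X)  # [X(1), X(2), X(3)]
W_all = np.hstack(W)  # [W(1), W(2), W(3)]
```

A zero row in `W[m]` means that the component explains nothing in view `m`, which is how view-specific and shared structure are separated.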
Sparsity in the model is encouraged due to
gene expression microarray data sets
pathology data ⇒ Sparsity in terms of variables
by their effects ⇒ Sparsity in terms of samples
variables ⇒ Spike-and-slab prior∗ for factor loadings matrix W
samples ⇒ Spike-and-slab prior for latent variables Z
[Figure: density of the spike-and-slab prior. Axes: value, probability density.]
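Drawing from a spike-and-slab prior can be sketched as follows. The inclusion probability `pi` and the slab precision `alpha` are hypothetical names and values for illustration; the slides do not state them.

```python
import random

def sample_spike_and_slab(n, pi, alpha, rng):
    """Draw n values: exactly 0 with probability 1 - pi (spike),
    otherwise from a zero-mean Gaussian with precision alpha (slab)."""
    slab_sd = alpha ** -0.5
    return [rng.gauss(0.0, slab_sd) if rng.random() < pi else 0.0
            for _ in range(n)]

rng = random.Random(0)
w = sample_spike_and_slab(1000, pi=0.2, alpha=1.0, rng=rng)
frac_zero = sum(v == 0.0 for v in w) / len(w)  # close to 1 - pi
```

Unlike a purely shrinking prior, the spike puts probability mass on exact zeros, so a variable or a sample can be switched off for a component.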
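GFA model (i: samples, m: views; m = 1...M, i = 1...N):
$x^{(m)}_{i\cdot} \sim \mathcal{N}\left(z_{i\cdot} W^{(m)},\ \tau_m^{-1} I\right)$
$z_{i\cdot} \sim \mathcal{N}(0, I)$
$w^{(m)}_{k\cdot} \sim \mathcal{N}\left(0,\ \frac{1}{\alpha^{(m)}_{k}} I\right)$
$\alpha^{(m)}_{k} \sim \mathrm{Gamma}\left(a^{(\alpha)}, b^{(\alpha)}\right)$
$\tau_m \sim \mathrm{Gamma}\left(a^{(\tau)}, b^{(\tau)}\right)$
[Figure: plate diagram over views m = 1...M and samples i = 1...N, with nodes $x^{(m)}_{i\cdot}$, $z_{i\cdot}$, $W^{(m)}$, $\alpha^{(m)}$, $\tau$ and hyperparameters $a^{(\tau)}$, $b^{(\tau)}$, $a^{(\alpha)}$, $b^{(\alpha)}$.]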
GFA:
$x^{(m)}_{i\cdot} \sim \mathcal{N}\left(z_{i\cdot} W^{(m)},\ \tau_m^{-1} I\right)$
$z_{i\cdot} \sim \mathcal{N}(0, I)$
$w^{(m)}_{k\cdot} \sim \mathcal{N}\left(0,\ \frac{1}{\alpha^{(m)}_{k}} I\right)$
GFA with sparsity:
$x^{(m)}_{i\cdot} \sim \mathcal{N}\left(z_{i\cdot} W^{(m)},\ \tau_m^{-1} I\right)$
$z_{ik} \sim H^{(z)}_{ik}\ \mathcal{N}\left(0,\ \frac{1}{\alpha^{(z)}_{ik}}\right)$
$w^{(m)}_{dk} \sim H^{(m)}_{dk}\ \mathcal{N}\left(0,\ \frac{1}{\alpha^{(m)}_{dk}}\right)$
◮ Treatments that occur in all 3 types of organisms:
  ◮ 119 compounds
  ◮ dosage levels middle & high
  ◮ time points 8/9 h & 24 h
◮ Average differential expression over the replicates of each treatment
⇒ Treatment = sample for the model
⇒ Matching treatments between the 3 transcriptomic views $X^{\text{human}}_{\text{in vitro}}$, $X^{\text{rat}}_{\text{in vitro}}$ and $X^{\text{rat}}_{\text{in vivo}}$
[Figure: the 3 transcriptomic views (1: human in vitro, 2: rat in vitro, 3: rat in vivo) aligned over treatments.]
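The preprocessing described above could look roughly like this. It is a sketch under assumptions: the record layout, the treatment key `(compound, dose, time)` and the compound names are hypothetical, and each view is averaged separately before the treatments are matched.

```python
from collections import defaultdict

def average_over_replicates(records):
    """records: iterable of (treatment_key, differential_expression_vector).
    Returns one averaged vector per treatment, so that treatment = sample."""
    sums, counts = {}, defaultdict(int)
    for key, vec in records:
        if key not in sums:
            sums[key] = [0.0] * len(vec)
        sums[key] = [s + v for s, v in zip(sums[key], vec)]
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}

def matched_treatments(*views):
    """Treatment keys present in every view, in a fixed order."""
    shared = set(views[0])
    for view in views[1:]:
        shared &= set(view)
    return sorted(shared)

# Hypothetical replicate-level records for two views.
rat_in_vivo = average_over_replicates([
    (("compound_A", "high", "24 h"), [1.0, 0.0]),
    (("compound_A", "high", "24 h"), [2.0, -1.0]),
    (("compound_B", "middle", "24 h"), [0.5, 0.5]),
])
rat_in_vitro = average_over_replicates([
    (("compound_A", "high", "24 h"), [0.2, 0.4]),
])
shared = matched_treatments(rat_in_vivo, rat_in_vitro)
```

Only the treatments in `shared` would enter the model, with the same row order in every view.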
◮ Grade-weighted count of each pathological finding type over the replicates of a treatment
⇒ Pathology view $Y^{\text{rat}}_{\text{in vivo}}$ with treatments matching the 3 transcriptomic views
[Figure: pathological findings (found / not found) per treatment, by finding type.]
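A grade-weighted count can be sketched as below. The grade names and their numeric weights are an assumption made for illustration; the slides do not state which weights were used.

```python
# Assumed grade scale; not given in the source.
GRADE_WEIGHT = {"minimal": 1, "slight": 2, "moderate": 3, "severe": 4}

def grade_weighted_counts(findings, finding_types):
    """findings: (finding_type, grade) pairs pooled over the replicates
    of one treatment. Returns one row of the pathology view."""
    totals = {t: 0 for t in finding_types}
    for finding_type, grade in findings:
        totals[finding_type] += GRADE_WEIGHT[grade]
    return [totals[t] for t in finding_types]

finding_types = ["Necrosis", "Hypertrophy", "Fibrosis"]
row = grade_weighted_counts(
    [("Necrosis", "slight"), ("Necrosis", "severe"), ("Hypertrophy", "minimal")],
    finding_types,
)
```

Finding types that never occur for a treatment stay at 0, which is what makes the pathology view sparse.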
Our tasks:
transcriptomic responses in the 3 types of model organisms
generalize to known effects of the compounds on humans
Training: Learn associations between the views
◮ 3 transcriptomic views $X^{\text{human}}_{\text{in vitro}}$, $X^{\text{rat}}_{\text{in vitro}}$ and $X^{\text{rat}}_{\text{in vivo}}$
◮ Pathology view $Y^{\text{rat}}_{\text{in vivo}}$
Testing: Predict the pathological findings $Y^{\text{rat}}_{\text{in vivo}}$
◮ Given one of the transcriptomic views
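Cross-view prediction in a linear-Gaussian factor model can be sketched as follows: infer the latent variables from the given view, then map them through the loadings of the pathology view. This sketch assumes point estimates of the loadings and the noise precision and the plain Gaussian prior z ~ N(0, I); it ignores the spike-and-slab structure and the posterior uncertainty that the actual model would carry. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def predict_view(X_src, W_src, tau_src, W_tgt):
    """Predict a target view from one source view.

    X_src: N x D_src data, W_src: K x D_src loadings,
    tau_src: noise precision of the source view, W_tgt: K x D_tgt loadings.
    """
    K = W_src.shape[0]
    # Posterior precision of z given the source view, with prior z ~ N(0, I).
    precision = np.eye(K) + tau_src * W_src @ W_src.T
    # Posterior mean E[z | x] = tau * x W^T (I + tau W W^T)^{-1}, one row per sample.
    Z_mean = np.linalg.solve(precision, tau_src * W_src @ X_src.T).T
    return Z_mean @ W_tgt

# Toy check with noise-free data: the prediction should match Z_true @ W_y.
W_x = np.array([[1.0, 0.0, 1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0, 1.0, 0.0]])
W_y = np.array([[2.0, 0.0],
                [0.0, -1.0]])
Z_true = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]])
X = Z_true @ W_x
Y_hat = predict_view(X, W_x, tau_src=1e6, W_tgt=W_y)
```

Only components that are active in both the source view and the pathology view can contribute to the prediction, which is why shared components matter for this task.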
[Figure: Relative performance of the gene expression views at predicting pathological findings. Axis: proportion of test samples predicted more accurately than by other views (0.2 to 1.0). Finding types shown: Hypertrophy; Swelling; Nodule, hepatodiaphragmatic; Microgranuloma; Necrosis; Single cell necrosis; Cellular infiltration; Vacuolization, cytoplasmic. Legend entries recovered: Rat in vitro, Rat in vivo.]
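One way to compute the quantity on the axis, for a single finding type, is sketched below. The exact definition used for the figure is not given in the slides; here a view "wins" a test sample when its absolute prediction error is the smallest, ties go to the first view listed, and the error values are made up.

```python
def proportion_best(errors_by_view):
    """errors_by_view: dict mapping view name to a list of absolute
    prediction errors, one per test sample (same order in every view)."""
    views = list(errors_by_view)
    n = len(errors_by_view[views[0]])
    wins = {v: 0 for v in views}
    for i in range(n):
        best = min(views, key=lambda v: errors_by_view[v][i])
        wins[best] += 1
    return {v: wins[v] / n for v in views}

# Hypothetical errors for 4 test samples.
proportions = proportion_best({
    "human in vitro": [0.5, 0.2, 0.9, 0.4],
    "rat in vitro":   [0.3, 0.6, 0.8, 0.5],
    "rat in vivo":    [0.1, 0.4, 0.2, 0.3],
})
```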
◮ $W^\top W$ reveals the similarity of component activities between the variables
◮ Thanks to sparsity, projections to many variables are 0
◮ The model automatically decides which variables to explain by components
[Figure: heatmap over pairs of pathological finding types (both axes list the finding types); color scale from −5 to 5.]
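With loadings stored as components × variables, the similarity matrix is a single product. The small `W` below is made up to show how zeros in the loadings propagate.

```python
import numpy as np

# Hypothetical sparse loadings: 2 components (rows) x 4 variables (columns).
W = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 0.0]])

S = W.T @ W  # variables x variables similarity of component activities
```

Variables 0 and 1 share a component and get a non-zero entry, variable 2 is explained by a different component, and variable 3 is explained by none, so its whole row and column are zero.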
◮ Given $X^{\text{rat}}_{\text{in vivo}}$, predict $Y^{\text{rat}}_{\text{in vivo}}$
◮ Same prediction task using ℓ1-regularized linear regression
[Figure: Performance of rat in vivo gene expression view at predicting pathological findings. Root mean squared error (axis ticks 2, 4, 6) of GFA and L1, per finding type: Hypertrophy; Swelling; Nodule, hepatodiaphragmatic; Microgranuloma; Necrosis; Single cell necrosis; Cellular infiltration; Vacuolization, cytoplasmic.]
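The baseline and the error measure can be sketched as follows. The slides do not say which solver or regularization strength was used for the ℓ1-regularized regression; the cyclic coordinate descent below is a swapped-in minimal solver for illustration, with a hypothetical `lam` and toy data.

```python
import numpy as np

def soft_threshold(value, lam):
    """Shrink value towards zero by lam; exactly 0 when |value| <= lam."""
    return np.sign(value) * max(abs(value) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1
    by cyclic coordinate descent."""
    n, d = X.shape
    b = np.zeros(d)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(d):
            if col_scale[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Residual with the contribution of variable j added back.
            partial_residual = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial_residual / n
            b[j] = soft_threshold(rho, lam) / col_scale[j]
    return b

def rmse(y_true, y_pred):
    """Root mean squared error between two vectors."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy data with orthogonal columns: only the first variable matters.
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = 2.0 * X[:, 0]
b = lasso_coordinate_descent(X, y, lam=0.5)
error = rmse(y, X @ b)
```

In the comparison above, one such regression would be fitted per finding type, and its RMSE on held-out treatments set against the RMSE of the GFA prediction.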
◮ How do the transcriptional changes in model organisms generalize to system-level effects in humans?
◮ Can the model learn structure relevant to the properties of the compounds in an unsupervised way?
We quantify the success of translation by the retrieval of similar compounds
◮ Ground-truth: System’s labels (level 4)
◮ Model: GFA with sparsity for the transcriptomic views of the model organisms, $X^{\text{human}}_{\text{in vitro}}$, $X^{\text{rat}}_{\text{in vitro}}$ and $X^{\text{rat}}_{\text{in vivo}}$
◮ Measure: Average precision in the retrieval of similar compounds in the latent space
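The retrieval measure can be sketched as below. Several details are assumptions, since the slides do not fix them: Euclidean distance in the latent space, relevance defined as sharing the query's label, and average precision normalized by the number of relevant items found within the neighborhood. The latent coordinates and labels are toy values.

```python
import math

def average_precision_at_k(latent, labels, query, k):
    """Average precision over the k nearest neighbours of `query`
    in the latent space; a neighbour is relevant if it shares the label."""
    others = [j for j in range(len(latent)) if j != query]
    others.sort(key=lambda j: math.dist(latent[query], latent[j]))
    hits, precision_sum = 0, 0.0
    for rank, j in enumerate(others[:k], start=1):
        if labels[j] == labels[query]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(latent, labels, k):
    """Mean of the per-query average precisions, each sample used as a query."""
    scores = [average_precision_at_k(latent, labels, q, k)
              for q in range(len(latent))]
    return sum(scores) / len(scores)

latent = [[0.0], [1.0], [3.0], [10.0]]  # toy 1-D latent representation
labels = ["a", "a", "b", "b"]           # toy ground-truth labels
map_at_2 = mean_average_precision(latent, labels, k=2)
```

Varying `k` corresponds to varying the size of the neighborhood considered for retrieval.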
[Figure: Mean average precision as a function of the number of nearest samples considered for average precision (size of the neighborhood for retrieval, 5 to 20). Panel label recovered: DILI. Legend (Method) entry recovered: Random.]
◮ GFA reveals associations between the views
◮ Associations indicate what generalizes between the views
◮ Sparsity helps in this decision
◮ Latent representation allows us to explore structure in the data in an unsupervised way
We can
◮ analyse the similarity of model organisms
◮ learn what generalizes from the model organisms to humans
Funding:
◮ The Academy of Finland
  ◮ Finnish Centre of Excellence in Computational Inference Research COIN, 251170
  ◮ Computational Modeling of the Biological Effects of Chemicals, 140057
◮ Finnish Doctoral Programme in Computational Sciences FICS
◮ Helsinki Doctoral Programme in Computer Science